[ https://issues.apache.org/jira/browse/IMPALA-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715789#comment-16715789 ]

ASF subversion and git services commented on IMPALA-1003:
---------------------------------------------------------

Commit 04d027df13e1c3c5c654b5a0bc965b670483b535 in impala's branch 
refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=04d027d ]

IMPALA-7659: Populate NULL count while computing column stats

Null counting was disabled for performance reasons (IMPALA-1003). This
patch re-enables it because codegen has improved substantially since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).
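To illustrate what a CASE-based null count computes, here is a small Python sketch of the SQL semantics. It assumes the per-column aggregate has the shape `COUNT(CASE WHEN col IS NULL THEN 1 ELSE NULL END)`. The commit message does not show the exact expression Impala generates, so treat that shape and the helper `null_count_expr` as hypothetical. Python `None` stands in for SQL NULL.

```python
from typing import Iterable, Optional, Sequence


def case_is_null(value: Optional[object]) -> Optional[int]:
    """Emulates CASE WHEN value IS NULL THEN 1 ELSE NULL END (None == SQL NULL)."""
    return 1 if value is None else None


def sql_count(values: Iterable[Optional[object]]) -> int:
    """Emulates SQL COUNT(expr), which counts only non-NULL inputs."""
    return sum(1 for v in values if v is not None)


def null_count(column: Sequence[Optional[object]]) -> int:
    """Null count as COUNT(CASE WHEN col IS NULL THEN 1 ELSE NULL END)."""
    # CASE maps NULL inputs to 1 and everything else to NULL,
    # so COUNT over the result counts exactly the NULL inputs.
    return sql_count(case_is_null(v) for v in column)


def null_count_expr(col: str) -> str:
    """Hypothetical helper: renders the assumed aggregate for one column."""
    return f"COUNT(CASE WHEN {col} IS NULL THEN 1 ELSE NULL END)"


if __name__ == "__main__":
    print(null_count([1, None, 3, None]))  # 2
    print(null_count_expr("ss_item_sk"))
```

The appeal of CASE here is only that, per the commit message, it has proper codegen support. The result is the same as an equivalent IF-based expression.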

Tests:
=====

- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
  estimation.
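The following sketch shows why null counts matter for IS [NOT] NULL cardinality estimation. It uses a simplified model in which the IS NULL selectivity is the fraction of null rows and IS NOT NULL is its complement. This is an assumption for illustration; the planner's actual estimation logic is not shown in this commit, and the function names are hypothetical.

```python
def is_null_selectivity(num_rows: int, num_nulls: int, negated: bool = False) -> float:
    """Assumed model: P(col IS NULL) = num_nulls / num_rows; IS NOT NULL is 1 - that."""
    if num_rows <= 0:
        raise ValueError("num_rows must be positive")
    if not 0 <= num_nulls <= num_rows:
        raise ValueError("num_nulls must be between 0 and num_rows")
    frac_null = num_nulls / num_rows
    return 1.0 - frac_null if negated else frac_null


def estimate_cardinality(num_rows: int, num_nulls: int, negated: bool = False) -> int:
    """Estimated output rows of `col IS [NOT] NULL` under the model above."""
    return round(num_rows * is_null_selectivity(num_rows, num_nulls, negated))


if __name__ == "__main__":
    print(estimate_cardinality(1000, 250))                # IS NULL: 250
    print(estimate_cardinality(1000, 250, negated=True))  # IS NOT NULL: 750
```

Without a computed null count, an estimator of this kind has no basis for the fraction and must fall back to a default guess. That is the gap that populating the null count closes.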

Perf note:
=========

I reran the compute stats child query with null counts included on the
store_sales table from the TPC-DS dataset at scale factor 1000 (1TB). The
table had 22 non-partition columns (on which null counts were computed)
and ~2.8B rows. This experiment showed a performance drop of around 7-8%
compared to the same child query without null counts for these columns.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Reviewed-on: http://gerrit.cloudera.org:8080/11565
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Improve compute stats performance
> ---------------------------------
>
>                 Key: IMPALA-1003
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1003
>             Project: IMPALA
>          Issue Type: Improvement
>    Affects Versions: Impala 1.3
>            Reporter: Matthew Jacobs
>            Assignee: Ippokratis Pandis
>            Priority: Major
>             Fix For: Impala 1.4
>
>
> We should remove unnecessary computations from the compute stats query and 
> use more codegen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
