[email protected] has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: Optimize expression to collect NULLs count
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@250
PS1, Line 250:       columnStatsSelectList.add("COUNT(*) - COUNT(" + colRefSql + ")");
> It's not obvious to me that this should be faster after we do code generati
I haven't. I naively assumed that something about COUNT over an IF expression
caused the slowdown. Otherwise I can't understand why counting NULLs would
ever be slower than NDV or SAMPLED_NDV, which are used for counting distinct
values.
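
To make the two formulations concrete, here is a minimal sketch in plain Java
that mimics their semantics on an in-memory column. It only shows that both
yield the same NULL count; it says nothing about their relative speed in
Impala. The class and method names are hypothetical, and the exact shape of
the IF-based expression, COUNT(IF(col IS NULL, 1, NULL)), is an assumption,
since this thread only refers to it as "COUNT over IF expression".

```java
import java.util.Arrays;
import java.util.List;

public class NullCountSketch {
    // Mirrors COUNT(*) - COUNT(col): all rows minus the non-NULL values.
    static long countNullsBySubtraction(List<Integer> column) {
        long countStar = column.size();
        long countCol = column.stream().filter(v -> v != null).count();
        return countStar - countCol;
    }

    // Mirrors the assumed form COUNT(IF(col IS NULL, 1, NULL)):
    // map NULL to 1 and everything else to NULL, then count the non-NULLs.
    static long countNullsByIf(List<Integer> column) {
        return column.stream()
            .map(v -> v == null ? Integer.valueOf(1) : null)
            .filter(v -> v != null)
            .count();
    }

    public static void main(String[] args) {
        // Hypothetical column with two NULLs out of five rows.
        List<Integer> column = Arrays.asList(1, null, 3, null, 5);
        System.out.println(countNullsBySubtraction(column)); // prints 2
        System.out.println(countNullsByIf(column));          // prints 2
    }
}
```

The subtraction form evaluates no conditional per row, which is the intuition
behind the change, but whether that matters after code generation would still
need to be measured on a real cluster.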

I don't have an Impala installation that I could use for experiments. For now,
I will just remove the possibly misleading suggestions from the commit message.
Would you be able to run some experiments with this?



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: [email protected]
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Comment-Date: Wed, 03 Oct 2018 09:20:52 +0000
Gerrit-HasComments: Yes
