[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )
Change subject: Optimize expression to collect NULLs count
......................................................................

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@250
PS1, Line 250: columnStatsSelectList.add("COUNT(*) - COUNT(" + colRefSql + ")");
> It's not obvious to me that this should be faster after we do code generati

I haven't. I naively assumed that something about COUNT over an IF expression caused the slowdown. Otherwise I can't understand why counting NULLs would ever be slower than NDV or SAMPLED_NDV, which are used for counting distinct values.

I don't have an Impala installation that I could use for experiments. For now, I will just remove the possibly misleading suggestions from the commit message. Would you be able to run some experiments with this?

--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: [email protected]
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Comment-Date: Wed, 03 Oct 2018 09:20:52 +0000
Gerrit-HasComments: Yes
