Joe McDonnell has uploaded a new patch set (#4).

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
......................................................................

IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

If all the return values of a Case expression have a known number of
distinct values (i.e. they are constant or statistics exist), then the
number of distinct values for the Case can be computed using this
information.

In order for the value from Case to be used at higher levels in the tree,
the implementation of computeNumDistinctValues for Expr needed to change.
Previously, Expr calculated the number of distinct values by finding any
SlotRefs in its tree and taking the maximum of the distinct values from
those SlotRefs. This would ignore the value from CaseExpr. To fix this,
Expr now takes the maximum number of distinct values across all of its
children.

-- explaining this statement shows cardinality = 2
explain select distinct case when id = 1 then 'yes' else 'no' end
from functional.alltypes;

-- explaining this statement shows cardinality = 2
explain select distinct char_length(case when id = 1 then 'yes' else 'no' end)
from functional.alltypes;

-- explaining this statement shows cardinality = 7300
explain select distinct case when id = 1 then 0 else id end
from functional.alltypes;

-- explaining this statement shows cardinality = 737 (date_string_col has lower
-- cardinality than id)
explain select distinct case when id = 1 then 'yes' else date_string_col end
from functional.alltypes;

For cases when the number of distinct values is not known for all the
outputs, this will return -1, indicating that the number of distinct values
is not known. The inputs (whens) are not used for calculating the number of
distinct values.
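
The two rules described in the commit message (a Case derives its NDV from its outputs only, and a generic expression takes the maximum NDV across its children) can be illustrated with a small toy model. This is a minimal sketch, not the code in CaseExpr.java or Expr.java: the classes NdvSketch, Node, Literal, Column and Case are hypothetical, and the way the outputs are combined (count of constant outputs plus the largest NDV among non-constant outputs, with no de-duplication of equal constants) is an assumption, since the message does not spell out the exact formula.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of NDV propagation; these classes are hypothetical, not Impala's. */
final class NdvSketch {
    /** Sentinel used in the commit message for "number of distinct values not known". */
    static final long UNKNOWN = -1;

    /** Generic expression node. Default rule: maximum NDV across all children. */
    static class Node {
        final List<Node> children;
        Node(Node... children) { this.children = Arrays.asList(children); }
        boolean isConstant() { return false; }
        long ndv() {
            long max = UNKNOWN;
            for (Node child : children) max = Math.max(max, child.ndv());
            return max;
        }
    }

    /** A constant output such as 'yes' or 0: exactly one distinct value. */
    static final class Literal extends Node {
        Literal() { super(); }
        @Override boolean isConstant() { return true; }
        @Override long ndv() { return 1; }
    }

    /** A column reference whose NDV comes from statistics, or UNKNOWN without them. */
    static final class Column extends Node {
        final long statsNdv;
        Column(long statsNdv) { super(); this.statsNdv = statsNdv; }
        @Override long ndv() { return statsNdv; }
    }

    /** A CASE modeled by its outputs (then/else) only; the whens are ignored. */
    static final class Case extends Node {
        Case(Node... outputs) { super(outputs); }
        @Override long ndv() {
            long constants = 0;          // assumption: equal constants are not de-duplicated
            long maxNonConst = UNKNOWN;
            for (Node out : children) {
                if (out.isConstant()) { constants++; continue; }
                long n = out.ndv();
                if (n == UNKNOWN) return UNKNOWN;  // any unknown output makes the result unknown
                maxNonConst = Math.max(maxNonConst, n);
            }
            // Assumed combination rule: constants plus the largest non-constant NDV.
            return maxNonConst == UNKNOWN ? constants : constants + maxNonConst;
        }
    }

    public static void main(String[] args) {
        Node twoConstants = new Case(new Literal(), new Literal());
        System.out.println(twoConstants.ndv());
        // A wrapping function (like char_length) inherits the Case's value via max-over-children.
        System.out.println(new Node(twoConstants).ndv());
        // An output without statistics makes the whole Case unknown.
        System.out.println(new Case(new Literal(), new Column(UNKNOWN)).ndv());
    }
}
```

In this sketch a Case with two constant outputs yields 2, wrapping it in another node still yields 2, and a Case with an output that lacks statistics yields -1, mirroring the behaviour the message describes for the first two example queries and for the unknown case.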

Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
---
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
2 files changed, 73 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/4
--
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
