Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16349 )
Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs. ...................................................................... Patch Set 4: (2 comments) > Patch Set 3: > > (1 comment) > > I think this makes sense to me - the only question really would be whether > this should be broader and increase the NDV in all cases where we know there > are nulls. I think fixing this really bad case first is OK though. Yes I tried the general fix but got some bad plans. The extra change given like booleans, we know how many NULL there are in the stats predicates with value checks that definitely filter out NULLs need should: 1. Not include the +1 for null in the selectivity. 2. Remove the null value count from cardinality. That's just a much larger change in terms of plan diffs and corner cases, so wanted to get this worst case "base case" in sooner. http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@96 PS3, Line 96: if (numDistinctValues_ == 0 && desc_.getIsNullable() && desc_.getStats() != null > Just wonder if the new logic here should be moved to ::getNumDistinctValues I was thinking about that but we'd want to check the slot nullability which is awkward to do from with the stats object. I'm going to leave it in place like this for now. http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@97 PS3, Line 97: && (desc_.getStats().hasNulls() || > It might be safer to assume that there are nulls in the absence of null sta Yes it would be rare at the least but should be an easy check. -- To view, visit http://gerrit.cloudera.org:8080/16349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032 Gerrit-Change-Number: 16349 Gerrit-PatchSet: 4 Gerrit-Owner: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 25 Aug 2020 02:49:20 +0000 Gerrit-HasComments: Yes
