Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16349 )

Change subject: IMPALA-7310: Partial fix for NDV cardinality with NULLs.
......................................................................


Patch Set 4:

(2 comments)

> Patch Set 3:
>
> (1 comment)
>
> I think this makes sense to me - the only question really would be whether 
> this should be broader and increase the NDV in all cases where we know there 
> are nulls. I think fixing this really bad case first is OK though.

Yes I tried the general fix but got some bad plans. The extra change given like 
booleans, we know how many NULL there are in the stats predicates with value 
checks that definitely filter out NULLs need should:
1. Not include the +1 for null in the selectivity.
2. Remove the null value count from cardinality.

That's just a much larger change in terms of plan diffs and corner cases, so 
wanted to get this worst case "base case" in sooner.

http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@96
PS3, Line 96: if (numDistinctValues_ == 0 && desc_.getIsNullable() && 
desc_.getStats() != null
> Just wonder if the new logic here should be moved to ::getNumDistinctValues
I was thinking about that but we'd want to check the slot nullability which is 
awkward to do from with the stats object. I'm going to leave it in place like 
this for now.


http://gerrit.cloudera.org:8080/#/c/16349/3/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@97
PS3, Line 97: && (desc_.getStats().hasNulls() ||
> It might be safer to assume that there are nulls in the absence of null sta
Yes it would be rare at the least but should be an easy check.



--
To view, visit http://gerrit.cloudera.org:8080/16349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec967053b4991f8c67cde62adf003cbd3f429032
Gerrit-Change-Number: 16349
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 25 Aug 2020 02:49:20 +0000
Gerrit-HasComments: Yes

Reply via email to