Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18543 )

Change subject: IMPALA-11301: Fix extreme = and != selectivity for NDV=1
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@267
PS1, Line 267:     if (distinctValues == 1) distinctValues = 2;
> If NDV == 1,  this would now produce 0.5 instead of 1 which is not right.
Yes, and I do think that 0.5 is "more right" than 1.

For example, WHERE NOT col = 1 would still lead to a selectivity of 0.0, as NOT
is calculated as selectivity = 1 - child_selectivity.

The patch was changed to only affect ndv=1, but it still affects both = and !=.
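
A minimal sketch of the arithmetic above, not the actual BinaryPredicate code.
It assumes = is estimated as 1 / NDV and != as 1 - 1 / NDV; the class and
method names are hypothetical and exist only for illustration.

```java
class SelectivitySketch {
  // Hypothetical helper: treat NDV=1 as NDV=2 when the adjustment is enabled,
  // mirroring "if (distinctValues == 1) distinctValues = 2;".
  static long adjustNdv(long distinctValues, boolean adjust) {
    if (adjust && distinctValues == 1) return 2;
    return distinctValues;
  }

  // Assumed estimate for "col = const": 1 / NDV.
  static double eqSelectivity(long distinctValues, boolean adjust) {
    return 1.0 / adjustNdv(distinctValues, adjust);
  }

  // Assumed estimate for "col != const": 1 - 1 / NDV.
  static double neSelectivity(long distinctValues, boolean adjust) {
    return 1.0 - 1.0 / adjustNdv(distinctValues, adjust);
  }

  // NOT as described above: 1 - child selectivity.
  static double notSelectivity(double childSelectivity) {
    return 1.0 - childSelectivity;
  }

  public static void main(String[] args) {
    // Compare NDV=1 without and with the adjustment.
    for (boolean adjust : new boolean[] {false, true}) {
      double eq = eqSelectivity(1, adjust);
      System.out.println("adjust=" + adjust
          + " eq=" + eq
          + " ne=" + neSelectivity(1, adjust)
          + " not(eq)=" + notSelectivity(eq));
    }
  }
}
```

Under these assumptions, NDV=1 without the adjustment gives 1.0 for = and 0.0
for both != and NOT (=); with the adjustment all three become 0.5.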


http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@273
PS1, Line 273:       if (op_ == Operator.DISTINCT_FROM && rChildIsNull) {
> Since the distinctValues == 1 is a special case which causes 0 to be produc
I have changed the patch to only affect ndv=1.

The reason is the large number of tests it broke and the complexity of handling 
the case when distinctValues + 1 becomes larger than the maximum ndv for a 
given type (for example 2 for booleans).

I am not sure whether "distinctValues + 1" or "distinctValues" is more correct, 
as the original formula doesn't account for the possibility of the value not 
being present in the table.
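
To illustrate the boolean case mentioned above, here is a small sketch of how
"distinctValues + 1" can exceed what a type can hold, and one possible way to
cap it. This is not what the patch does (it only touches ndv=1); the helper
names and the maxNdvForType parameter are hypothetical.

```java
class NdvPlusOneSketch {
  // Hypothetical: "distinctValues + 1" with no upper bound.
  static long plusOne(long distinctValues) {
    return distinctValues + 1;
  }

  // Hypothetical: the same, capped at the maximum number of distinct
  // values the column type can hold.
  static long plusOneCapped(long distinctValues, long maxNdvForType) {
    return Math.min(distinctValues + 1, maxNdvForType);
  }

  public static void main(String[] args) {
    long booleanMaxNdv = 2; // a boolean column has at most 2 distinct values
    long ndv = 2;
    System.out.println("uncapped ndv: " + plusOne(ndv));
    System.out.println("capped ndv: " + plusOneCapped(ndv, booleanMaxNdv));
  }
}
```

Every caller of the estimate would need to supply such a per-type maximum,
which is part of the complexity referred to above.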



--
To view, visit http://gerrit.cloudera.org:8080/18543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6
Gerrit-Change-Number: 18543
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 18 May 2022 20:23:28 +0000
Gerrit-HasComments: Yes
