Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18543 )
Change subject: IMPALA-11301: Fix extreme = and != selectivity for NDV=1
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@267
PS1, Line 267: if (distinctValues == 1) distinctValues = 2;
> If NDV == 1, this would now produce 0.5 instead of 1 which is not right.
Yes, and I do think that 0.5 is "more right" than 1. For example, WHERE NOT col = 1; would still lead to a selectivity of 0.0, as NOT is calculated as selectivity = 1 - child_selectivity. The patch was changed to only affect ndv=1, but it still affects both = and !=.


http://gerrit.cloudera.org:8080/#/c/18543/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@273
PS1, Line 273: if (op_ == Operator.DISTINCT_FROM && rChildIsNull) {
> Since the distinctValues == 1 is a special case which causes 0 to be produc
I have changed the patch to only affect ndv=1. The reasons are the large number of tests the earlier version broke and the complexity of handling the case when distinctValues + 1 becomes larger than the maximum ndv for a given type (for example 2 for booleans).

I am not sure whether "distinctValues + 1" or "distinctValues" is more correct, since the original formula doesn't account for the possibility of the value not being present in the table.

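To make the arithmetic in the two comments above concrete, here is a minimal sketch. It is not the actual BinaryPredicate code: the class name SelectivitySketch and all of its method names are hypothetical, and it assumes = is estimated as 1 / ndv, != as 1 - 1 / ndv, and NOT as 1 - child_selectivity, with ndv == 1 bumped to 2 as on PS1 line 267.

```java
public class SelectivitySketch {
    // Hypothetical helper mirroring "if (distinctValues == 1) distinctValues = 2;".
    // Only ndv == 1 is adjusted; every other ndv is left unchanged.
    public static double adjustedNdv(double distinctValues) {
        return distinctValues == 1 ? 2 : distinctValues;
    }

    // Assumed estimate for "col = const": 1 / ndv, using the adjusted ndv.
    public static double eqSelectivity(double distinctValues) {
        return 1.0 / adjustedNdv(distinctValues);
    }

    // Assumed estimate for "col != const": 1 - 1 / ndv, using the adjusted ndv.
    public static double neSelectivity(double distinctValues) {
        return 1.0 - 1.0 / adjustedNdv(distinctValues);
    }

    // NOT is calculated as 1 - child_selectivity, as stated in the comment.
    public static double notSelectivity(double childSelectivity) {
        return 1.0 - childSelectivity;
    }

    public static void main(String[] args) {
        // With ndv == 1 both = and != are estimated from the adjusted ndv.
        System.out.println("eq(ndv=1)      = " + eqSelectivity(1));
        System.out.println("ne(ndv=1)      = " + neSelectivity(1));
        System.out.println("not(eq(ndv=1)) = " + notSelectivity(eqSelectivity(1)));
        // Without the adjustment, = would be 1 / 1, and NOT of that would be 0.
        System.out.println("not(1.0 / 1)   = " + notSelectivity(1.0 / 1));
    }
}
```

Under these assumptions, adjusting ndv inside the = estimate also keeps NOT col = 1 away from 0.0, which is the case the first comment is concerned with.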
--
To view, visit http://gerrit.cloudera.org:8080/18543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6
Gerrit-Change-Number: 18543
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 18 May 2022 20:23:28 +0000
Gerrit-HasComments: Yes
