Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18543 )


Change subject: WIP: IMPALA-11301: Fix = and != selectivity for very low NDVs
......................................................................

WIP: IMPALA-11301: Fix = and != selectivity for very low NDVs

The original selectivities of 1.0/ndv (for =) and 1.0 - 1.0/ndv (for !=)
make sense for large NDVs, but they evaluate to 1.0 and 0.0 when ndv==1,
which can lead to badly wrong cardinality estimates.

Adding 1 to the ndv in the formulas also accounts for the possibility
of 0 matching elements in the dataset and helps avoid extreme
selectivities. IMPALA-7601 contains some discussion about these formulas.
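
The effect of the adjustment can be illustrated with a small standalone
sketch. It is not the code in BinaryPredicate.java: the class and method
names are hypothetical, and it assumes "adding 1 to the ndv" means replacing
ndv with ndv + 1 in both formulas.

```java
// Illustrative sketch only; names are hypothetical and this is not the
// actual patch to BinaryPredicate.java.
final class SelectivitySketch {
    // Original formulas quoted in the commit message.
    static double eqOriginal(long ndv) { return 1.0 / ndv; }
    static double neOriginal(long ndv) { return 1.0 - 1.0 / ndv; }

    // Assumed reading of "adding 1 to the ndv": use ndv + 1 as the divisor.
    static double eqAdjusted(long ndv) { return 1.0 / (ndv + 1); }
    static double neAdjusted(long ndv) { return 1.0 - 1.0 / (ndv + 1); }

    public static void main(String[] args) {
        long[] ndvs = {1, 2, 10, 1000};
        for (long ndv : ndvs) {
            // Print original vs. adjusted selectivity for = and !=.
            System.out.printf("ndv=%d  eq: %.6f -> %.6f  ne: %.6f -> %.6f%n",
                ndv, eqOriginal(ndv), eqAdjusted(ndv),
                neOriginal(ndv), neAdjusted(ndv));
        }
    }
}
```

Under that assumption, ndv==1 yields a selectivity of 0.5 for both = and !=
instead of 1.0 and 0.0, while for large NDVs the adjusted values stay close
to the original ones.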

The change is in WIP state because I have not had time to run the
tests yet.

Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18543/1
--
To view, visit http://gerrit.cloudera.org:8080/18543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6b5334a8d7d6ca46a450ff98ae03e5269faaa3c6
Gerrit-Change-Number: 18543
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>
