Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@665
PS1, Line 665: buildKeyNdv_ = srcExpr_.getNumDistinctValues();
> My understanding about bloom filter is that it is better to have slightly b
AFAIK the NDV estimates are much more conservative than the cardinality 
estimate we use, as the cardinality is decreased based on the estimated 
selectivity of predicates while NDV comes from column stats (possibly modified 
by expression, e.g. % 8 would decrease it 8, or a * b would multiply their NDVs 
too). If Impala creates too small Bloom filters at the moment then I think that 
it is the result of the "optimistic" cardinality estimates / small target FPP / 
small RUNTIME_FILTER_MAX_SIZE

So I think that this is a safe change in general. There is already query option 
RUNTIME_FILTER_MIN_SIZE to set a minimum for the Bloom filter size.

Was thinking a bit about the problem of wrong Bloom filter sizes and created a 
ticket about trying to get the sizes during query execution instead of relying 
on the planner:
IMPALA-11928


http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@774
PS1, Line 774:         if (numBloomFilters >= maxNumBloomFilters) continue;
> Not directly related to the NDV change but it would be useful to log under
I agree with the usefulness of this, will add them in the next patch.



--
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Fri, 17 Feb 2023 09:46:14 +0000
Gerrit-HasComments: Yes

Reply via email to