Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )
Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV ...................................................................... Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@665 PS1, Line 665: buildKeyNdv_ = srcExpr_.getNumDistinctValues(); > My understanding about bloom filter is that it is better to have slightly b AFAIK the NDV estimates are much more conservative than the cardinality estimate we use, as the cardinality is decreased based on the estimated selectivity of predicates while NDV comes from column stats (possibly modified by expression, e.g. % 8 would decrease it 8, or a * b would multiply their NDVs too). If Impala creates too small Bloom filters at the moment then I think that it is the result of the "optimistic" cardinality estimates / small target FPP / small RUNTIME_FILTER_MAX_SIZE So I think that this is a safe change in general. There is already query option RUNTIME_FILTER_MIN_SIZE to set a minimum for the Bloom filter size. Was thinking a bit about the problem of wrong Bloom filter sizes and created a ticket about trying to get the sizes during query execution instead of relying on the planner: IMPALA-11928 http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@774 PS1, Line 774: if (numBloomFilters >= maxNumBloomFilters) continue; > Not directly related to the NDV change but it would be useful to log under I agree with the usefulness of this, will add them in the next patch. -- To view, visit http://gerrit.cloudera.org:8080/19506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a Gerrit-Change-Number: 19506 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Fri, 17 Feb 2023 09:46:14 +0000 Gerrit-HasComments: Yes
