Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 )
Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ...................................................................... Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG@18 PS2, Line 18: This patch adds a narrow optimization of BetweenPredicate selectivity > If the user's query has a predicate unique_col >=5 AND unique_col <= 10 in Correct. This is mainly because the analysis happen at BetweenToCompoundRule.java. A more general approach will require analyzing all range BinaryPredicate and quickly becomes complicated when they are overlaps with each other. I leave TODO at PlanNode.java to think about a more general approach to this problem. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2626 PS2, Line 2626: > The 'prioritization' part was not clear.. why exactly is it needed ? .. cou Added comment. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1856 PS2, Line 1856: */ > nit: in the comments above, could you add a note about the acceptDate param Done http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@759 PS2, Line 759: * lower selectivity if analyzed as a pair. > nit: this comment needs updating to reflect the between selectivity. Added as issue number 3. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@774 PS2, Line 774: > nit: have 'been' assigned .. Done http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104 PS2, Line 104: hasNumDist > Looking at the hasStats() method: It is enough to check for hasNumDistinctValues() only. Updated this code accordingly. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107 PS2, Line 107: numNotNulls > Is there any test that covers the case where the null count made a differen Added tests against functional_parquet.unique_with_nulls. http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@127 PS5, Line 127: double sel = Math.max(0.0, (double) diff / stats.getNumDistinctValues()); I think test against unique_with_nulls returns wrong cardinality estimate. Should be half of what it is now. I'll check this line again. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@544 PS2, Line 544: * something like 0.33^2. > nit: this last sentence should be updated for the new behavior. Done -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 22 May 2024 00:44:15 +0000 Gerrit-HasComments: Yes
