Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/21377 )
Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column ...................................................................... Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG@18 PS2, Line 18: This patch adds a narrow optimization of BetweenPredicate selectivity If the user's query has a predicate unique_col >=5 AND unique_col <= 10 instead of the BETWEEN, then this patch is not applying the selectivity calculation even though BETWEEN gets converted to the first representation. Is the reason mainly to keep this narrow in scope ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2626 PS2, Line 2626: // Prioritize members of 'betweenPredicates' ahead of 'result'. The 'prioritization' part was not clear.. why exactly is it needed ? .. could you elaborate a bit on this ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1856 PS2, Line 1856: public long evalToInteger(Analyzer analyzer, String name, boolean acceptDate) nit: in the comments above, could you add a note about the acceptDate parameter ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@759 PS2, Line 759: * The first issue is addressed by using a single default selectivity that is nit: this comment needs updating to reflect the between selectivity. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@774 PS2, Line 774: assigned nit: have 'been' assigned .. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104 PS2, Line 104: hasStats() Looking at the hasStats() method: public boolean hasStats() { return numNulls_ != -1 || numDistinctValues_ != -1; } It can return true if either of the 2 stats are present. But the computation below depends on both being present. So it would be good to check for both explicitly. http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107 PS2, Line 107: numNotNulls Is there any test that covers the case where the null count made a difference to the Between selectivity ? http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@544 PS2, Line 544: * something like 0.33^2. nit: this last sentence should be updated for the new behavior. -- To view, visit http://gerrit.cloudera.org:8080/21377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Gerrit-Change-Number: 21377 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Mon, 20 May 2024 06:44:01 +0000 Gerrit-HasComments: Yes
