Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21377 )

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique 
column
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG@18
PS2, Line 18: This patch adds a narrow optimization of BetweenPredicate 
selectivity
If the user's query has  a predicate unique_col >=5 AND unique_col <= 10 
instead of the BETWEEN, then this patch is not applying the selectivity 
calculation even though BETWEEN gets converted to the first representation.  Is 
the reason mainly to keep this narrow in scope ?


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2626
PS2, Line 2626:       // Prioritize members of 'betweenPredicates' ahead of 
'result'.
The 'prioritization' part was not clear.. why exactly is it needed ? .. could 
you elaborate a bit on this ?


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1856
PS2, Line 1856:   public long evalToInteger(Analyzer analyzer, String name, 
boolean acceptDate)
nit: in the comments above, could you add a note about the acceptDate parameter 
?


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@759
PS2, Line 759:    * The first issue is addressed by using a single default 
selectivity that is
nit: this comment needs updating to reflect the between selectivity.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@774
PS2, Line 774: assigned
nit: have 'been' assigned ..


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104
PS2, Line 104: hasStats()
Looking at the hasStats() method:
   public boolean hasStats() { return numNulls_ != -1 || numDistinctValues_ != 
-1; }

It can return true if either of the 2 stats are present.  But the computation 
below depends on both being present.  So it would be good to check for both 
explicitly.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107
PS2, Line 107: numNotNulls
Is there any test that covers the case where the null count made a difference 
to the Between selectivity ?


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@544
PS2, Line 544:    * something like 0.33^2.
nit: this last sentence should be updated for the new behavior.



--
To view, visit http://gerrit.cloudera.org:8080/21377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10
Gerrit-Change-Number: 21377
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Mon, 20 May 2024 06:44:01 +0000
Gerrit-HasComments: Yes

Reply via email to