Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23122 )

Change subject: IMPALA-14094: Calcite planner: Use table and column statistics 
for optimization
......................................................................


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java:

http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java@91
PS7, Line 91: HdfsEstimatedTableStats
In the other related commit, I added a review comment about the class naming 
and the member variable naming.


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java@237
PS7, Line 237: estimatedStats_
Based on line 101, the estimatedStats_ can be null.   This should do a 
precondition non-null check.


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java:

http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@56
PS7, Line 56: COMPARISON_SELECTIVITY
Snce this is specific to range comparison, not equality comparison, can we 
rename this to RANGE_COMPARISON_SELECTIVITY ?
Also, it would be good to make this and the BETWEEN one public such that it can 
be referenced from the TestCalciteStats unit test.   We want the unit tests to 
pass without changes even when these ratios are changed in the future.


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@215
PS7, Line 215:    * Disjunction Selectivity -> (1 D(1-m1/n)(1-m2/n)) where n is 
the total
Is this similar to what Calcite does for disjunctions ?  Would be useful to use 
common code where possible.


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@262
PS7, Line 262: Currently it picks the
             :    * col with max no of nulls.
Not clear what 'picks the col with max no of nulls' means here .. since the 
code below is  returning the null count for a particular column.


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java
File 
java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java:

http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@270
PS7, Line 270: (1.0 / 3.0)
See prior comment about referencing the default RANGE_COMPARISON_SELECTIVITY 
public static variable.


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@273
PS7, Line 273: 10.0
Shouldn't this reference BIGINT_NDV instead of hardcoding the value here ?


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@273
PS7, Line 273: (1.0 / 3.0)
Same as above


http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@288
PS7, Line 288: 1.0 / 9.0)
Same as above.



--
To view, visit http://gerrit.cloudera.org:8080/23122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d5bb50eb562c28e4b7c7a6529d140f98e77295c
Gerrit-Change-Number: 23122
Gerrit-PatchSet: 7
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Anonymous Coward (816)
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Sun, 20 Jul 2025 19:24:03 +0000
Gerrit-HasComments: Yes

Reply via email to