Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/23122 )
Change subject: IMPALA-14094: Calcite planner: Use table and column statistics for optimization ...................................................................... Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java: http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java@91 PS7, Line 91: HdfsEstimatedTableStats In the other related commit, I added a review comment about the class naming and the member variable naming. http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/CalciteTable.java@237 PS7, Line 237: estimatedStats_ Based on line 101, the estimatedStats_ can be null. This should do a precondition non-null check. http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java: http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@56 PS7, Line 56: COMPARISON_SELECTIVITY Snce this is specific to range comparison, not equality comparison, can we rename this to RANGE_COMPARISON_SELECTIVITY ? Also, it would be good to make this and the BETWEEN one public such that it can be referenced from the TestCalciteStats unit test. We want the unit tests to pass without changes even when these ratios are changed in the future. http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@215 PS7, Line 215: * Disjunction Selectivity -> (1 D(1-m1/n)(1-m2/n)) where n is the total Is this similar to what Calcite does for disjunctions ? Would be useful to use common code where possible. http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@262 PS7, Line 262: Currently it picks the : * col with max no of nulls. Not clear what 'picks the col with max no of nulls' means here .. since the code below is returning the null count for a particular column. http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java File java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java: http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@270 PS7, Line 270: (1.0 / 3.0) See prior comment about referencing the default RANGE_COMPARISON_SELECTIVITY public static variable. http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@273 PS7, Line 273: 10.0 Shouldn't this reference BIGINT_NDV instead of hardcoding the value here ? http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@273 PS7, Line 273: (1.0 / 3.0) Same as above http://gerrit.cloudera.org:8080/#/c/23122/7/java/calcite-planner/src/test/java/org/apache/impala/planner/TestCalciteStats.java@288 PS7, Line 288: 1.0 / 9.0) Same as above. -- To view, visit http://gerrit.cloudera.org:8080/23122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d5bb50eb562c28e4b7c7a6529d140f98e77295c Gerrit-Change-Number: 23122 Gerrit-PatchSet: 7 Gerrit-Owner: Steve Carlin <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Sun, 20 Jul 2025 19:24:03 +0000 Gerrit-HasComments: Yes
