Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23930 )

Change subject: IMPALA-14716: Calcite Planner: Make condition estimates more 
similar to original planner
......................................................................


Patch Set 12:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@10
PS12, Line 10: little bit more similar to what
             : is being done with the original Impala planner.
The comment here seems incomplete without a reason for doing this. Can you add 
the rationale here ?  I presume it is to avoid regressions.  In some selected 
cases, we should hopefully improve on the original estimates.


http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@13
PS12, Line 13: conditions
nit: did you mean unknown 'estimates' ?  Not sure what unknown 'conditions' 
means here since the predicates are known patterns.


http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@14
PS12, Line 14: functions
nit: expressions rather than functions


http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@23
PS12, Line 23: XXXXX will be filled in after review
nit: Assuming this is created, pls add the jira id.


http://gerrit.cloudera.org:8080/#/c/23930/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java:

http://gerrit.cloudera.org:8080/#/c/23930/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@206
PS11, Line 206: IMPALA-XXXXX
I think you already filed a jira for this ? Best to add the number here.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java:

http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@138
PS12, Line 138: it will be ignored.
It's not ignored from what I can tell .. on line #73 it assigns the 
Expr.DEFAULT_SELECTIVITY for such cases.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@254
PS12, Line 254:     return Math.max(0.0, Math.min(1.0, selectivity));
This would return 0 if the loop is not executed.  The default should be 1.0, 
not 0.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@273
PS12, Line 273:     Collections.sort(selectivities);
nit: pls add a comment about why sorting the selectivities is needed as it is 
not obvious unless one is aware of the fact that we want to apply most 
selective predicates first.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@280
PS12, Line 280:     return selectivity;
The corresponding Impala estimator applies a final bounds check between [0, 1]:
 Math.max(0.0, Math.min(1.0, result));
Best to do that here too.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java
File 
java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java:

http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@195
PS12, Line 195: //XXX:  FIX THIS
Unclear why we need the hardcoded value. What can be done in the above 
calculation to provide the expected value.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@334
PS12, Line 334: 10.0
Where did this constant 10 come from ? if it's the NDV for this column, can you 
declare it as a static constant in the class so it can be referenced elsewhere 
too.


http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@337
PS12, Line 337: distinctRows
'distinctRows' is not used anymore ?  Also, I am not sure why we care about 
distinct row count.  We just want the row count for the IN predicate.  It's the 
same row count we would expect for an OR predicate.



--
To view, visit http://gerrit.cloudera.org:8080/23930
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b9a25259916504296dbd9a9cb9466be8fac8718
Gerrit-Change-Number: 23930
Gerrit-PatchSet: 12
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Comment-Date: Tue, 31 Mar 2026 02:58:40 +0000
Gerrit-HasComments: Yes

Reply via email to