Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23679 )

Change subject: IMPALA-14525: Calcite planner: Add support for RexSimplify
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/23679/6/fe/src/main/java/org/apache/impala/analysis/InPredicate.java
File fe/src/main/java/org/apache/impala/analysis/InPredicate.java:

http://gerrit.cloudera.org:8080/#/c/23679/6/fe/src/main/java/org/apache/impala/analysis/InPredicate.java@169
PS6, Line 169:     boolean useSetLookup = isAllConstant && (children_.size() - 
1 >= setLookupThreshold);
> This inverts the previous logic
Not intentional.  Great catch, thanks!


http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/coercenodes/CoerceOperandShuttle.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/coercenodes/CoerceOperandShuttle.java:

http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/coercenodes/CoerceOperandShuttle.java@362
PS6, Line 362:    * Returns true if RexNode is nullable. Literals are only 
nullable if they contain
> nit: double space inconsistent with the other sentences
My 80's-ness is showing.


http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/RexLiteralConverter.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/RexLiteralConverter.java:

http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/functions/RexLiteralConverter.java@98
PS6, Line 98:         // Always treat all string literals as type STRING
> This change isn't mentioned in the commit message. What prompted it?
This is just a bug that showed up.

String literals should always be type string, and should never be varchar.  
Will include in commit message.


http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaInOperator.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaInOperator.java:

http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/operators/ImpalaInOperator.java@30
PS6, Line 30:   private ImpalaInOperator() {
> Should this be public? You seem to create a canonical version in OP.
Done


http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaCoreRules.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaCoreRules.java:

http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaCoreRules.java@50
PS6, Line 50:  * can differ from Impala's constant folding.
> nit: "differ from Impala's constant folding."
Done


http://gerrit.cloudera.org:8080/#/c/23679/6/java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaCoreRules.java@56
PS6, Line 56:   private static final RelBuilder.Config CONFIG_NO_SIMPLIFY =
> This seems like it could be annoying to maintain. I don't see any way aroun
Sigh, ack, but I think the one saving grace here is that I put all the rules in 
here.  If someone were to add a new rule, they would see that all the rules are 
defined in ImpalaCoreRules and hopefully notice to use the NO_SIMPLIFY

Another helpful thing is that not all rules call simplify either.  I decided to 
use the NO_SIMPLIFY RelBuilderFactory for all rules for safety purposes in case 
someone changes a rule within Calcite which adds simplify.

Lastly, the worst thing that really happens if it's not maintained properly 
isn't really a huge disaster and is prolly something no one will notice.  The 
Calcite simplify changes the way NaN is used which doesn't map the way Impala 
uses it.  Some unit tests will break without this change.  If a bug creeps out 
into the real world, it would prolly be extremely rare since and perhaps not 
even noticed since NaN behavior is kinda undefined across databases, though 
theoretically consistent within ours.

But yeah, I admit, it's not pretty.


http://gerrit.cloudera.org:8080/#/c/23679/6/testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q01.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q01.test:

http://gerrit.cloudera.org:8080/#/c/23679/6/testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q01.test@29
PS6, Line 29: Per-Host Resource Estimates: Memory=6.13GB
> This is a large change. It might be related to https://gerrit.cloudera.org/
This was actually mentioned in the commit message already, but I only mentioned 
it as a cardinality change.  It prolly changed the memory estimate too.

If you look at 01: SCAN HDFS, you will see that the cardinality estimate 
changed from 16.08M to 147M.  Previous to this change, the predicate used the 
function "is_not_null_pred".  The filter selectivity estimator treated this as 
a normal function and assumed 90% of the rows would be filtered.  But since it 
is really a "IS NOT NULL" predicate, we can be more precise since there are 
statistics that handle this.  And most rows are not null here, so reducing by 
90% was a bad estimate before this commit.



--
To view, visit http://gerrit.cloudera.org:8080/23679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44792688f361bf15affa565e5de5709f64dcf18c
Gerrit-Change-Number: 23679
Gerrit-PatchSet: 7
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Tue, 30 Dec 2025 17:18:13 +0000
Gerrit-HasComments: Yes

Reply via email to