Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22091 )

Change subject: IMPALA-13531: Calcite CTE frontend
......................................................................


Patch Set 10:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc@93
PS10, Line 93: DEFINE_string(cte_suggester_class, 
"org.apache.impala.calcite.cte.IdentitySuggester",
> Is there a reason we are making this a backend config option rather than a
I hadn't thought of it, it does seem like it could be handled per-query. It was 
based off 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CommonTableExpressionSuggesterFactory.java,
 which uses a config option.


http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java
File fe/src/main/java/org/apache/impala/planner/CTEScanNode.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java@59
PS7, Line 59:     Preconditions.checkArgument(cteExprs_.size() == 
desc.getSlots().size());
> Nit: CTEScanNode.class
Also seems to be unused.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@36
PS7, Line 36:       return (query, conf) -> Collections.emptyList();
> My java brain is fuzzing out here.  Does this compile?
Apparently. Copied from 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CommonTableExpressionSuggesterFactory.java#L35.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@27
PS10, Line 27:   public static final String CTE_THRESHOLD = 
"impala.cte_threshold";
> Is the threshold used?
It's used in CalciteOptimizer; probably doesn't make sense to declare it here.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@86
PS7, Line 86:   public void setProducer(ImpalaCTEProducer producer) {
> Does anyone call this?  I would like to avoid making this a mutable class
I think all the mutability is needed because there's a link between CTEConsumer 
and CTEProducer that I'm not representing as a parent/child relationship. That 
was one of the things switching to an Exchange cleaned up, but they may be 
orthogonal.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java:

http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@40
PS7, Line 40:
> I made some comments below, but this might be dangerous to have.
It's necessary for the node to function, and I didn't see a better base class 
to use. Open to suggestions.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@71
PS7, Line 71:   public double estimateRowCount(RelMetadataQuery mq) {
> Is this because we are deriving from AbstractRelNode which requires this?
It's necessary for VolcanoPlanner to get correct row counts. When I was looking 
at this before, estimateRowCount seemed like the expected way to clarify row 
counts. The default implementation defers to RelMetadataQuery, which will use 
your configured handlers.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@81
PS7, Line 81:   public List<RelNode> getInputs() {
> I'm surprised we need this.  Could we derive off of something lower than Ab
I didn't see an obvious candidate. AbstractRelNode defines getChildren and 
methods to set/update them, but leaves it up to concrete implementations how to 
store them.


http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@87
PS7, Line 87:     inputs_.set(ordinalInParent, p);
> Same as above comment.  Keeping track of "inputs_" actually gets weird duri
I specifically had to implement this to work with the planners. They update 
inputs as part of their mutation.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java:

http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@203
PS10, Line 203:         consumer.setProducer(producer);
> So I don't have this quite mapped out in my head right now, but as I mentio
I have a patch that gets rid of this by using a parent/child relationship (part 
of the Exchange experiment). I'll try to bring it in.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@359
PS10, Line 359:       // CTEs currently only supported in SingleNodePlanner.
> You mean an executor with a single node here, not the SingleNodePlanner cla
Well, the plan fails in DistributedPlanner, so this is a guard against it. 
Testing for a single node was how I forced it to only use the SingleNodePlanner.


http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@438
PS10, Line 438:     optCluster.invalidateMetadataQuery();
> What is the particular reason we need to invalidate?
Maybe Stamatis can help clarify? I think this was based on the original Hive 
code.

I think the goal was to make sure the planner is in an initialized state before 
we use it, and to reset it afterwards.



-- 
To view, visit http://gerrit.cloudera.org:8080/22091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0840c0859d2fe25628d799a18d302cee1eb36e8
Gerrit-Change-Number: 22091
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Anonymous Coward (816)
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Comment-Date: Mon, 01 Dec 2025 17:00:13 +0000
Gerrit-HasComments: Yes

Reply via email to