Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/22091 )
Change subject: IMPALA-13531: Calcite CTE frontend ...................................................................... Patch Set 10: (12 comments) http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/22091/10/be/src/service/frontend.cc@93 PS10, Line 93: DEFINE_string(cte_suggester_class, "org.apache.impala.calcite.cte.IdentitySuggester", > Is there a reason we are making this a backend config option rather than a I hadn't thought of it, it does seem like it could be handled per-query. It was based off https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CommonTableExpressionSuggesterFactory.java, which uses a config option. http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java File fe/src/main/java/org/apache/impala/planner/CTEScanNode.java: http://gerrit.cloudera.org:8080/#/c/22091/7/fe/src/main/java/org/apache/impala/planner/CTEScanNode.java@59 PS7, Line 59: Preconditions.checkArgument(cteExprs_.size() == desc.getSlots().size()); > Nit: CTEScanNode.class Also seems to be unused. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@36 PS7, Line 36: return (query, conf) -> Collections.emptyList(); > My java brain is fuzzing out here. Does this compile? Apparently. Copied from https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CommonTableExpressionSuggesterFactory.java#L35. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java: http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/cte/SuggesterFactory.java@27 PS10, Line 27: public static final String CTE_THRESHOLD = "impala.cte_threshold"; > Is the threshold used? It's used in CalciteOptimizer; probably doesn't make sense to declare it here. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaCTEConsumer.java@86 PS7, Line 86: public void setProducer(ImpalaCTEProducer producer) { > Does anyone call this? I would like to avoid making this a mutable class I think all the mutability is needed because there's a link between CTEConsumer and CTEProducer that I'm not representing as a parent/child relationship. That was one of the things switching to an Exchange cleaned up, but they may be orthogonal. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java: http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@40 PS7, Line 40: > I made some comments below, but this might be dangerous to have. It's necessary for the node to function, and I didn't see a better base class to use. Open to suggestions. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@71 PS7, Line 71: public double estimateRowCount(RelMetadataQuery mq) { > Is this because we are deriving from AbstractRelNode which requires this? It's necessary for VolcanoPlanner to get correct row counts. When I was looking at this before, estimateRowCount seemed like the expected way to clarify row counts. The default implementation defers to RelMetadataQuery, which will use your configured handlers. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@81 PS7, Line 81: public List<RelNode> getInputs() { > I'm surprised we need this. Could we derive off of something lower than Ab I didn't see an obvious candidate. AbstractRelNode defines getChildren and methods to set/update them, but leaves it up to concrete implementations how to store them. http://gerrit.cloudera.org:8080/#/c/22091/7/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/node/ImpalaSequence.java@87 PS7, Line 87: inputs_.set(ordinalInParent, p); > Same as above comment. Keeping track of "inputs_" actually gets weird duri I specifically had to implement this to work with the planners. They update inputs as part of their mutation. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java: http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@203 PS10, Line 203: consumer.setProducer(producer); > So I don't have this quite mapped out in my head right now, but as I mentio I have a patch that gets rid of this by using a parent/child relationship (part of the Exchange experiment). I'll try to bring it in. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@359 PS10, Line 359: // CTEs currently only supported in SingleNodePlanner. > You mean an executor with a single node here, not the SingleNodePlanner cla Well, the plan fails in DistributedPlanner, so this is a guard against it. Testing for a single node was how I forced it to only use the SingleNodePlanner. http://gerrit.cloudera.org:8080/#/c/22091/10/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java@438 PS10, Line 438: optCluster.invalidateMetadataQuery(); > What is the particular reason we need to invalidate? Maybe Stamatis can help clarify? I think this was based on the original Hive code. I think the goal was to make sure the planner is in an initialized state before we use it, and to reset it afterwards. -- To view, visit http://gerrit.cloudera.org:8080/22091 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0840c0859d2fe25628d799a18d302cee1eb36e8 Gerrit-Change-Number: 22091 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Smith <[email protected]> Gerrit-Reviewer: Anonymous Coward (816) Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Mon, 01 Dec 2025 17:00:13 +0000 Gerrit-HasComments: Yes
