peter-toth commented on code in PR #53019:
URL: https://github.com/apache/spark/pull/53019#discussion_r2519472775
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala:
##########
@@ -61,8 +61,8 @@ class SparkOptimizer(
new RowLevelOperationRuntimeGroupFiltering(OptimizeSubqueries)),
Batch("InjectRuntimeFilter", FixedPoint(1),
InjectRuntimeFilter),
- Batch("MergeScalarSubqueries", Once,
- MergeScalarSubqueries,
+ Batch("MergeSubplans", Once,
+ MergeSubplans,
Review Comment:
This is a very good question and I too was thinking about it lot. Actually,
initially I was trying to add a new rule to handle `Aggregate` nodes without
grouping expressions in join groups only, but then I realized that having more
rules just adds more complexity.
Our goal is to recognize mergeable subparts of the whole plan and extract it
to CTEs and just referece them at the original place. It doesn't matter if such
a subplan is the whole plan of a scalar subquery expression or it is a
non-grouping aggegate node somewhere in the middle of the plan. Actually these
2 do overlap in many cases as the root node of a scalar subquery expression is
an `Aggregate` node in most of the cases. So it doesn't make sense to handle
them differently in 2 rules.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]