Re: [PR] [SPARK-44571][SQL] Merge subplans with one row result [spark]

via GitHub Wed, 12 Nov 2025 11:07:59 -0800


peter-toth commented on code in PR #53019:
URL: https://github.com/apache/spark/pull/53019#discussion_r2519472775



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala:
##########
@@ -61,8 +61,8 @@ class SparkOptimizer(
       new RowLevelOperationRuntimeGroupFiltering(OptimizeSubqueries)),
     Batch("InjectRuntimeFilter", FixedPoint(1),
       InjectRuntimeFilter),
-    Batch("MergeScalarSubqueries", Once,
-      MergeScalarSubqueries,
+    Batch("MergeSubplans", Once,
+      MergeSubplans,

Review Comment:
   This is a very good question and I too was thinking about it lot. Actually, 
initially I was trying to add a new rule to handle `Aggregate` nodes without 
grouping expressions in join groups only, but then I realized that having more 
rules just adds more complexity.
   Out goal here is to recognize mergeable subparts of the whole plan and 
extract it to CTEs and just referece them at the original place. It doesn't 
matter if such a subplan is the whole plan of a scalar subquery expression or 
it is a non-grouping aggegate node somewhere in the middle of the plan. 
Actually these 2 do overlap in many cases as the root node of a scalar subquery 
expression is an `Aggregate` node in most of the cases. So it doesn't make 
sense to handle them differently in 2 rules.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-44571][SQL] Merge subplans with one row result [spark]

Reply via email to