peter-toth opened a new pull request, #55654:
URL: https://github.com/apache/spark/pull/55654

   ### What changes were proposed in this pull request?
   
   When `PlanMerger` merges N non-grouping subplans where the first has no 
filter and the 2nd and 3rd share the same filter condition, the merged child 
`Project` already contains an alias for that condition after the 1st+2nd merge 
round. The 3rd merge should reuse that alias instead of creating a redundant 
one. Two fixes are applied.
   
   **Fix 1 — symmetric reuse check in `(np: Filter, cp)`.** The `(np: Filter, 
cp: Filter)` case already had a reuse check: when the cp filter carries 
`MERGED_FILTER_TAG`, it looks for an existing alias in the child `Project` and 
reuses it instead of creating a new one. The `(np: Filter, cp)` case now gets 
the same check, making the two cases symmetric.
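
   The reuse check described above can be sketched with a toy model. This is 
not the `PlanMerger` code: `AliasReuseSketch`, `aliasFor`, and the 
`propagatedFilter_<n>` naming scheme are hypothetical, expressions are plain 
strings, and string equality stands in for whatever expression comparison the 
real rule performs.

```scala
// Toy model only, assuming a Project is an ordered list of aliases.
// None of these names come from Spark's PlanMerger.
object AliasReuseSketch {
  final case class Alias(name: String, expr: String)
  final case class Project(aliases: Vector[Alias])

  // Returns the (possibly extended) project together with the name of the
  // alias that stands for `condition`. An alias that already carries the
  // same expression is reused; otherwise a fresh alias is appended.
  def aliasFor(project: Project, condition: String): (Project, String) =
    project.aliases.find(_.expr == condition) match {
      case Some(existing) =>
        (project, existing.name)
      case None =>
        val fresh = Alias(s"propagatedFilter_${project.aliases.size}", condition)
        (Project(project.aliases :+ fresh), fresh.name)
    }
}
```

   In this sketch, calling `aliasFor` twice with the same condition (the 2nd 
and 3rd subplans) leaves the project unchanged on the second call and returns 
the alias name created by the first.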
   
   **Fix 2 — reorder match cases so Filter cases precede Project-peeling 
cases.** For the reuse check in fix 1 to work, the merged child must still be a 
`Project` at the point the check runs. When the cached plan's child is itself a 
`Project` (as it is after the first merge round), the generic `(np, cp: 
Project)` case was firing first and peeling that Project layer, causing the 
recursion to see a `LocalRelation` with no aliased conditions. The fix reorders 
the match so that all Filter cases precede the generic Project-peeling cases. 
The `(np: Filter, cp: Filter)` case is kept before `(np: Filter, cp)` to 
prevent `(Filter, Filter)` pairs from being handled by the asymmetric 
propagation path. The `(np: Project, cp: Project)` case is also moved into the 
Project group for clarity.
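
   Because a Scala `match` takes the first case that fits, the effect of the 
reordering can be illustrated with a reduced model. The sketch below assumes a 
three-node plan ADT and lists only the cases named in this description; 
`CaseOrderSketch`, `firstCaseOld`, and `firstCaseNew` are hypothetical names, 
and the "old" order is a simplification rather than a copy of the previous 
source.

```scala
// Reduced model of match-case ordering, not Spark's PlanMerger.
object CaseOrderSketch {
  sealed trait Plan
  case object Relation extends Plan
  final case class Project(child: Plan) extends Plan
  final case class Filter(child: Plan) extends Plan

  // Simplified old order: the generic Project-peeling case comes first,
  // so it wins whenever the cached plan is a Project.
  def firstCaseOld(np: Plan, cp: Plan): String = (np, cp) match {
    case (_, _: Project)        => "(np, cp: Project)"
    case (_: Filter, _: Filter) => "(np: Filter, cp: Filter)"
    case (_: Filter, _)         => "(np: Filter, cp)"
    case _                      => "other"
  }

  // New order: both Filter cases precede the Project-peeling case, and
  // (Filter, Filter) stays ahead of the asymmetric (Filter, cp) case.
  def firstCaseNew(np: Plan, cp: Plan): String = (np, cp) match {
    case (_: Filter, _: Filter) => "(np: Filter, cp: Filter)"
    case (_: Filter, _)         => "(np: Filter, cp)"
    case (_, _: Project)        => "(np, cp: Project)"
    case _                      => "other"
  }
}
```

   With `np = Filter(Relation)` and `cp = Project(Relation)`, `firstCaseOld` 
selects the Project-peeling case while `firstCaseNew` selects `(np: Filter, 
cp)`, so the reuse check would still see the `Project`. A `(Filter, Filter)` 
pair selects the same case under both orders.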
   
   ### Why are the changes needed?
   
   Without this fix, merging three non-grouping subplans where the 2nd and 3rd 
carry the same filter condition produces two `propagatedFilter` aliases with 
identical expressions, one of which is redundant, making the merged plan 
unnecessarily large.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added `SPARK-56703: Merge three non-grouping subqueries where the third has 
the same filter condition as the second` to `MergeSubplansSuite`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Sonnet 4.6
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

