diffray-bot commented on PR #53636:
URL: https://github.com/apache/spark/pull/53636#issuecomment-3699348977

   ## Changes Summary
   
   This PR adds support for normalizing recursive CTEs (rCTEs) in Spark's query 
plan normalization logic by introducing handlers for UnionLoop and UnionLoopRef 
nodes. This enables the single-pass analyzer to correctly compare semantically 
identical recursive CTE queries with different internal IDs.
   
   **Type:** feature
   
   **Components Affected:** Catalyst Query Plan Normalization, Recursive CTE 
Support, Query Plan Comparison Infrastructure
   
   <details>
   <summary><strong>Files Changed</strong></summary>
   
   | File | Summary | Change | Impact |
   |:-----|:--------|:------:|:------:|
   | `...he/spark/sql/catalyst/plans/NormalizePlan.scala` | Added UnionLoop and 
UnionLoopRef normalization handlers, plus a fix to prevent duplicate CTE ID 
mappings | ✏️ | 🟡 |
   | `...ark/sql/catalyst/plans/NormalizePlanSuite.scala` | Added three 
comprehensive test cases for UnionLoopRef, UnionLoop, and recursive CTE 
normalization | ✏️ | 🟢 |
   
   </details>
   
   <details>
   <summary><strong>Architecture Impact</strong></summary>
   
   - **New Patterns:** Visitor pattern extension for rCTE nodes, ID mapping 
pattern for maintaining referential integrity
   - **Dependencies:** added: UnionLoop and UnionLoopRef logical plan classes
   - **Coupling:** NormalizePlan normalization logic now explicitly depends on 
recursive CTE node types (UnionLoop, UnionLoopRef), increasing coupling to the 
CTE implementation details
   
   </details>
   
   **Risk Areas:** Bug fix in normalizeDef() changes behavior - need to verify 
it doesn't break existing non-recursive CTE normalization, ID remapping logic 
complexity - UnionLoopRef mapping uses counter differently than CTERelationRef, 
potential for confusion, Interaction between UnionLoop ID normalization and 
UnionLoopRef counter-based normalization - asymmetric pattern could be 
error-prone
   
   <details>
   <summary><strong>Suggestions</strong></summary>
   
   - Consider adding comments explaining why UnionLoopRef uses counter-based 
assignment vs. mapping lookup (unlike UnionLoop)
   - Verify the bug fix in normalizeDef() doesn't have unintended side effects 
on regular CTEs
   - Consider adding integration tests that verify recursive CTEs normalize 
correctly within larger query plans
   
   </details>
   
   
   <sub>Full review in progress... | Powered by <a 
href="https://diffray.ai?utm_source=github-summary";>diffray</a></sub>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to