diffray-bot commented on PR #53636:
URL: https://github.com/apache/spark/pull/53636#issuecomment-3698102591

   ## Changes Summary
   
   This PR adds support for normalizing UnionLoop and UnionLoopRef nodes in the 
plan comparison logic. It introduces new normalization methods in 
CteIdNormalizer and corresponding test cases to enable testing of recursive 
CTEs in the single-pass analyzer.
   
   **Type:** feature
   
   **Components Affected:** catalyst/plans/NormalizePlan, 
catalyst/plans/CteIdNormalizer, test suite for NormalizePlan
   
   <details>
   <summary><strong>Files Changed</strong></summary>
   
   | File | Summary | Change | Impact |
   |:-----|:--------|:------:|:------:|
   | `...he/spark/sql/catalyst/plans/NormalizePlan.scala` | Added handling for 
UnionLoop and UnionLoopRef nodes in the plan normalization transformation and 
added corresponding normalization methods to CteIdNormalizer class. | ✏️ | 🟡 |
   | `...ark/sql/catalyst/plans/NormalizePlanSuite.scala` | Added three new 
test cases for UnionLoopRef ID normalization, UnionLoop ID and outputAttrIds 
normalization, and complete recursive CTE normalization. | ✏️ | 🟡 |
   
   </details>
   
   <details>
   <summary><strong>Architecture Impact</strong></summary>
   
   - **New Patterns:** Parallel normalization patterns for recursive CTE nodes 
(similar to existing CTE handling)
   - **Dependencies:** added imports: CTERelationDef, UnionLoop, UnionLoopRef 
in test file
   - **Coupling:** Introduces tight coupling between UnionLoop/UnionLoopRef and 
CteIdNormalizer. The normalization logic assumes specific traversal order 
(UnionLoop before UnionLoopRef) similar to existing CTE handling.
   
   </details>
   
   **Risk Areas:** ID mapping correctness: The normalization relies on 
traversal order where UnionLoop is encountered before UnionLoopRef to build the 
mapping correctly, Duplicate ID handling in normalizeDef: The logic checks if a 
CTERelationDef ID is already mapped (new check at line 256), which could affect 
existing behavior for duplicate CTEs, UnionLoop ID mapping: The 
normalizeUnionLoop method only remaps if ID already exists, but doesn't insert 
new mappings. This is asymmetric compared to normalizeUnionLoopRef behavior
   
   <details>
   <summary><strong>Suggestions</strong></summary>
   
   - Add documentation explaining the assumption about traversal order in plan 
transformation
   - Consider adding a comment explaining why UnionLoop doesn't insert new 
mappings unlike UnionLoopRef
   - Verify the logic in normalizeDef change (line 256-257) doesn't introduce 
unintended regressions for existing CTE handling
   
   </details>
   
   
   <sub>Full review in progress... | Powered by <a 
href="https://diffray.ai?utm_source=github-summary";>diffray</a></sub>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to