diffray-bot commented on PR #53636: URL: https://github.com/apache/spark/pull/53636#issuecomment-3698102591
## Changes Summary This PR adds support for normalizing UnionLoop and UnionLoopRef nodes in the plan comparison logic. It introduces new normalization methods in CteIdNormalizer and corresponding test cases to enable testing of recursive CTEs in the single-pass analyzer. **Type:** feature **Components Affected:** catalyst/plans/NormalizePlan, catalyst/plans/CteIdNormalizer, test suite for NormalizePlan <details> <summary><strong>Files Changed</strong></summary> | File | Summary | Change | Impact | |:-----|:--------|:------:|:------:| | `...he/spark/sql/catalyst/plans/NormalizePlan.scala` | Added handling for UnionLoop and UnionLoopRef nodes in the plan normalization transformation and added corresponding normalization methods to CteIdNormalizer class. | ✏️ | 🟡 | | `...ark/sql/catalyst/plans/NormalizePlanSuite.scala` | Added three new test cases for UnionLoopRef ID normalization, UnionLoop ID and outputAttrIds normalization, and complete recursive CTE normalization. | ✏️ | 🟡 | </details> <details> <summary><strong>Architecture Impact</strong></summary> - **New Patterns:** Parallel normalization patterns for recursive CTE nodes (similar to existing CTE handling) - **Dependencies:** added imports: CTERelationDef, UnionLoop, UnionLoopRef in test file - **Coupling:** Introduces tight coupling between UnionLoop/UnionLoopRef and CteIdNormalizer. The normalization logic assumes specific traversal order (UnionLoop before UnionLoopRef) similar to existing CTE handling. </details> **Risk Areas:** ID mapping correctness: The normalization relies on traversal order where UnionLoop is encountered before UnionLoopRef to build the mapping correctly, Duplicate ID handling in normalizeDef: The logic checks if a CTERelationDef ID is already mapped (new check at line 256), which could affect existing behavior for duplicate CTEs, UnionLoop ID mapping: The normalizeUnionLoop method only remaps if ID already exists, but doesn't insert new mappings. This is asymmetric compared to normalizeUnionLoopRef behavior <details> <summary><strong>Suggestions</strong></summary> - Add documentation explaining the assumption about traversal order in plan transformation - Consider adding a comment explaining why UnionLoop doesn't insert new mappings unlike UnionLoopRef - Verify the logic in normalizeDef change (line 256-257) doesn't introduce unintended regressions for existing CTE handling </details> <sub>Full review in progress... | Powered by <a href="https://diffray.ai?utm_source=github-summary">diffray</a></sub> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
