diffray-bot commented on PR #53636: URL: https://github.com/apache/spark/pull/53636#issuecomment-3699348977
## Changes Summary This PR adds support for normalizing recursive CTEs (rCTEs) in Spark's query plan normalization logic by introducing handlers for UnionLoop and UnionLoopRef nodes. This enables the single-pass analyzer to correctly compare semantically identical recursive CTE queries with different internal IDs. **Type:** feature **Components Affected:** Catalyst Query Plan Normalization, Recursive CTE Support, Query Plan Comparison Infrastructure <details> <summary><strong>Files Changed</strong></summary> | File | Summary | Change | Impact | |:-----|:--------|:------:|:------:| | `...he/spark/sql/catalyst/plans/NormalizePlan.scala` | Added UnionLoop and UnionLoopRef normalization handlers, plus a fix to prevent duplicate CTE ID mappings | ✏️ | 🟡 | | `...ark/sql/catalyst/plans/NormalizePlanSuite.scala` | Added three comprehensive test cases for UnionLoopRef, UnionLoop, and recursive CTE normalization | ✏️ | 🟢 | </details> <details> <summary><strong>Architecture Impact</strong></summary> - **New Patterns:** Visitor pattern extension for rCTE nodes, ID mapping pattern for maintaining referential integrity - **Dependencies:** added: UnionLoop and UnionLoopRef logical plan classes - **Coupling:** NormalizePlan normalization logic now explicitly depends on recursive CTE node types (UnionLoop, UnionLoopRef), increasing coupling to the CTE implementation details </details> **Risk Areas:** Bug fix in normalizeDef() changes behavior - need to verify it doesn't break existing non-recursive CTE normalization, ID remapping logic complexity - UnionLoopRef mapping uses counter differently than CTERelationRef, potential for confusion, Interaction between UnionLoop ID normalization and UnionLoopRef counter-based normalization - asymmetric pattern could be error-prone <details> <summary><strong>Suggestions</strong></summary> - Consider adding comments explaining why UnionLoopRef uses counter-based assignment vs. mapping lookup (unlike UnionLoop) - Verify the bug fix in normalizeDef() doesn't have unintended side effects on regular CTEs - Consider adding integration tests that verify recursive CTEs normalize correctly within larger query plans </details> <sub>Full review in progress... | Powered by <a href="https://diffray.ai?utm_source=github-summary">diffray</a></sub> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
