cloud-fan opened a new pull request, #55949:
URL: https://github.com/apache/spark/pull/55949

   ### What changes were proposed in this pull request?
   
   Root-cause fix for SPARK-46625. Supersedes #55706.
   
   Push `PlanWithUnresolvedIdentifier` into the identifier slot of write 
commands at parse time (`InsertIntoStatement.table`, 
`CreateTableAsSelect.name`, `ReplaceTableAsSelect.name`) instead of wrapping 
the entire command. `CTESubstitution` then sees the real `CTEInChildren` and 
places `WithCTE` on the command's children by construction, so the invalid 
`WithCTE(InsertIntoStatement, ...)` / `WithCTE(CreateTableAsSelect, ...)` shape 
never appears.
   
   Specifically:
   
   - `AstBuilder.withInsertInto`, `visitCreateTable`, `visitReplaceTable` build 
the command directly, with a `PlanWithUnresolvedIdentifier` (or a plain 
`UnresolvedRelation` / `UnresolvedIdentifier` when the identifier is a constant 
string) in the name slot. New helper `buildWriteTableSlot` covers the INSERT 
cases.
   - `ResolveIdentifierClause` gets a parallel handler that materializes 
placeholders living inside the `innerPlans` of a command (the non-child 
`LogicalPlan` slots) — needed for `InsertIntoStatement.table`, which is not a 
child (`child = query`).
   - `LogicalPlan` gains a small `innerPlans` / `withNewInnerPlans` hook 
(default `Nil` / no-op). `InsertIntoStatement` overrides it. 
`LogicalPlan.getDefaultTreePatternBits` unions the inner plans' pattern bits, 
and `BindParameters.bind` recurses into them. This is required so that 
`containsPattern(PARAMETER)` and `BindParameters` reach a placeholder inside 
`InsertIntoStatement.table` — without it, `INSERT ... IDENTIFIER(:p)` under 
`spark.sql.legacy.parameterSubstitution.constantsOnly=true` fails to bind the 
parameter.
   - `CreateTableAsSelect.name` and `ReplaceTableAsSelect.name` are already 
children via `V2CreateTableAsSelectPlan.childrenToAnalyze`, so no `innerPlans` 
hook is needed for them.
   - A narrow post-hoc `WithCTE(c: CTEInChildren, _)` collapse is retained in 
`ResolveIdentifierClause` for the two `withIdentClause` call sites where the 
identifier slot's type prevents pushing the placeholder in directly: 
`OverwriteByExpression.table: NamedRelation` (INSERT REPLACE WHERE) and 
`CacheTableAsSelect.tempViewName: String`.
   
   ### Why are the changes needed?
   
   ```sql
   WITH t AS (...)
   INSERT [INTO|OVERWRITE] TABLE IDENTIFIER('t')
   SELECT * FROM t
   ```
   
   previously produced an analysed plan with `WithCTE` wrapping the 
`InsertIntoStatement` — a structurally invalid shape. Plug-in datasources that 
re-analyse the `InsertIntoStatement`'s query subtree throw 
`NoSuchElementException: key not found`. Zero-day issue since SPARK-46625.
   
   \#55706 fixed the symptom by collapsing `WithCTE(c: CTEInChildren, defs)` 
after the fact in `ResolveIdentifierClause`. That works but encodes an undo for 
a placement that should never have happened. The review on that PR 
(https://github.com/apache/spark/pull/55706#discussion_r3196249628) asked for 
the root-cause fix; this PR implements it. Credit to @stevomitric for surfacing 
in the original PR thread that the parser-level placement also has to work for 
legacy parameter substitution — that's why this PR adds the `innerPlans` 
traversal.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New tests in `ParametersSuite`:
   
   - `WITH ... INSERT OVERWRITE TABLE IDENTIFIER(:p) SELECT ... FROM cte`
   - `WITH ... INSERT INTO IDENTIFIER(:p) SELECT ... FROM cte`
   - `CREATE TABLE IDENTIFIER(:p) AS WITH ... SELECT ... FROM cte`
   - `INSERT IDENTIFIER(:p) under legacy parameter substitution` (covers 
`spark.sql.legacy.parameterSubstitution.constantsOnly=true`, which exercises 
the `innerPlans` hook on `LogicalPlan` and the corresponding recursion in 
`BindParameters.bind`)
   
   Each CTE test asserts that no `WithCTE(CTEInChildren, _)` shape leaks 
through analysis.
   
   Existing suites run clean: `ParametersSuite` (150), 
`IdentifierClauseParserSuite` (15), `AnalysisSuite` / `PlanParserSuite` / 
`AnalysisErrorSuite` and siblings (675), `DataSourceV2SQLSuiteV1Filter` (319), 
`InsertSuite` (154), `CTEInlineSuite` / `CTEHintSuite` (156), 
`CachedTableSuite` (104), `CreateTableAsSelectSuite` / `DDLParserSuite` (53).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude (Claude Code, Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to