mikhailnik-db opened a new pull request, #55986:
URL: https://github.com/apache/spark/pull/55986

   ### What changes were proposed in this pull request?
   
   Skip the non-recursive `CTERelationRef` schema snapshot in `ResolveWithCTE` 
while the matching `CTERelationDef` still contains an unresolved 
`SQLFunctionExpression`. A subsequent fixed-point iteration retries the 
substitution once `ResolveSQLFunctions` has inlined the UDF body.
   
   ### Why are the changes needed?
   
   `SQLFunctionExpression` hard-codes `nullable = true` but is `resolved` as 
soon as its inputs resolve. `CTERelationRef.output` is a `val` snapshot of 
`cteDef.output`, so capturing it before the UDF inlines freezes `nullable = 
true`.
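
The snapshot behaviour can be illustrated with a minimal toy model. This is a sketch, not Spark code: `CteDef`, `CteRef`, and `child_nullable` are hypothetical stand-ins for `CTERelationDef`, `CTERelationRef`, and the nullability of the CTE body. It assumes only what is described above: the definition recomputes its output, while the reference captures it once.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    name: str
    nullable: bool


class CteDef:
    """Toy stand-in for CTERelationDef: output is derived from the current child."""

    def __init__(self, child_nullable: bool):
        self.child_nullable = child_nullable

    @property
    def output(self):
        # Recomputed on every access, so it tracks later rewrites of the child.
        return [Attr("x", self.child_nullable)]


class CteRef:
    """Toy stand-in for CTERelationRef: output is captured once, like a Scala `val`."""

    def __init__(self, cte_def: CteDef):
        self.output = cte_def.output  # snapshot taken at construction time


cte_def = CteDef(child_nullable=True)  # placeholder still present: nullable = true
ref = CteRef(cte_def)                  # reference created before the UDF inlines
cte_def.child_nullable = False         # UDF body inlined afterwards

live_nullable = cte_def.output[0].nullable
frozen_nullable = ref.output[0].nullable
print(live_nullable, frozen_nullable)
```

In this model the definition ends up non-nullable while the reference keeps the stale `nullable = true`, which mirrors the mismatch described above.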
   
   For nested UDF calls like `wrap_int(non_null_one())`, the outer placeholder 
survives one analyzer iteration (`ResolveSQLFunctions` skips UDFs whose inputs 
themselves contain a `SQLFunctionExpression`). `ResolveWithCTE`, which runs 
later in the same batch, snapshots the still-incorrect output, and the 
`!ref.resolved` gate prevents a fix-up on the next iteration.
   
   Single-level UDF cases inline fully in the first iteration and so avoid the bug. Recursive 
CTEs are unaffected: they already force `withNullability(true)` by design.
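
The ordering problem can be sketched as a toy fixed-point loop. This is a simplified model under stated assumptions, not the analyzer itself: expressions are tuples, `resolve_sql_functions` and `analyze` are hypothetical helpers, and the `skip_while_placeholder` flag stands in for the check this PR adds. A reference value of `None` models `!ref.resolved`.

```python
# Expressions: ("lit", value) or ("udf", name, arg_or_None).
BODIES = {
    "non_null_one": lambda arg: ("lit", 1),  # RETURN 1
    "wrap_int": lambda arg: arg,             # RETURN x
}


def contains_udf(expr):
    # Literals have no children in this model, so checking the root is enough.
    return expr is not None and expr[0] == "udf"


def nullable(expr):
    # The placeholder hard-codes nullable = true; a literal is non-nullable.
    return expr[0] == "udf"


def resolve_sql_functions(expr):
    """One pass: inline a UDF unless its input still contains a placeholder."""
    if expr[0] != "udf":
        return expr
    _, name, arg = expr
    if contains_udf(arg):
        # Outer call is skipped this pass; only the inner call is inlined.
        return ("udf", name, resolve_sql_functions(arg))
    return BODIES[name](arg)


def analyze(cte_body, skip_while_placeholder, max_iterations=10):
    """Return the nullability recorded by the CTE reference at the fixed point."""
    ref_nullable = None  # None: the reference has not been substituted yet
    for _ in range(max_iterations):
        before = (cte_body, ref_nullable)
        cte_body = resolve_sql_functions(cte_body)
        # The CTE rule runs later in the same batch and only touches
        # references that are still unresolved.
        if ref_nullable is None:
            if not (skip_while_placeholder and contains_udf(cte_body)):
                ref_nullable = nullable(cte_body)  # snapshot, never refreshed
        if (cte_body, ref_nullable) == before:
            break
    return ref_nullable


NESTED = ("udf", "wrap_int", ("udf", "non_null_one", None))
SINGLE = ("udf", "non_null_one", None)

nested_without_fix = analyze(NESTED, skip_while_placeholder=False)
nested_with_fix = analyze(NESTED, skip_while_placeholder=True)
single_without_fix = analyze(SINGLE, skip_while_placeholder=False)
print(nested_without_fix, nested_with_fix, single_without_fix)
```

In this model the nested call records `nullable = true` without the skip and `nullable = false` with it, while the single-level call is already correct without the skip because it inlines before the snapshot is taken.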
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. CTE columns wrapping nested non-nullable SQL UDFs now report `nullable 
= false` instead of `nullable = true`. Row-level results are unchanged.
   
   Before:
   ```sql
   CREATE FUNCTION non_null_one() RETURNS INT RETURN 1;
   CREATE FUNCTION wrap_int(x INT) RETURNS INT RETURN x;
   WITH cte AS (SELECT wrap_int(non_null_one()) AS x) SELECT * FROM cte;
   -- x: int (nullable = true)
   ```
   After:
   ```sql
   -- x: int (nullable = false)
   ```
   
   ### How was this patch tested?
   
   New regression test in `SQLFunctionSuite` (`SPARK-56945: CTE preserves 
non-nullable SQL UDF body in materialized schema`). Fails on master with `x: 
integer (nullable = true)`, passes with this PR. No SQL UDF golden file diffs.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude (Anthropic, Claude Code, Opus 4.7)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

