yadavay-amzn opened a new pull request, #56767:
URL: https://github.com/apache/spark/pull/56767

   ### What changes were proposed in this pull request?
   
   Make `MergeRows` implement `SupportsNonDeterministicExpression` with 
`allowNonDeterministicExpression = true`. This allows non-deterministic 
expressions (e.g. `uuid()`, `rand()`) in MERGE INTO action assignments (`UPDATE 
SET col = uuid()`, `INSERT ... VALUES (..., rand())`) to pass analysis and 
execute correctly on DSv2 row-level operation tables.
   
   ### Why are the changes needed?
   
   This is a follow-up to SPARK-56729 (PR #55858), which handled 
non-deterministic expressions in MERGE *source* queries. Non-deterministic 
expressions in MERGE *action assignments* still fail with 
`INVALID_NON_DETERMINISTIC_EXPRESSIONS`, because the rewritten plan places these 
expressions inside `MergeRows`, which was not in `CheckAnalysis`'s allowlist.
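
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch of the opt-in idea, not Spark's actual `CheckAnalysis` code: the `Expr`, `Operator`, and `check_analysis` names and the `allow_non_deterministic` field are hypothetical stand-ins for the real classes and for `allowNonDeterministicExpression`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical expression node: just a name and a determinism flag."""
    name: str
    deterministic: bool = True


@dataclass
class Operator:
    """Hypothetical plan operator holding the expressions it evaluates."""
    name: str
    expressions: List[Expr] = field(default_factory=list)
    # Stand-in for the opt-in flag this PR sets on MergeRows (assumption).
    allow_non_deterministic: bool = False


class AnalysisError(Exception):
    pass


def check_analysis(op: Operator) -> None:
    """Reject non-deterministic expressions unless the operator opts in."""
    bad = [e.name for e in op.expressions if not e.deterministic]
    if bad and not op.allow_non_deterministic:
        raise AnalysisError(
            f"INVALID_NON_DETERMINISTIC_EXPRESSIONS in {op.name}: {bad}"
        )


exprs = [Expr("uuid()", deterministic=False)]
before = Operator("MergeRows", exprs)  # no opt-in: rejected
after = Operator("MergeRows", exprs, allow_non_deterministic=True)  # accepted

check_analysis(after)  # passes silently
try:
    check_analysis(before)
    rejected = False
except AnalysisError:
    rejected = True
```

In this model the only difference between the failing and the passing operator is the opt-in flag, which mirrors the shape of the change proposed here.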
   
   `MergeRows` evaluates each WHEN-clause output projection exactly once per 
produced output row:
   - Interpreted path (MergeRowsExec lines 579-590): the first matching 
instruction evaluates `projection.apply(row)` and then returns.
   - Codegen path (MergeRowsExec lines 237-249): the condition is checked and, 
if it is true, the generated code consumes the projection and then returns.
   
   This satisfies the safety condition for `SupportsNonDeterministicExpression` 
(same rationale as operators already in the allowlist, e.g. `NearestByJoin`).
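
The "exactly once per produced output row" argument can be checked on a toy model of first-match evaluation. The sketch below is not `MergeRowsExec`; `merge_rows`, `fresh_id`, and the row and instruction shapes are hypothetical and exist only to show that stopping at the first matching instruction means the non-deterministic projection runs once per emitted row.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# An instruction is a (condition, projection) pair, as in a WHEN clause.
Instruction = Tuple[Callable[[Row], bool], Callable[[Row], Row]]


def merge_rows(rows: List[Row], instructions: List[Instruction]) -> List[Row]:
    """Apply the first matching instruction to each row, then stop."""
    out: List[Row] = []
    for row in rows:
        for condition, projection in instructions:
            if condition(row):
                out.append(projection(row))  # evaluated once for this row
                break  # later instructions are never evaluated
    return out


# A counter plays the role of a non-deterministic function such as uuid().
counter = itertools.count()
calls: List[int] = []


def fresh_id() -> int:
    value = next(counter)
    calls.append(value)
    return value


instructions: List[Instruction] = [
    (lambda r: r["matched"], lambda r: {**r, "id": fresh_id()}),
    (lambda r: True, lambda r: {**r, "id": fresh_id()}),
]
rows = [{"matched": True}, {"matched": False}, {"matched": True}]
result = merge_rows(rows, instructions)
```

Here `fresh_id` is invoked three times for three output rows, even though the first row satisfies both conditions, so each emitted row carries exactly one freshly generated value.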
   
   The single change covers both the group-based (copy-on-write: MergeRows -> 
ReplaceData) and delta-based (merge-on-read: MergeRows -> WriteDelta) MERGE 
paths, since both route action expressions through `MergeRows`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. MERGE INTO statements with non-deterministic expressions in action 
assignments (e.g. `WHEN MATCHED THEN UPDATE SET id = uuid()`) now succeed on 
DSv2 tables instead of failing with `INVALID_NON_DETERMINISTIC_EXPRESSIONS`.
   
   ### How was this patch tested?
   
   - Converted existing failure-intercept tests into success tests covering 
both group-based and delta-based paths.
   - Tests verify that `uuid()` in UPDATE SET produces non-null 36-character 
UUID strings.
   - Tests verify that `rand()` in INSERT VALUES produces values in the 
expected range.
   - All existing CheckAnalysis, MergeIntoTable, GroupBasedMerge, and 
DeltaBasedMerge test suites pass.
   - Scalastyle passes with 0 errors.
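
The value checks listed above could look roughly like the following. This is a sketch of the shape of the assertions, not the test code in this PR: `looks_like_uuid` and `in_unit_interval` are hypothetical helpers, Python's `uuid` and `random` modules stand in for Spark's `uuid()` and `rand()`, and the range `[0, 1)` is an assumption about what "expected range" means.

```python
import random
import uuid


def looks_like_uuid(value) -> bool:
    """True for a non-null, 36-character canonical UUID string."""
    if not isinstance(value, str) or len(value) != 36:
        return False
    try:
        # Round-trip through uuid.UUID to reject malformed strings.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def in_unit_interval(value) -> bool:
    """True for a float in [0, 1) (assumed range for rand())."""
    return isinstance(value, float) and 0.0 <= value < 1.0


# Stand-ins for values read back from the target table after the MERGE.
updated_id = str(uuid.uuid4())
inserted_value = random.random()

assert looks_like_uuid(updated_id)
assert in_unit_interval(inserted_value)
```

Checking the format and range rather than exact values is the natural choice here, since the expressions under test return a different result on every evaluation.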
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

