yadavay-amzn opened a new pull request, #56767: URL: https://github.com/apache/spark/pull/56767
### What changes were proposed in this pull request?

Make `MergeRows` implement `SupportsNonDeterministicExpression` with `allowNonDeterministicExpression = true`. This allows non-deterministic expressions (e.g. `uuid()`, `rand()`) in MERGE INTO action assignments (`UPDATE SET col = uuid()`, `INSERT ... VALUES (..., rand())`) to pass analysis and execute correctly on DSv2 row-level operation tables.

### Why are the changes needed?

This is a follow-up to SPARK-56729 (PR #55858), which handled non-deterministic expressions in MERGE *source* queries. However, non-deterministic expressions in MERGE *action assignments* still fail with `INVALID_NON_DETERMINISTIC_EXPRESSIONS` because the rewritten plan places these expressions inside `MergeRows`, which was not in CheckAnalysis's allowlist.

`MergeRows` evaluates each WHEN-clause output projection exactly once per produced output row:

- Interpreted path (MergeRowsExec lines 579-590): the first matching instruction evaluates `projection.apply(row)`, then returns.
- Codegen path (MergeRowsExec lines 237-249): the condition is checked; if true, the projection is consumed, then the code returns.

This satisfies the safety condition for `SupportsNonDeterministicExpression` (same rationale as operators already in the allowlist, e.g. `NearestByJoin`).

The single change covers both the group-based (copy-on-write: MergeRows -> ReplaceData) and delta-based (merge-on-read: MergeRows -> WriteDelta) MERGE paths, since both route action expressions through `MergeRows`.

### Does this PR introduce _any_ user-facing change?

Yes. MERGE INTO statements with non-deterministic expressions in action assignments (e.g. `WHEN MATCHED THEN UPDATE SET id = uuid()`) now succeed on DSv2 tables instead of failing with `INVALID_NON_DETERMINISTIC_EXPRESSIONS`.

### How was this patch tested?

- Converted existing failure-intercept tests into success tests covering both group-based and delta-based paths.
- Tests verify that `uuid()` in UPDATE SET produces non-null 36-char UUID strings.
- Tests verify that `rand()` in INSERT VALUES produces values in the expected range.
- All existing CheckAnalysis, MergeIntoTable, GroupBasedMerge, and DeltaBasedMerge test suites pass.
- Scalastyle passes with 0 errors.

### Was this patch authored or co-authored using generative AI tooling?

Yes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
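
The once-per-output-row argument in "Why are the changes needed?" can be illustrated with a small language-neutral model. The sketch below is not Spark code: `Instruction`, `merge_rows`, and `fresh_uuid` are hypothetical names introduced only for illustration, and the real `MergeRows` operator has more structure than this. It only mirrors the described behaviour, assuming the first instruction whose condition holds applies its projection and then stops processing that row.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


@dataclass
class Instruction:
    """Hypothetical stand-in for one WHEN clause: a condition plus an output projection."""
    condition: Callable[[Row], bool]
    projection: Callable[[Row], Row]


def merge_rows(rows: Iterable[Row], instructions: List[Instruction]) -> Iterator[Row]:
    """Emit at most one output row per input row, using the first matching instruction."""
    for row in rows:
        for instr in instructions:
            if instr.condition(row):
                # The projection runs exactly once for this output row, then we stop.
                yield instr.projection(row)
                break


# Count evaluations of the non-deterministic expression (a stand-in for uuid()).
calls = {"uuid": 0}


def fresh_uuid() -> str:
    calls["uuid"] += 1
    return str(uuid.uuid4())


instructions = [
    # Models: WHEN MATCHED THEN UPDATE SET id = uuid()
    Instruction(lambda r: bool(r["matched"]), lambda r: {"pk": r["pk"], "id": fresh_uuid()}),
    # Models an insert-style clause with a deterministic value for unmatched rows.
    Instruction(lambda r: True, lambda r: {"pk": r["pk"], "id": "inserted"}),
]

rows = [
    {"pk": 1, "matched": True},
    {"pk": 2, "matched": False},
    {"pk": 3, "matched": True},
]

out = list(merge_rows(rows, instructions))
updated_ids = [r["id"] for r in out if r["id"] != "inserted"]
```

In this model the number of `fresh_uuid` calls equals the number of updated output rows, which is the property that makes a non-deterministic assignment safe to evaluate inside the operator.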
