yadavay-amzn commented on code in PR #55858:
URL: https://github.com/apache/spark/pull/55858#discussion_r3469634240
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala:
##########
@@ -367,7 +367,10 @@ case class ReplaceData(
originalTable: NamedRelation,
projections: ReplaceDataProjections,
groupFilterCondition: Option[Expression] = None,
- write: Option[Write] = None) extends RowLevelWrite {
+ write: Option[Write] = None)
+ extends RowLevelWrite with SupportsNonDeterministicExpression {
+
+ override def allowNonDeterministicExpression: Boolean = operation.command()
== MERGE
Review Comment:
Thanks for catching this; you were right that it was unsafe. I reproduced
the silent lost update (a non-deterministic source whose key set differs
between the two scans makes the pruning subquery drop a target group the main
merge matches, losing the update) and pushed a fix.
Rather than materialize/checkpoint the source, I went at the root of the
hazard: skip runtime group filtering when its condition is non-deterministic -
a `cond.deterministic` gate in `canInjectGroupFilters`, since the
`groupFilterCondition` is an `Exists` over the source and is non-deterministic
exactly when the source is. That removes the double-scan entirely, and being a
single centralized gate it covers both the ReplaceData and WriteDelta paths.
Deterministic-source MERGE still gets the group filter, so the optimization is
unaffected for the common case. I did evaluate materialization first, but it
turned out to depend on exchange/subquery reuse, which deliberately doesn't
dedup non-deterministic plans - so correctness would hinge on an optimization
that can be disabled; the guard instead follows Spark's existing stance of not
optimizing non-deterministic plans (e.g. InlineCTE).
Kept the `SupportsNonDeterministicExpression` relaxation so the analysis
false-positive is fixed. Tests: an end-to-end suite over
`InMemoryRowLevelOperationTable` that (a) reproduces the original
`INVALID_NON_DETERMINISTIC_EXPRESSIONS` failure now passing and (b) runs with
the runtime group filter enabled and asserts correct rows (fails without the
guard), plus a deterministic-source test confirming the filter is still
injected, and UPDATE/DELETE negative cases. Also folded the parens nit and
updated the PR description.
Does this approach look okay to you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]