yadavay-amzn commented on code in PR #55858:
URL: https://github.com/apache/spark/pull/55858#discussion_r3469634240
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala:
##########
@@ -367,7 +367,10 @@ case class ReplaceData(
originalTable: NamedRelation,
projections: ReplaceDataProjections,
groupFilterCondition: Option[Expression] = None,
- write: Option[Write] = None) extends RowLevelWrite {
+ write: Option[Write] = None)
+ extends RowLevelWrite with SupportsNonDeterministicExpression {
+
+ override def allowNonDeterministicExpression: Boolean = operation.command()
== MERGE
Review Comment:
Thanks for catching this; you were right that it was unsafe. I reproduced
the silent lost update (a non-deterministic source whose key set differs
between the two scans makes the pruning subquery drop a target group the main
merge matches, losing the update) and pushed a fix.
Rather than materialize/checkpoint the source, I went at the root of the
hazard: skip runtime group filtering when its condition is non-deterministic -
a `cond.deterministic` gate in `canInjectGroupFilters`, since the
`groupFilterCondition` is an `Exists` over the source and is non-deterministic
exactly when the source is. That removes the double-scan entirely, and being a
single centralized gate it covers both the ReplaceData and WriteDelta paths.
Deterministic-source MERGE still gets the group filter, so the optimization is
unaffected for the common case.
I did evaluate materialization first, but it turned out to depend on
exchange/subquery reuse, which deliberately doesn't dedup non-deterministic
plans - so correctness would hinge on an optimization that can be disabled; the
guard instead follows Spark's existing stance of not optimizing
non-deterministic plans (e.g. InlineCTE).
Kept the `SupportsNonDeterministicExpression` relaxation so the analysis
false-positive is fixed.
Tests: an end-to-end suite over `InMemoryRowLevelOperationTable` that (a)
reproduces the original `INVALID_NON_DETERMINISTIC_EXPRESSIONS` failure now
passing and (b) runs with the runtime group filter enabled and asserts correct
rows (fails without the guard), plus a deterministic-source test confirming the
filter is still injected, and UPDATE/DELETE negative cases. Also folded the
parens nit and updated the PR description.
Does this approach look okay to you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]