sunchao commented on code in PR #56575:
URL: https://github.com/apache/spark/pull/56575#discussion_r3463048838
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,19 +454,59 @@ trait RuntimeReplaceable extends Expression {
override val nodePatterns: Seq[TreePattern] = Seq(RUNTIME_REPLACEABLE)
override def nullable: Boolean = replacement.nullable
override def dataType: DataType = replacement.dataType
+ // The actual evaluation is delegated to `replacement`, so determinism must
reflect `replacement`,
+ // not this expression's `children` (which are the original arguments). For
example, the children
+ // of `Uniform` are literal bounds and a seed (all deterministic), while its
`replacement` is a
+ // non-deterministic `Rand`. This matters once a `RuntimeReplaceable` may
survive into the
+ // physical plan (see `eagerReplace`): the survival decision relies on an
accurate determinism
+ // signal.
+ override lazy val deterministic: Boolean = replacement.deterministic
Review Comment:
[P2] Delegate `throwable` to the surviving replacement
`RuntimeReplaceable` now delegates `deterministic` and `foldable` to
`replacement`, but it still inherits `Expression.throwable`, which checks only
the wrapper's original `children`. A surviving wrapper can therefore look
non-throwing even when its replacement contains an explicitly throwable
expression such as `Sequence(..., step)`.
For a concrete example, consider a boolean wrapper whose replacement is
`size(sequence(start, stop, step)) > 0`, and an inner join with these inputs:
- left row `(key=1, start=0, stop=1, step=1)`; right contains `key=1`
- left row `(key=2, start=0, stop=1, step=-1)`; right has no `key=2`
With the predicate above the join, the second row is removed by the join
before `sequence(0, 1, -1)` is evaluated, so the query succeeds. With
`eagerReplace = false`, the wrapper reports `throwable = false`;
`PushPredicateThroughJoin` moves it onto the left input, and the second row now
throws `Illegal sequence boundaries` before the join can discard it. Physical
materialization happens too late to undo that relocation. Spark's existing
SPARK-46707 test specifically keeps explicit-step `Sequence` predicates above
joins for this reason.
Could we also delegate this metadata, e.g. `override lazy val throwable:
Boolean = replacement.throwable`, and add the wrapped join case as a regression
test?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]