sunchao commented on code in PR #56575:
URL: https://github.com/apache/spark/pull/56575#discussion_r3463048838


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,19 +454,59 @@ trait RuntimeReplaceable extends Expression {
   override val nodePatterns: Seq[TreePattern] = Seq(RUNTIME_REPLACEABLE)
   override def nullable: Boolean = replacement.nullable
   override def dataType: DataType = replacement.dataType
+  // The actual evaluation is delegated to `replacement`, so determinism must 
reflect `replacement`,
+  // not this expression's `children` (which are the original arguments). For 
example, the children
+  // of `Uniform` are literal bounds and a seed (all deterministic), while its 
`replacement` is a
+  // non-deterministic `Rand`. This matters once a `RuntimeReplaceable` may 
survive into the
+  // physical plan (see `eagerReplace`): the survival decision relies on an 
accurate determinism
+  // signal.
+  override lazy val deterministic: Boolean = replacement.deterministic

Review Comment:
   [P2] Delegate `throwable` to the surviving replacement
   
   `RuntimeReplaceable` now delegates `deterministic` and `foldable` to 
`replacement`, but it still inherits `Expression.throwable`, which checks only 
the wrapper's original `children`. A surviving wrapper can therefore look 
non-throwing even when its replacement contains an explicitly throwable 
expression such as `Sequence(..., step)`.
   
   For a concrete example, consider a boolean wrapper whose replacement is 
`size(sequence(start, stop, step)) > 0`, and an inner join with these inputs:
   
   - left row `(key=1, start=0, stop=1, step=1)`; right contains `key=1`
   - left row `(key=2, start=0, stop=1, step=-1)`; right has no `key=2`
   
   With the predicate above the join, the second row is removed by the join 
before `sequence(0, 1, -1)` is evaluated, so the query succeeds. With 
`eagerReplace = false`, the wrapper reports `throwable = false`; 
`PushPredicateThroughJoin` moves it onto the left input, and the second row now 
throws `Illegal sequence boundaries` before the join can discard it. Physical 
materialization happens too late to undo that relocation. Spark's existing 
SPARK-46707 test specifically keeps explicit-step `Sequence` predicates above 
joins for this reason.
   
   Could we also delegate this metadata, e.g. `override lazy val throwable: 
Boolean = replacement.throwable`, and add the wrapped join case as a regression 
test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to