cloud-fan commented on code in PR #39248:
URL: https://github.com/apache/spark/pull/39248#discussion_r1058115431
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala:
##########
@@ -117,10 +111,6 @@ object InterpretedMutableProjection {
* Returns a [[MutableProjection]] for given sequence of bound Expressions.
*/
def createProjection(exprs: Seq[Expression]): MutableProjection = {
- // We need to make sure that we do not reuse stateful expressions.
- val cleanedExpressions = exprs.map(_.transform {
- case s: Stateful => s.freshCopy()
Review Comment:
The old usage of `.transform` here contained a subtle bug related to how
`fastEquals` works.
Let's say that we have a tree which looks like this:
> Outer(Middle(Stateful()))
where `Outer` and `Middle` are non-Stateful expressions.
When the `.transform` is applied to `Stateful()` and `.freshCopy()` is
called, the returned value will be `==` to the original Stateful expression but
will have a different object identity (because it's a fresh object).
Internally, `.transform` will use `fastEquals` to check whether the
transformation modified the node. `Stateful` overrides `fastEquals` so that it
only considers object identity, so the transform will return the `freshCopy()`
result.
At the next level up, Middle will check whether any of its children have
been changed in the recursive bottom-up transformation (see
[childrenFastEquals() in
withNewChildren()](https://github.com/apache/spark/blob/9305cc744d27daa6a746d3eb30e7639c63329072/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L455-L456),
which is called from `mapChildren()`). It will detect that its children have
changed, so the transform will return a new Middle node.
Finally, at the top level, `Outer` will perform the same check to see if any
of its children have changed. This time, however, it will be calling
`Middle.fastEquals` instead of `Stateful.fastEquals`. Middle's fastEquals
method is the regular implementation which also considers object equality. Both
the original and new `Middle` nodes will be `==`, so `fastEquals` will be true
and `Outer` will conclude that its children have not been changed by the
transformation and the original `Outer` will be returned (losing the copy of
the stateful expression).
In other words, the old `.transform` and copying logic here was incorrect if
the Stateful expression was nested more than a single level deep.
In this PR I chose to fix this by adding a
`freshCopyIfContainsStatefulExpression()` method to `Expression` which
implements a custom tree traversal considers _only_ object identity when
determining whether the transform has changed a node or a node's children.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]