peter-toth edited a comment on pull request #29771: URL: https://github.com/apache/spark/pull/29771#issuecomment-694177024
> 1. The fix involved a bit of refactoring. Is the bug caused by "leaking" the `foldableMap` from a sibling node, such that it's not properly just propagating along the child->parent flow? And that the refactoring eliminated the `var` the threaded the propagation via returns in a recursive depth-first traversal to make sure that it's always only propagating along the child->parent flow, and otherwise the existing logic is unchanged? If so, this looks like a nice refactoring. This is exactly what happens without this PR. Basically `foldableMap` is filled up with all foldables in the beginning and then we start the preorder traversal and replacing references from the map. Sometimes we narrow down the map when traversing through particular nodes, but this is still incorrect. With this PR we do a proper fill-up and child->parent propagation of the map. > 2. `TreeNode`'s `transformXXX` functions take care of the case where a new node is actually equal to the old one, so that after transformation the logically unchanged nodes will stay with their original identity. It looks like with the explicit recursion after this PR, that it might introduce more, unnecessary copies of the logically unchanged nodes that end up staying in the transformed tree. Is that the case? If so, can we make it better? I think this is a very good point actually. `mapChildren()` in the "other" case seems to be ok as that method takes care of the comparison. `withNewChildren()` also takes care of it in `UnaryNode` and `Join`. The `replaceFoldable()` call after `withNewChildren()` is also ok because it uses `transformExpressions`. Finally `Project` and `Expand`, well, we could change the simple `copy()` in these cases. But I think if there is a parent "other"/`UnaryNode`/`Join` above these 2 they will be also fine in the end. But let me change the `copy()`, though. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org