peter-toth edited a comment on pull request #29771:
URL: https://github.com/apache/spark/pull/29771#issuecomment-694177024


   > 1. The fix involved a bit of refactoring. Is the bug caused by "leaking" 
the `foldableMap` from a sibling node, such that it's not properly just 
propagating along the child->parent flow? And that the refactoring eliminated 
the `var` the threaded the propagation via returns in a recursive depth-first 
traversal to make sure that it's always only propagating along the 
child->parent flow, and otherwise the existing logic is unchanged? If so, this 
looks like a nice refactoring.
   
   This is exactly what happens without this PR. Basically `foldableMap` is 
filled up with all foldables in the beginning and then we start the preorder 
traversal and replacing references from the map. Sometimes we narrow down the 
map when traversing through particular nodes, but this is still incorrect.
   With this PR we do a proper fill-up and child->parent propagation of the map.
   
   > 2. `TreeNode`'s `transformXXX` functions take care of the case where a new 
node is actually equal to the old one, so that after transformation the 
logically unchanged nodes will stay with their original identity. It looks like 
with the explicit recursion after this PR, that it might introduce more, 
unnecessary copies of the logically unchanged nodes that end up staying in the 
transformed tree. Is that the case? If so, can we make it better?
   
   I think this is a very good point actually. `mapChildren()` in the "other" 
case seems to be ok as that method takes care of the comparison. 
`withNewChildren()` also takes care of it in `UnaryNode` and `Join`. The 
`replaceFoldable()` call after `withNewChildren()` is also ok because it uses 
`transformExpressions`.  Finally `Project` and `Expand`, well, we could change 
the simple `copy()` in these cases. But I think if there is a parent 
"other"/`UnaryNode`/`Join` above these 2 they will be also fine in the end. But 
let me change the `copy()`, though.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to