cloud-fan edited a comment on pull request #30555:
URL: https://github.com/apache/spark/pull/30555#issuecomment-763702232


   After a second look, I'm a bit worried about this half-baked solution. In 
general, correlated subquery handling is split into 3 steps:
   1. `CheckAnalysis` makes sure correlated subqueries can only exist in 
`SupportsSubquery`, `Filter`, and a few other operators.
   2. `PullupCorrelatedPredicates` pulls up the outer references in the 
correlated subquery to the root node. It handles `SupportsSubquery` and 
`UnaryNode`.
   3. `RewriteCorrelatedScalarSubquery` and `RewritePredicateSubquery` rewrite 
correlated subqueries into joins. They only handle `Filter`, `Aggregate`, and 
`Project`.
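
   To make the gap between the steps concrete, here is a minimal toy model in 
plain Scala. It does not use Spark's real Catalyst classes: the types only 
borrow the operator names above, `Sort` and `Join` are arbitrary stand-ins for 
"other" operators, and treating `Project` and `Aggregate` as the "few other 
operators" of step 1 is an assumption made for illustration.

```scala
// Toy model only: these types borrow names from the comment above,
// they are NOT Spark's actual Catalyst operators or rules.
trait Plan
trait UnaryNode extends Plan
trait SupportsSubquery extends Plan

case object Filter extends UnaryNode
case object Project extends UnaryNode
case object Aggregate extends UnaryNode
case object Sort extends UnaryNode                    // some other unary operator
case object Join extends Plan                         // a non-unary operator
case object DeleteFromTable extends SupportsSubquery  // stands in for UPDATE/DELETE/MERGE

object SubqueryPipeline {
  // Step 1: operators in which a correlated subquery may appear.
  // Assumption: Project and Aggregate play the role of "a few other operators".
  def allowedByCheckAnalysis(p: Plan): Boolean = p match {
    case _: SupportsSubquery          => true
    case Filter | Project | Aggregate => true
    case _                            => false
  }

  // Step 2: operators for which outer references get pulled up.
  def handledByPullup(p: Plan): Boolean = p match {
    case _: SupportsSubquery | _: UnaryNode => true
    case _                                  => false
  }

  // Step 3: operators whose correlated subqueries are rewritten into joins.
  def handledByRewrite(p: Plan): Boolean = p match {
    case Filter | Aggregate | Project => true
    case _                            => false
  }

  // True when an operator passes steps 1 and 2 but has no join rewrite in step 3.
  def stuckAfterPullup(p: Plan): Boolean =
    allowedByCheckAnalysis(p) && handledByPullup(p) && !handledByRewrite(p)
}
```

   In this model, `SubqueryPipeline.stuckAfterPullup(DeleteFromTable)` is true 
while `SubqueryPipeline.stuckAfterPullup(Filter)` is false: the command is 
accepted by step 1 and transformed by step 2, but step 3 never rewrites it.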
   
   I have a hard time imagining how we can rewrite UPDATE/DELETE/MERGE commands 
with correlated subqueries into joins, and I'm starting to doubt whether this 
is the right direction. Before this PR, `SupportsSubquery` was mostly a marker 
trait that kept `CheckAnalysis` from getting in the way (failing 
UPDATE/DELETE/MERGE commands with correlated subqueries). We assumed users 
would add Catalyst rules and/or provide proper UPDATE/DELETE/MERGE physical 
plans to support correlated subqueries. Now `PullupCorrelatedPredicates` can 
get in the way as well.
   
   @aokolnychyi can you share your plan for supporting UPDATE/DELETE/MERGE 
commands with correlated subqueries? It's better to get out of this half-baked 
state ASAP, by either reverting this patch or finishing the feature completely.

