dilipbiswal opened a new pull request #25258: [SPARK-19712] Move subquery 
rewrite to beginning of optimizer
URL: https://github.com/apache/spark/pull/25258
 
 
   ## What changes were proposed in this pull request?
   Currently, predicate subqueries (IN/EXISTS) are converted to joins at the end 
of the optimizer, in RewritePredicateSubquery. This change moves the rewrite close 
to the beginning of the optimizer. The original idea was to keep the subquery 
expressions in Filter form so that they could be pushed down as deep as possible. 
One disadvantage is that, once the subqueries are rewritten in join form, they 
are not subjected to further optimizations. With this change, the subqueries are 
converted to join form early in the rewrite phase.  
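
To illustrate why a predicate subquery and a semi/anti join are interchangeable, here is a minimal sketch in plain Python. It is not Spark code: the row layout, the table contents, and the helper names `filter_exists` and `semi_or_anti_join` are all hypothetical, and NULL handling (which matters for NOT IN) is deliberately left out.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, ...]
Cond = Callable[[Row, Row], bool]


def filter_exists(outer: List[Row], inner: List[Row], cond: Cond,
                  negated: bool = False) -> List[Row]:
    """Filter form: keep an outer row if the correlated subquery returns any row.

    With negated=True this models NOT EXISTS.
    """
    return [o for o in outer if any(cond(o, i) for i in inner) != negated]


def semi_or_anti_join(outer: List[Row], inner: List[Row], cond: Cond,
                      anti: bool = False) -> List[Row]:
    """Join form: nested-loop left semi join (left anti join when anti=True)."""
    result = []
    for o in outer:
        matched = False
        for i in inner:
            if cond(o, i):
                matched = True
                break  # a semi join emits each outer row at most once
        if matched != anti:
            result.append(o)
    return result


# Hypothetical data: orders are (order_id, customer_id), customers are (customer_id,).
orders = [(1, 10), (2, 20), (3, 10), (4, 30)]
customers = [(10,), (30,), (30,)]  # the duplicate must not duplicate output rows


def same_customer(o: Row, c: Row) -> bool:
    return o[1] == c[0]


exists_rows = filter_exists(orders, customers, same_customer)
semi_rows = semi_or_anti_join(orders, customers, same_customer)
not_exists_rows = filter_exists(orders, customers, same_customer, negated=True)
anti_rows = semi_or_anti_join(orders, customers, same_customer, anti=True)
```

Both forms return the same rows, so the join form can replace the filter form without changing query results; the open question the PR addresses is only when in the optimizer that replacement happens.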
   
   I will combine the PullupCorrelatedPredicates and RewritePredicateSubquery 
rules in a follow-up PR.
   
   ## How was this patch tested?
   A new test suite, `LeftSemiAntiJoinAndSubqueryEquivalencySuite`, is added to 
verify that correlated subqueries and queries that explicitly use 
left semi/anti joins result in the same plan after optimization.
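
The kind of check described above can be sketched with a toy plan model in plain Python. This is only an illustration under stated assumptions: the classes `Relation`, `Exists`, `Filter`, `Join` and the function `toy_rewrite` are hypothetical stand-ins, not Catalyst classes or the contents of the new suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Exists:
    subquery: object        # plan of the subquery
    condition: str          # correlated predicate, kept as an opaque string
    negated: bool = False   # True models NOT EXISTS


@dataclass(frozen=True)
class Filter:
    predicate: Exists
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    join_type: str
    condition: str


def toy_rewrite(plan):
    """Turn Filter(EXISTS ...) into a LeftSemi join and NOT EXISTS into LeftAnti."""
    if isinstance(plan, Filter):
        pred = plan.predicate
        join_type = "LeftAnti" if pred.negated else "LeftSemi"
        return Join(toy_rewrite(plan.child), toy_rewrite(pred.subquery),
                    join_type, pred.condition)
    if isinstance(plan, Join):
        return Join(toy_rewrite(plan.left), toy_rewrite(plan.right),
                    plan.join_type, plan.condition)
    return plan  # leaf relations are left untouched


# Subquery form versus the explicitly written join form of the same query.
subquery_plan = Filter(Exists(Relation("s"), "t.a = s.a"), Relation("t"))
explicit_join = Join(Relation("t"), Relation("s"), "LeftSemi", "t.a = s.a")

not_exists_plan = Filter(Exists(Relation("s"), "t.a = s.a", negated=True),
                         Relation("t"))
explicit_anti = Join(Relation("t"), Relation("s"), "LeftAnti", "t.a = s.a")
```

Because the dataclasses compare structurally, `toy_rewrite(subquery_plan) == explicit_join` holds, which mirrors the idea of asserting that both query forms end up as the same plan.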
