dilipbiswal opened a new pull request #25258: [SPARK-19712] Move subquery rewrite to beginning of optimizer
URL: https://github.com/apache/spark/pull/25258

## What changes were proposed in this pull request?

Currently, predicate subqueries (IN/EXISTS) are converted to joins at the end of the optimizer, in `RewritePredicateSubquery`. This change moves the rewrite close to the beginning of the optimizer.

The original idea was to keep the subquery expressions in Filter form so that they could be pushed down as deep as possible. The disadvantage of rewriting them late is that, once the subqueries are finally turned into joins, they are not subjected to any further optimizations.

In this change, we convert the subqueries to join form early in the rewrite phase. I will combine `PullupCorrelatedPredicates` and `RewritePredicateSubquery` in a follow-up PR.

## How was this patch tested?

A new test suite, `LeftSemiAntiJoinAndSubqueryEquivalencySuite`, is added to verify that correlated subqueries and queries that explicitly use left semi/anti joins result in the same plan after optimization.
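To make the subquery/join equivalence concrete, here is a minimal Python sketch over plain lists of dicts. It is not Spark code, and every function name in it is hypothetical. It compares query *results*, whereas `LeftSemiAntiJoinAndSubqueryEquivalencySuite` compares optimized *plans*. It covers only `IN` (rewritten as a left semi join) and `NOT EXISTS` (rewritten as a left anti join) on non-null integer keys. `NOT IN` is left out because its NULL semantics need extra handling.

```python
from typing import Dict, List

Row = Dict[str, int]


def where_in_subquery(outer: List[Row], inner: List[Row], key: str) -> List[Row]:
    # Filter form: SELECT * FROM outer WHERE outer.key IN (SELECT key FROM inner).
    # The predicate is evaluated row by row by scanning the inner relation.
    return [o for o in outer if any(o[key] == i[key] for i in inner)]


def where_not_exists(outer: List[Row], inner: List[Row], key: str) -> List[Row]:
    # Filter form: ... WHERE NOT EXISTS (SELECT * FROM inner WHERE inner.key = outer.key).
    return [o for o in outer if not any(o[key] == i[key] for i in inner)]


def left_semi_hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Join form: build a hash set of right-side keys once, then probe it.
    # Each left row is emitted at most once and no right-side columns are added.
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]


def left_anti_hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Join form: keep only the left rows whose key has no match on the right.
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] not in right_keys]


if __name__ == "__main__":
    t1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
    # The duplicate key 2 checks that the semi join does not duplicate t1 rows.
    t2 = [{"a": 2}, {"a": 2}, {"a": 3}]
    assert where_in_subquery(t1, t2, "a") == left_semi_hash_join(t1, t2, "a")
    assert where_not_exists(t1, t2, "a") == left_anti_hash_join(t1, t2, "a")
    print(left_semi_hash_join(t1, t2, "a"))  # [{'a': 2, 'b': 20}, {'a': 3, 'b': 30}]
    print(left_anti_hash_join(t1, t2, "a"))  # [{'a': 1, 'b': 10}]
```

The semi join emits each left row at most once, even when the right side contains duplicate keys. That property is what keeps it equivalent to the filter form. Once a subquery is expressed as a join, the rest of the optimizer can treat it like any other join.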
