allisonwang-db opened a new pull request #32072:
URL: https://github.com/apache/spark/pull/32072


   ### What changes were proposed in this pull request?
   This PR implements the decorrelation technique from the paper "Unnesting Arbitrary Queries" by T. Neumann and A. Kemper (http://www.btw-2015.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf). It currently supports `Filter`, `Project`, `Aggregate`, `Join`, and any other `UnaryNode` that passes `CheckAnalysis`.
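
As a rough illustration of the idea behind the paper, here is a toy model over plain Python lists. It is not Spark's Catalyst implementation. It models a correlated scalar subquery of the form `SELECT a, (SELECT SUM(y) FROM t2 WHERE pred(t2.x, t1.a)) FROM t1`. The function names, the `(x, y)` row shape, and the `pred` parameter are hypothetical. `nested_eval` shows the correlated (per-row) semantics. `unnested_eval` follows the paper's general approach: compute the domain of the outer column, evaluate the subquery once against that domain with a join plus `GROUP BY`, then join the result back to the outer rows.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[int, int]  # hypothetical inner-table row: (x, y)
Pred = Callable[[int, int], bool]  # hypothetical correlated predicate pred(x, a)


def nested_eval(outer: List[int], inner: List[Row], pred: Pred) -> List[Tuple[int, Optional[int]]]:
    """Correlated semantics: re-run SUM(y) WHERE pred(x, a) for every outer value a."""
    result = []
    for a in outer:
        matches = [y for (x, y) in inner if pred(x, a)]
        # SQL SUM over an empty input is NULL, modeled here as None.
        result.append((a, sum(matches) if matches else None))
    return result


def unnested_eval(outer: List[int], inner: List[Row], pred: Pred) -> List[Tuple[int, Optional[int]]]:
    """Decorrelated sketch: domain -> join + GROUP BY -> join back on equality."""
    # Step 1: the domain D holds the distinct outer values the subquery depends on.
    domain = sorted(set(outer))
    # Step 2: join D with the inner table on pred, then aggregate grouped by d.
    # A nested-loop join stands in for whatever join strategy an engine would pick.
    sums: Dict[int, int] = {}
    for d in domain:
        for (x, y) in inner:
            if pred(x, d):
                sums[d] = sums.get(d, 0) + y
    # Step 3: left outer join back to the outer rows on a = d.
    # A domain value with no matches yields None (NULL).
    return [(a, sums.get(a)) for a in outer]


if __name__ == "__main__":
    t1 = [1, 2, 2, 5]
    t2 = [(1, 10), (2, 20), (2, 5), (3, 7)]
    # Correlated non-equality predicate: t2.x < t1.a
    print(unnested_eval(t1, t2, lambda x, a: x < a))
```

In this sketch the non-equality predicate is evaluated only inside the join with the domain. The final join back to the outer rows is always a plain equality on the domain value. This is one way a domain-based rewrite can handle correlated non-equality predicates under an aggregate.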
   
   This feature is controlled by the config `spark.sql.optimizer.decorrelateInnerQuery.enabled` (default: `true`).
   
   A few notes:
   1. This PR does not relax any `CheckAnalysis` constraints on correlated subqueries, even though the new framework can support some currently rejected cases, such as aggregates with correlated non-equality predicates. This PR focuses on adding the new framework and ensuring that all existing cases remain supported. The constraints can be relaxed gradually in separate follow-up PRs.
   2. As a first step, the new framework is only enabled for correlated scalar subqueries. EXISTS/IN subqueries can be supported in the future.
   
   ### Why are the changes needed?
   Currently, Spark has limited support for correlated subqueries. Only `Filter` may reference outer query columns, and correlated non-equality predicates are not supported when the subquery is aggregated. The new framework will allow more types of operators to host outer column references and will support correlated non-equality predicates in correlated subqueries.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing unit tests and SQL query tests, plus new optimizer plan tests.
   