jchen5 opened a new pull request, #39375:
URL: https://github.com/apache/spark/pull/39375

   ## What changes were proposed in this pull request?
   
   Adds support for subquery decorrelation with UNION operators on the 
correlation paths. For example:
   ```
   SELECT t1a, (
     SELECT avg(b) FROM (
       SELECT t2b as b FROM t2 WHERE t2a = t1a
       UNION
       SELECT t3b as b FROM t3 WHERE t3a = t1a
   ))
   FROM t1
   ```
   
   [This 
doc](https://docs.google.com/document/d/11b9ClCF2jYGU7vU2suOT7LRswYkg6tZ8_6xJbvxfh2I/edit#)
 describes how the decorrelation rewrite works for UNION and the code changes 
for it.
   
   The other set operators INTERSECT and EXCEPT should work with the same 
logic, I will add those in a follow-up after this PR is ready.
   
   In this PR, we always add DomainJoins for correlation through UNIONs, and 
never do direct substitution of the outer refs. That can also be added as an 
optimization in a follow-up - it only affects performance, not surface area 
coverage.
   
   ### Why are the changes needed?
   To improve subquery support in Spark.
   
   ### Does this PR introduce _any_ user-facing change?
   Before this change, queries like this would return an error like: 
`Decorrelate inner query through Union is not supported.`
   
   After this PR, this query can run successfully.
   
   ### How was this patch tested?
   Unit tests and SQL query tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to