allisonwang-db opened a new pull request #33284:
URL: https://github.com/apache/spark/pull/33284


   ### What changes were proposed in this pull request?
   This PR adds an optimization for scalar and lateral subqueries that have 
OneRowRelation as their leaf node. It inlines such subqueries before decorrelation 
so that they are not rewritten as left outer joins. It also introduces a flag to 
turn this optimization on or off: `spark.sql.optimizer.optimizeOneRowRelationSubquery` 
(default: true).
   
   For example:
   ```sql
   select (select c1) from t
   ```
   Analyzed plan:
   ```
   Project [scalar-subquery#17 [c1#18] AS scalarsubquery(c1)#22]
   :  +- Project [outer(c1#18)]
   :     +- OneRowRelation
   +- LocalRelation [c1#18, c2#19]
   ```
   
   Optimized plan before this PR:
   ```
   Project [c1#18#25 AS scalarsubquery(c1)#22]
   +- Join LeftOuter, (c1#24 <=> c1#18)
      :- LocalRelation [c1#18]
      +- Aggregate [c1#18], [c1#18 AS c1#18#25, c1#18 AS c1#24]
         +- LocalRelation [c1#18]
   ```
   
   Optimized plan after this PR:
   ```
   LocalRelation [scalarsubquery(c1)#22]
   ```
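   
   To compare the two optimized plans locally, the flag can be toggled within a session. The following is a minimal sketch, not taken from the PR's tests: it assumes a build that includes this change, and the temporary view `t` and its values are hypothetical stand-ins for the relation used in the example above.
   ```sql
   -- Hypothetical test data: a small relation with columns c1 and c2.
   CREATE OR REPLACE TEMPORARY VIEW t AS
   SELECT * FROM VALUES (0, 1), (1, 2) AS t(c1, c2);

   -- Disable the optimization: the subquery should go through decorrelation.
   SET spark.sql.optimizer.optimizeOneRowRelationSubquery=false;
   EXPLAIN EXTENDED SELECT (SELECT c1) FROM t;

   -- Re-enable it (the default): the subquery should be inlined instead.
   SET spark.sql.optimizer.optimizeOneRowRelationSubquery=true;
   EXPLAIN EXTENDED SELECT (SELECT c1) FROM t;
   ```
   Exact expression IDs and plan shapes will vary with the data source and Spark version, so the output may not match the plans shown above line for line.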
   
   ### Why are the changes needed?
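   To produce simpler query plans: inlining these subqueries avoids the left outer join and aggregate that decorrelation would otherwise introduce.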
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Added new unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


