allisonwang-db opened a new pull request #33903:
URL: https://github.com/apache/spark/pull/33903


   ### What changes were proposed in this pull request?
   This PR adds an additional check in the `CollapseProject` rule to prevent 
inlining expressions with correlated scalar subqueries.
   
   ### Why are the changes needed?
   To avoid introducing unnecessary left outer joins when rewriting correlated 
subqueries. Projects should be collapsed after the scalar subqueries have been 
rewritten into left outer joins.
   
   For example
   ```
   // Before CollapseProject
   Project [c1, s, (s * 10)]
   +- Project [c1, scalar-subquery [c1] AS s]
      :  +- Aggregate [c1], [first(c2), c1] 
      :      +- LocalRelation [c1, c2]
      +- LocalRelation [c1, c2]
   
   // After (scalar subqueries are inlined)
   Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
   :  +- Aggregate [c1], [first(c2), c1] 
   :      +- LocalRelation [c1, c2]
   :  +- Aggregate [c1], [first(c2), c1] 
   :      +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]
   ```
   Then the rule `RewriteCorrelatedScalarSubquery` rule will create two left 
outer joins instead of one. Also, since duplicate join attributes are not 
handled in this rule, this rewrite will break the structural integrity of the 
plan.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit test. Both test cases throw the following error before this PR:
   ```
   After applying rule 
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery in 
batch Operator Optimization before Inferring Filters, the structural integrity 
of the plan is broken.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to