Allison Wang created SPARK-36656:
------------------------------------

             Summary: CollapseProject should not collapse correlated scalar subqueries
                 Key: SPARK-36656
                 URL: https://issues.apache.org/jira/browse/SPARK-36656
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Allison Wang


Currently, the optimizer rule `CollapseProject` inlines aliased expressions 
that contain correlated scalar subqueries. When the alias is referenced more 
than once, the subquery is duplicated, which can create unnecessary left outer 
joins.

{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1] 
   :      +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1] 
:      +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1] 
:      +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}

Once the subqueries are rewritten, this query ends up with two LeftOuter joins 
instead of one. We should collapse projects only after correlated subqueries 
have been rewritten as joins.
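
As a rough illustration of the proposed behavior, the sketch below models two 
stacked projections and a collapse step that leaves them alone while the lower 
one still aliases a correlated scalar subquery. Everything here is a toy 
model: the types `Expr`, `Plan`, `CollapseSketch`, and the helper names are 
hypothetical and are not Catalyst classes or the actual fix.

{code:scala}
// Toy model only: none of these types are Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr
// Treated as correlated when it references at least one outer column.
case class ScalarSubquery(outerRefs: Seq[String]) extends Expr
case class Alias(child: Expr, name: String) extends Expr

sealed trait Plan
case class Relation(output: Seq[String]) extends Plan
case class Project(exprs: Seq[Expr], child: Plan) extends Plan

object CollapseSketch {
  // True if the expression tree contains a correlated scalar subquery.
  def hasCorrelatedSubquery(e: Expr): Boolean = e match {
    case ScalarSubquery(refs) => refs.nonEmpty
    case Alias(c, _)          => hasCorrelatedSubquery(c)
    case Mul(l, r)            => hasCorrelatedSubquery(l) || hasCorrelatedSubquery(r)
    case _                    => false
  }

  // Replace references to lower-project aliases with the aliased expressions.
  def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Attr(n)     => aliases.getOrElse(n, e)
    case Mul(l, r)   => Mul(substitute(l, aliases), substitute(r, aliases))
    case Alias(c, n) => Alias(substitute(c, aliases), n)
    case other       => other
  }

  // Merge adjacent projections, unless doing so would inline (and possibly
  // duplicate) a correlated scalar subquery.
  def collapse(plan: Plan): Plan = plan match {
    case Project(upper, Project(lower, child)) =>
      val aliases = lower.collect { case Alias(c, n) => n -> c }.toMap
      if (aliases.values.exists(hasCorrelatedSubquery)) plan
      else collapse(Project(upper.map(substitute(_, aliases)), child))
    case other => other
  }
}
{code}

With the "Before" plan above encoded as 
`Project(Seq(Attr("c1"), Attr("s"), Mul(Attr("s"), Lit(10))), Project(Seq(Attr("c1"), Alias(ScalarSubquery(Seq("c1")), "s")), Relation(Seq("c1", "c2"))))`, 
`CollapseSketch.collapse` returns the plan unchanged, so the subquery appears 
only once when it is later rewritten as a join.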



