Maryann Xue created SPARK-23368:
-----------------------------------

             Summary: OutputOrdering and OutputPartitioning in ProjectExec 
should reflect the projected columns
                 Key: SPARK-23368
                 URL: https://issues.apache.org/jira/browse/SPARK-23368
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Maryann Xue


After a projection that renames columns, ProjectExec's outputOrdering and 
outputPartitioning should be expressed in terms of the projected (aliased) 
columns as well. For example,
{code:java}
SELECT b1
FROM (
    SELECT a a1, b b1
    FROM testData2
    ORDER BY a
)
ORDER BY a1{code}
The output of the inner query is ordered on a1 as well, since a1 is just an 
alias of a. With this fix, plus a rule that eliminates a Sort over input that 
is already sorted, the ORDER BY in the outer query could be optimized out.
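
As a rough illustration of the idea (a toy Python model, not Spark code; the 
function name `project_ordering` and the data shapes are assumptions made up 
for this sketch), the child's sort order can be rewritten through the alias 
map of the projection:

```python
# Toy model only: none of these names exist in Spark.

def project_ordering(child_ordering, aliases):
    """Rewrite a child's sort order in terms of the projected column names.

    child_ordering: list of (column, direction), most significant key first.
    aliases: dict mapping a child column name to its projected output name.
    """
    projected = []
    for column, direction in child_ordering:
        if column not in aliases:
            # A sort key that is projected away breaks the guarantee for
            # every less significant key, so only the prefix is kept.
            break
        projected.append((aliases[column], direction))
    return projected


# Inner query: SELECT a a1, b b1 FROM testData2 ORDER BY a
inner_ordering = [("a", "ASC")]
inner_aliases = {"a": "a1", "b": "b1"}

print(project_ordering(inner_ordering, inner_aliases))  # [('a1', 'ASC')]
```

In this model the projected ordering names a1, which is exactly what the outer 
ORDER BY a1 asks for, so a sort-elimination rule would have enough information 
to drop the outer sort.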

 

Similarly, the following query
{code:java}
SELECT *
FROM (
    SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2
    FROM testData2 t1
    LEFT JOIN testData2 t2
    ON t1.a = t2.a
)
JOIN testData2 t3
ON a1 = t3.a{code}
is equivalent to
{code:java}
SELECT *
FROM testData2 t1
LEFT JOIN testData2 t2
ON t1.a = t2.a
JOIN testData2 t3
ON t1.a = t3.a{code}
so the unnecessary sorting and hash-partitioning that are already optimized 
out of the second query should be eliminated from the first query as well.
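
The partitioning side can be sketched in the same way. This is again a toy 
Python model, not Spark code: `project_partitioning` and `needs_shuffle` are 
hypothetical names, and "already partitioned" is simplified here to an exact 
match of the hash keys.

```python
# Toy model only: none of these names exist in Spark.

def project_partitioning(child_keys, aliases):
    """Rewrite hash-partitioning keys in terms of projected column names.

    Returns None if any key is projected away, because the output can then
    no longer be described as hash-partitioned on named output columns.
    """
    if not all(key in aliases for key in child_keys):
        return None
    return [aliases[key] for key in child_keys]


def needs_shuffle(output_keys, required_keys):
    """Simplification: a shuffle is needed unless the keys match exactly."""
    return output_keys != required_keys


# Subquery: SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2 ... ON t1.a = t2.a
child_keys = ["t1.a"]
aliases = {"t1.a": "a1", "t2.a": "a2", "t1.b": "b1", "t2.b": "b2"}

without_fix = child_keys  # partitioning still refers to t1.a
with_fix = project_partitioning(child_keys, aliases)

# The outer join is ON a1 = t3.a, so its left side must be hashed on a1.
print(needs_shuffle(without_fix, ["a1"]))  # True
print(needs_shuffle(with_fix, ["a1"]))     # False
```

Without the alias rewrite, the planner in this model cannot see that the 
subquery output is already partitioned on the join key, which corresponds to 
the extra sort and shuffle in the first query.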


