Maryann Xue created SPARK-23368:
-----------------------------------

Summary: OutputOrdering and OutputPartitioning in ProjectExec should reflect the projected columns
Key: SPARK-23368
URL: https://issues.apache.org/jira/browse/SPARK-23368
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.3.0
Reporter: Maryann Xue
After a column-renaming projection, ProjectExec's outputOrdering and outputPartitioning should be expressed in terms of the projected (aliased) columns as well. For example, in

{code:sql}
SELECT b1 FROM (
  SELECT a a1, b b1
  FROM testData2
  ORDER BY a
) ORDER BY a1
{code}

the inner query is also ordered on a1. If there were a rule to eliminate a Sort over an already-sorted result, then together with this fix the ORDER BY in the outer query could be optimized out.

Similarly, the query

{code:sql}
SELECT * FROM (
  SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2
  FROM testData2 t1
  LEFT JOIN testData2 t2 ON t1.a = t2.a
) JOIN testData2 t3 ON a1 = t3.a
{code}

is equivalent to

{code:sql}
SELECT *
FROM testData2 t1
LEFT JOIN testData2 t2 ON t1.a = t2.a
JOIN testData2 t3 ON t1.a = t3.a
{code}

so the unnecessary sorting and hash partitioning that are already optimized out for the second query should be eliminated for the first query as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
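To make the expected behavior concrete, here is a minimal Python sketch. It is not Spark's implementation. SortOrder, HashPartitioning and UnknownPartitioning below are simplified, hypothetical stand-ins for the Catalyst concepts of similar names. Columns are matched by plain name (real Catalyst matches attributes by expression ID), and the helpers project_ordering, project_partitioning and ordering_satisfies are illustrative only. The sketch rewrites the child's ordering and hash-partitioning keys through the projection's aliases. It then shows that the outer ORDER BY a1 from the first example would be satisfied by the inner query's ordering.

{code:python}
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical, simplified stand-ins for Catalyst's SortOrder and partitionings.
# Columns are plain names here; real Catalyst matches attributes by exprId.


@dataclass(frozen=True)
class SortOrder:
    column: str
    ascending: bool = True


@dataclass(frozen=True)
class HashPartitioning:
    columns: Tuple[str, ...]
    num_partitions: int


@dataclass(frozen=True)
class UnknownPartitioning:
    num_partitions: int


Partitioning = Union[HashPartitioning, UnknownPartitioning]
# A projection is a list of (child column, output name) pairs, e.g. ("a", "a1").
ProjectList = List[Tuple[str, str]]


def alias_map(project_list: ProjectList) -> Dict[str, str]:
    """Map each child column to the first output name it is exposed under."""
    mapping: Dict[str, str] = {}
    for child_col, out_name in project_list:
        mapping.setdefault(child_col, out_name)
    return mapping


def project_ordering(child_ordering: List[SortOrder],
                     project_list: ProjectList) -> List[SortOrder]:
    """Rewrite the child's ordering in terms of the projected columns.

    Only the longest prefix whose columns all survive the projection is kept:
    data sorted on (a, b) is not sorted on b alone once a is projected away.
    """
    mapping = alias_map(project_list)
    result: List[SortOrder] = []
    for order in child_ordering:
        if order.column not in mapping:
            break
        result.append(SortOrder(mapping[order.column], order.ascending))
    return result


def project_partitioning(child: Partitioning,
                         project_list: ProjectList) -> Partitioning:
    """Rename hash-partitioning keys; give up if any key is projected away."""
    if isinstance(child, HashPartitioning):
        mapping = alias_map(project_list)
        if all(c in mapping for c in child.columns):
            return HashPartitioning(tuple(mapping[c] for c in child.columns),
                                    child.num_partitions)
        return UnknownPartitioning(child.num_partitions)
    return child


def ordering_satisfies(required: List[SortOrder], actual: List[SortOrder]) -> bool:
    """A required ordering is met when it is a prefix of the actual ordering."""
    return len(required) <= len(actual) and all(
        r == a for r, a in zip(required, actual))


# First example: SELECT a a1, b b1 FROM testData2 ORDER BY a
inner = project_ordering([SortOrder("a")], [("a", "a1"), ("b", "b1")])
print(inner)                                         # [SortOrder(column='a1', ascending=True)]
print(ordering_satisfies([SortOrder("a1")], inner))  # True: outer ORDER BY a1 is redundant

# Second example: assume the left join's output is hash-partitioned on t1.a
# into 200 partitions (both the partitioning and the count are assumptions).
join_project = [("t1.a", "a1"), ("t2.a", "a2"), ("t1.b", "b1"), ("t2.b", "b2")]
print(project_partitioning(HashPartitioning(("t1.a",), 200), join_project))
# HashPartitioning(columns=('a1',), num_partitions=200)
{code}

Keeping only the longest ordering prefix, and falling back to unknown partitioning when a key is projected away, is a conservative choice. It never claims a property that the projected output does not actually have. With the partitioning renamed to a1, the outer join on a1 = t3.a could in principle reuse the existing distribution instead of shuffling again.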