GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/20541

    [SPARK-23356][SQL] Pushes Project to both sides of Union when expression is non-deterministic

    ## What changes were proposed in this pull request?

    Currently, the PushProjectionThroughUnion optimizer rule pushes a Project operator to both sides of a Union operator only when the expressions are deterministic. As with filter pushdown, we can also support pushing a Project operator to both sides of a Union operator when an expression is non-deterministic. This PR fixes this problem. Now the explain output looks like:

    ```
    === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion ===
    Input LogicalPlan:
    Project [a#0, rand(10) AS rnd#9]
    +- Union
       :- LocalRelation <empty>, [a#0, b#1, c#2]
       :- LocalRelation <empty>, [d#3, e#4, f#5]
       +- LocalRelation <empty>, [g#6, h#7, i#8]

    Output LogicalPlan:
    Project [a#0, rand(10) AS rnd#9]
    +- Union
       :- Project [a#0]
       :  +- LocalRelation <empty>, [a#0, b#1, c#2]
       :- Project [d#3]
       :  +- LocalRelation <empty>, [d#3, e#4, f#5]
       +- Project [g#6]
          +- LocalRelation <empty>, [g#6, h#7, i#8]
    ```

    ## How was this patch tested?

    Added new test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/heary-cao/spark PushProjectionThroughUnion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20541

----
commit 36dbc9c543f36dc5952a89c354bd70067ddd6883
Author: caoxuewen <cao.xuewen@...>
Date:   2018-02-08T08:02:17Z

    Pushes Project to both sides of Union when expression is non-deterministic

----
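
For readers who want to see the shape of the rewrite outside Catalyst, here is a minimal Python sketch of a toy plan tree. It is not the code in this pull request and does not use any Spark API: the `Relation`, `Union`, and `Project` classes and the `push_project_through_union` function are hypothetical names introduced only for illustration. It assumes every Union child is a leaf relation whose columns line up by position, and it treats any expression that is not a plain column of the first child (such as `rand(10)`) as an opaque string that stays in the outer Project, which mirrors the output plan quoted in the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relation:
    output: Tuple[str, ...]   # column names, e.g. ("a", "b", "c")


@dataclass(frozen=True)
class Union:
    children: tuple           # leaf relations with positionally aligned columns


@dataclass(frozen=True)
class Project:
    exprs: Tuple[str, ...]    # column names or opaque expressions like "rand(10)"
    child: object


def push_project_through_union(plan):
    """Toy rewrite: prune each Union child to the columns the outer Project uses."""
    if not (isinstance(plan, Project) and isinstance(plan.child, Union)):
        return plan
    union = plan.child
    first_output = union.children[0].output
    # Positions of plain column references; other expressions stay in the outer Project.
    positions = [first_output.index(e) for e in plan.exprs if e in first_output]
    pruned = tuple(
        Project(tuple(child.output[i] for i in positions), child)
        for child in union.children
    )
    # The outer Project is kept unchanged, so rand(10) is still evaluated above the Union.
    return Project(plan.exprs, Union(pruned))


plan = Project(
    ("a", "rand(10)"),
    Union((
        Relation(("a", "b", "c")),
        Relation(("d", "e", "f")),
        Relation(("g", "h", "i")),
    )),
)
optimized = push_project_through_union(plan)
```

In this toy model the three children end up projecting `a`, `d`, and `g` respectively, while the outer projection list is left as it was.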