GitHub user heary-cao opened a pull request:
https://github.com/apache/spark/pull/20541
[SPARK-23356][SQL]Pushes Project to both sides of Union when expression is
non-deterministic
## What changes were proposed in this pull request?
Currently, the PushProjectionThroughUnion optimizer rule pushes a Project
operator to both sides of a Union operator only when its expressions are
deterministic. As with filter pushdown, we can also support pushing a Project
down to both sides of a Union when an expression is non-deterministic. This PR
fixes that. As the plan below shows, the non-deterministic expression stays in
the Project above the Union, and each child of the Union gets a Project that
keeps only the columns that are needed. The explain output now looks like:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion ===
Input LogicalPlan:
Project [a#0, rand(10) AS rnd#9]
+- Union
   :- LocalRelation <empty>, [a#0, b#1, c#2]
   :- LocalRelation <empty>, [d#3, e#4, f#5]
   +- LocalRelation <empty>, [g#6, h#7, i#8]

Output LogicalPlan:
Project [a#0, rand(10) AS rnd#9]
+- Union
   :- Project [a#0]
   :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :- Project [d#3]
   :  +- LocalRelation <empty>, [d#3, e#4, f#5]
   +- Project [g#6]
      +- LocalRelation <empty>, [g#6, h#7, i#8]
```
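The shape of the rewrite above can be mimicked with a small toy model. This is only a sketch in Python, not Spark's Catalyst code: the classes `Expr`, `Relation`, `Project`, `Union` and the function `prune_union_children` are hypothetical names made up for illustration, and the sketch assumes that a Union exposes the column names of its first child and matches columns across children by position.

```python
from dataclasses import dataclass
from typing import Tuple, Union as AnyOf


@dataclass(frozen=True)
class Expr:
    """A projected expression: its output name and the input columns it reads."""
    name: str
    references: Tuple[str, ...]


@dataclass(frozen=True)
class Relation:
    output: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    project_list: Tuple[Expr, ...]
    child: "Plan"


@dataclass(frozen=True)
class Union:
    children: Tuple["Plan", ...]


Plan = AnyOf[Relation, Project, Union]


def attr(name: str) -> Expr:
    """A plain column reference, e.g. `a`."""
    return Expr(name, (name,))


def output(plan: Plan) -> Tuple[str, ...]:
    """Column names a plan node produces."""
    if isinstance(plan, Relation):
        return plan.output
    if isinstance(plan, Project):
        return tuple(e.name for e in plan.project_list)
    # Assumption: a Union exposes the names of its first child.
    return output(plan.children[0])


def prune_union_children(project: Project) -> Project:
    """Keep the Project on top; give each Union child a column-pruning Project."""
    union = project.child
    if not isinstance(union, Union):
        return project
    union_output = output(union)
    referenced = {ref for e in project.project_list for ref in e.references}
    # Positions of the Union columns that the top Project actually reads.
    positions = [i for i, name in enumerate(union_output) if name in referenced]
    if len(positions) == len(union_output):
        return project  # nothing to prune, so leave the plan unchanged
    pruned = tuple(
        Project(tuple(attr(output(child)[i]) for i in positions), child)
        for child in union.children
    )
    return Project(project.project_list, Union(pruned))


# The plan from the example: rand(10) reads no input columns.
plan = Project(
    (attr("a"), Expr("rnd", ())),
    Union((
        Relation(("a", "b", "c")),
        Relation(("d", "e", "f")),
        Relation(("g", "h", "i")),
    )),
)
optimized = prune_union_children(plan)
```

In this toy model the top-level project list is left untouched, so the non-deterministic expression is still evaluated once above the Union, while each child is narrowed to the single column at the referenced position.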
## How was this patch tested?
Added new test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/heary-cao/spark PushProjectionThroughUnion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20541.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20541
----
commit 36dbc9c543f36dc5952a89c354bd70067ddd6883
Author: caoxuewen <cao.xuewen@...>
Date: 2018-02-08T08:02:17Z
Pushes Project to both sides of Union when expression is non-deterministic
----