[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20541 @gatorsmile, OK, I will do it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Can one of the admins verify this patch?
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20541 Oh, I see. I will fall back to only the change for the non-deterministic expression, and keep the newly added test cases for `a + 1` and `a + b`. Is that OK?
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20541 I don't agree. `a + 1`/`a + b` are evaluated the same number of times whether or not you push them through Union. I don't see any performance benefit in doing this, unless you can eliminate the entire project above Union.
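The point about evaluation counts can be checked with a small toy model. This is only a sketch in plain Python, not Spark code: the rows, the `a_plus_1` helper and the three child lists are hypothetical stand-ins for the union inputs.

```python
# Toy model, not Spark: count how often `a + 1` is evaluated in each plan shape.
calls = 0


def a_plus_1(row):
    """Stand-in for the projected expression `a + 1`; counts its evaluations."""
    global calls
    calls += 1
    return row["a"] + 1


# Three hypothetical union children with 2, 3 and 1 rows.
children = [
    [{"a": 1}, {"a": 2}],
    [{"a": 10}, {"a": 20}, {"a": 30}],
    [{"a": 100}],
]

# Plan A: Project above Union (concatenate all rows, then project).
calls = 0
above = [a_plus_1(row) for child in children for row in child]
calls_above = calls

# Plan B: Project pushed into every child, then Union of the projected rows.
calls = 0
projected_children = [[a_plus_1(row) for row in child] for child in children]
pushed = [value for child in projected_children for value in child]
calls_pushed = calls

print(calls_above, calls_pushed)  # 6 6
```

Both shapes evaluate the expression once per input row, so in this model pushing the expression down changes where the work happens, not how much of it there is.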
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20541 Oh, yeah, there is a small difference for `a + 1` and `a + b`.

**for a + 1**:
```
PushProjectionThroughUnion rule handles:
Union
:- Project [(a#0 + 1) AS aa#10]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [(d#3 + 1) AS aa#11]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [(g#6 + 1) AS aa#12]
   +- LocalRelation , [g#6, h#7, i#8]

ColumnPruning rule handles:
Project [(a#0 + 1) AS aa#9]
Union
:- Project [a#0]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [d#3]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [g#6]
   +- LocalRelation , [g#6, h#7, i#8]
```
**for a + b**:
```
PushProjectionThroughUnion rule handles:
Union
:- Project [(a#0 + b#1) AS ab#10]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [(d#3 + e#4) AS ab#11]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [(g#6 + h#7) AS ab#12]
   +- LocalRelation , [g#6, h#7, i#8]

ColumnPruning rule handles:
Project [(a#0 + b#1) AS ab#9]
Union
:- Project [a#0, b#1]
:  +- LocalRelation , [a#0, b#1, c#2]
:- Project [d#3, e#4]
:  +- LocalRelation , [d#3, e#4, f#5]
+- Project [g#6, h#7]
   +- LocalRelation , [g#6, h#7, i#8]
```
So I think this may be the reason the `PushProjectionThroughUnion` rule is needed, and the same goes for non-deterministic expressions.
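The difference between the two plan shapes above can be mimicked with a toy model. The sketch below is plain Python, not Catalyst: plans are nested tuples, an expression is a format template plus the positions of the union output columns it reads, and the helper names `push_projection` and `prune_columns` are hypothetical.

```python
# Toy model, not Catalyst: each union child is just a tuple of column names.
CHILDREN = [("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i")]


def push_projection(template, refs, children):
    """Copy the whole expression into every child; nothing stays above the Union."""
    return ("Union", [
        ("Project", [template.format(*(cols[i] for i in refs))], cols)
        for cols in children
    ])


def prune_columns(template, refs, children):
    """Keep the expression above the Union; push down only the columns it reads."""
    top_expr = template.format(*(children[0][i] for i in refs))
    pruned = [("Project", [cols[i] for i in refs], cols) for cols in children]
    return ("Project", [top_expr], ("Union", pruned))


# `a + b` reads the union output columns at positions 0 and 1.
pushed = push_projection("{0} + {1}", (0, 1), CHILDREN)
pruned = prune_columns("{0} + {1}", (0, 1), CHILDREN)

print(pushed[1][1])     # ('Project', ['d + e'], ('d', 'e', 'f'))
print(pruned[1])        # ['a + b']
print(pruned[2][1][1])  # ('Project', ['d', 'e'], ('d', 'e', 'f'))
```

In both shapes each child ends up exposing only what the expression needs; the shapes differ in whether a `Project` remains above the `Union`.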
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20541 `ColumnPruning` rule handles `Union` already.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20541 In my opinion, `PushProjectionThroughUnion` is worthwhile when the data sources under the union have many columns while the projection requires only a few of them, because file reads then perform better. Thanks.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20541 I think the use case is that, by pushing projects into Union, we are more likely to be able to combine adjacent Unions. So I don't think we need to improve it to push part of the project list and still leave a project above Union.
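The use case of combining adjacent Unions can be illustrated with a toy model. This is a sketch in plain Python under simplifying assumptions, not the Catalyst rules themselves: relations are strings, the projected expression is a single string that is reused unchanged for every child (attribute renaming is ignored), and `push_project` and `combine_unions` are hypothetical helpers.

```python
# Toy model, not Catalyst: a plan is a relation name (str),
# ("Project", expr, child) or ("Union", [children]).

def is_union(plan):
    return not isinstance(plan, str) and plan[0] == "Union"


def push_project(plan):
    """Rewrite Project(Union(c1, c2, ...)) into Union(Project(c1), Project(c2), ...)."""
    if isinstance(plan, str):
        return plan
    if plan[0] == "Project":
        child = push_project(plan[2])
        if is_union(child):
            return ("Union", [("Project", plan[1], c) for c in child[1]])
        return ("Project", plan[1], child)
    return ("Union", [push_project(c) for c in plan[1]])


def combine_unions(plan):
    """Flatten a Union that is a direct child of another Union."""
    if isinstance(plan, str):
        return plan
    if plan[0] == "Project":
        return ("Project", plan[1], combine_unions(plan[2]))
    merged = []
    for child in (combine_unions(c) for c in plan[1]):
        if is_union(child):
            merged.extend(child[1])
        else:
            merged.append(child)
    return ("Union", merged)


# A Project sits between two Unions and keeps them from being adjacent.
plan = ("Union", [("Project", "x", ("Union", ["t1", "t2"])), "t3"])

blocked = combine_unions(plan)                 # nothing to flatten yet
flattened = combine_unions(push_project(plan))

print(flattened)
# ('Union', [('Project', 'x', 't1'), ('Project', 'x', 't2'), 't3'])
```

In this model the inner `Union` only becomes a direct child of the outer one after the whole `Project` has been pushed down, which is why a partial push that leaves a `Project` above the inner `Union` would not help here.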
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Merged build finished. Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87237/ Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87237/testReport)** for PR 20541 at commit [`4f5d46b`](https://github.com/apache/spark/commit/4f5d46baca612caaa882cbabb3b35665e9c7ed8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20541 I'm confused about why we need `PushProjectionThroughUnion`. Generally we only need to push down the required columns, not the entire project list, as there is no benefit in doing so. I think we just need to handle `Union` in the `ColumnPruning` rule, but I may be missing something. cc @gatorsmile
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87237/testReport)** for PR 20541 at commit [`4f5d46b`](https://github.com/apache/spark/commit/4f5d46baca612caaa882cbabb3b35665e9c7ed8b).
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Merged build finished. Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87210/ Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87210/testReport)** for PR 20541 at commit [`36dbc9c`](https://github.com/apache/spark/commit/36dbc9c543f36dc5952a89c354bd70067ddd6883). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87210/testReport)** for PR 20541 at commit [`36dbc9c`](https://github.com/apache/spark/commit/36dbc9c543f36dc5952a89c354bd70067ddd6883).
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20541 ok to test
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20541 @gatorsmile, @cloud-fan Could you help review this? Thanks.