[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-12-03 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 In the example @viirya described above (https://github.com/apache/spark/pull/22318#issuecomment-426317617), I think the interpretation is unclear to most users and I'm fairly concerned that it

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-12-03 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @maropu, I don't think we can. Actually this is how we deal with [simpler joins](https://github.com/apache/spark/pull/22318#issuecomment-427080091) Do you think changing the behaviour is

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-12-02 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 @peter-toth we cannot fix the issue of the description without changing [the existing behaviour](https://github.com/apache/spark/pull/22318#issuecomment-426317617)? ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-12 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22318 @srowen the change seems fine to me as I think this does improve the behavior of attributes deduplication (I think it was a bug not rewriting the join condition with the new references).

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22318 Oh I see there was indeed more discussion on this, and it does relate to resolving columns to joined DataFrames. I don't know enough to bless this change, but it seems reasonable. @maropu approves,

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-11 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @srowen, I saw your last comment on https://github.com/peter-toth/spark/tree/SPARK-25150. I submitted this PR to solve that ticket and I believe the description here explains what is the real

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-04 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 Also please consider that currently (and also after this PR) using `b` and `c` from the description: ``` b.join(c, b("id") === b("id"), "inner") ``` is interpreted as ```

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-02 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 Thanks @viirya, your analysis is correct. Unfortunately an attribute doesn't have a reference to its dataset so I don't think this scenario can be solved easily. I believe the good

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22318 For a query similar to the one in the PR description: `a.join(b, a("id") === b("id"), "inner").join(c, a("id") === b("id"), "inner"` It is interpreted as a cartesian products now.

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-10-02 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @cloud-fan could you please help me with this PR and take it one step forward? --- - To unsubscribe, e-mail:

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-13 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @cloud-fan, does the new description defines the scope as you suggested? Is there anything I can add to this PR? --- - To

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-10 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @cloud-fan , I added some explanation to the description in which cases this PR helps and also where it doesn't. --- - To

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22318 Can you define the scope of this PR? In which case we should change the references in the join condition? --- - To

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95854/ Test PASSed. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95854/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-10 Thread peter-toth
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @cloud-fan this PR doesn't solve that question. There are some hacks in `Dataset.join` to handle `EqualTo` and `EqualNullSafe` with duplicated attributes and those hacks are still required

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22318 How does this work? When we have duplicated attributes in the join condition, how can we know which attribute comes from which side? ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95854/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-09 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-09 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 To make sure we have no regression by this change, I checked the `Analyzer$ResolveReferences`time in TPCDS queries. But, I didn't find actual performance regression. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95757/ Test PASSed. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95757/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95757/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22318 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95750/ Test FAILed. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95750/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95750/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95723/ Test PASSed. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95720/ Test PASSed. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95720/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22318 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95714/ Test FAILed. ---

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22318 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95714/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22318 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95720/testReport)** for PR 22318 at commit

[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22318 **[Test build #95714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95714/testReport)** for PR 22318 at commit