Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22318
In the example @viirya described above
(https://github.com/apache/spark/pull/22318#issuecomment-426317617), I think
the interpretation is unclear to most users and I'm fairly concerned that it
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@maropu, I don't think we can. Actually this is how we deal with [simpler
joins](https://github.com/apache/spark/pull/22318#issuecomment-427080091)
Do you think changing the behaviour is
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22318
@peter-toth we cannot fix the issue of the description without changing
[the existing
behaviour](https://github.com/apache/spark/pull/22318#issuecomment-426317617)?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22318
@srowen the change seems fine to me as I think this does improve the
behavior of attributes deduplication (I think it was a bug not rewriting the
join condition with the new references).
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/22318
Oh I see there was indeed more discussion on this, and it does relate to
resolving columns to joined DataFrames. I don't know enough to bless this
change, but it seems reasonable. @maropu approves,
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@srowen, I saw your last comment on
https://github.com/peter-toth/spark/tree/SPARK-25150. I submitted this PR to
solve that ticket and I believe the description here explains what is the real
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
Also please consider that currently (and also after this PR) using `b` and
`c` from the description:
```
b.join(c, b("id") === b("id"), "inner")
```
is interpreted as
```
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
Thanks @viirya, your analysis is correct.
Unfortunately an attribute doesn't have a reference to its dataset so I
don't think this scenario can be solved easily. I believe the good
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/22318
For a query similar to the one in the PR description:
`a.join(b, a("id") === b("id"), "inner").join(c, a("id") === b("id"),
"inner"`
It is interpreted as a cartesian products now.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan could you please help me with this PR and take it one step
forward?
---
-
To unsubscribe, e-mail:
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan, does the new description defines the scope as you suggested?
Is there anything I can add to this PR?
---
-
To
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan , I added some explanation to the description in which cases
this PR helps and also where it doesn't.
---
-
To
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22318
Can you define the scope of this PR? In which case we should change the
references in the join condition?
---
-
To
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95854/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95854 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95854/testReport)**
for PR 22318 at commit
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan this PR doesn't solve that question.
There are some hacks in `Dataset.join` to handle `EqualTo` and
`EqualNullSafe` with duplicated attributes and those hacks are still required
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22318
How does this work? When we have duplicated attributes in the join
condition, how can we know which attribute comes from which side?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95854 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95854/testReport)**
for PR 22318 at commit
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22318
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22318
To make sure we have no regression by this change, I checked the
`Analyzer$ResolveReferences`time in TPCDS queries. But, I didn't find
actual performance regression.
---
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22318
LGTM
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95757/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95757 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95757/testReport)**
for PR 22318 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95757 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95757/testReport)**
for PR 22318 at commit
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22318
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95750/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95750 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95750/testReport)**
for PR 22318 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95750 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95750/testReport)**
for PR 22318 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95723/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95723 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)**
for PR 22318 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95720/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95720 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95720/testReport)**
for PR 22318 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95723 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95723/testReport)**
for PR 22318 at commit
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22318
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95714/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22318
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95714 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95714/testReport)**
for PR 22318 at commit
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22318
LGTM
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95720 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95720/testReport)**
for PR 22318 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22318
**[Test build #95714 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95714/testReport)**
for PR 22318 at commit
46 matches
Mail list logo