Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
@cloud-fan do you have any further comments about this? Thanks.
---
-
To unsubscribe, e-mail:
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
OK, so I created https://github.com/apache/spark/pull/21605 for the fix
proposed by @daniel-shields. I'd like to leave this PR open to continue
the discussion of a better long-term fix.
Github user WenboZhao commented on the issue:
https://github.com/apache/spark/pull/21449
I like the proposal by @daniel-shields. If we can get it fixed soon, we
will be able to make the Spark 2.3.2 release.
---
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
@daniel-shields do you want to open a PR for that? I'll leave this PR open
as it is a more general fix, so we can continue the long-term discussion
here. Do you agree with this
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21449
> In the short term we should make the behavior of EqualTo and
EqualNullSafe identical.
This seems pretty safe and reasonable to me
---
Github user daniel-shields commented on the issue:
https://github.com/apache/spark/pull/21449
In the short term we should make the behavior of EqualTo and EqualNullSafe
identical. We could do that by adding a case for EqualNullSafe that mirrors
that of EqualTo.
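
A toy sketch of what "adding a case for EqualNullSafe that mirrors that of EqualTo" could look like, written in plain Python rather than Spark's Scala so that it runs without a cluster. The classes `Attr`, `EqualTo`, `EqualNullSafe` and the functions `resolve` and `rewrite_self_join` are hypothetical stand-ins, not Spark APIs. The sketch assumes that the join logic rewrites a condition whose two sides are the same attribute reference, so that one side resolves against the left plan and the other against the right plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute reference: a name plus a unique id."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class EqualTo:
    left: Attr
    right: Attr


@dataclass(frozen=True)
class EqualNullSafe:
    left: Attr
    right: Attr


def resolve(output, name):
    """Look up a column by name in one side's output attributes."""
    return next(a for a in output if a.name == name)


def rewrite_self_join(cond, left_output, right_output):
    """Rewrite a trivially true condition such as id#1 = id#1 so that it
    compares the left side's column with the right side's column.
    Both equality operators take the same branch (the "mirrored" case)."""
    if (isinstance(cond, (EqualTo, EqualNullSafe))
            and cond.left.expr_id == cond.right.expr_id):
        return type(cond)(resolve(left_output, cond.left.name),
                          resolve(right_output, cond.right.name))
    return cond


# Assumption for this toy model: the right side of a self-join carries fresh ids.
left_output = [Attr("id", 1)]
right_output = [Attr("id", 2)]
same = Attr("id", 1)

rewritten_eq = rewrite_self_join(EqualTo(same, same), left_output, right_output)
rewritten_ns = rewrite_self_join(EqualNullSafe(same, same), left_output, right_output)
```

In this model both operators end up comparing `id` from the left output with `id` from the right output, which is the "identical behavior" the comment asks for. Conditions whose sides already differ pass through unchanged.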
---
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
Sure, thanks for your time.
PS `df.join(df, df("id") >= df("id"))` may be ambiguous, but in the example
above
`df1.join(df2, df2['id'].eqNullSafe(df1['id'])).collect()` where `df1` and
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21449
This will definitely not go into 2.3.1, so we have plenty of time. I'll
think deeper into it after the spark summit.
IMO `df.join(df, df("id") >= df("id"))` is ambiguous, especially when
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
I see what you mean. Honestly I have not thought of a full design for this
problem (so I can't state what we should support and what not), but focusing on
this specific case I think that:
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21449
My point is that, we may have a different design if we wanna solve this
problem holistically, which may conflict with this patch. We should prove that
this is in the right direction and future
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
Thanks for your comment @cloud-fan. I understand your point. That is quite
a tricky problem, since we would probably also need to know the "DAG" of the
dataframes in order to make the right decision.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91343/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91343 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91343/testReport)**
for PR 21449 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21449
This is a long-standing issue. I've seen many attempts to fix it (including
my own), but none has succeeded.
The major problem is, there is no clear definition of the expected
behavior, i.e.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
yes @daniel-shields, you are right with your analysis. The problem was that
we were sometimes using `==`, sometimes `semanticEquals`. And `equals` has the
problem you mentioned.
I think
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3732/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91343 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91343/testReport)**
for PR 21449 at commit
Github user daniel-shields commented on the issue:
https://github.com/apache/spark/pull/21449
@mgaido91 I looked at the test failures and I think the changes to the
Dataset.resolve method are causing havoc. Consider the Dataset.drop method with
the following signature:
` def
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91303/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91303 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91303/testReport)**
for PR 21449 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3704/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91303 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91303/testReport)**
for PR 21449 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91298 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91298/testReport)**
for PR 21449 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91298/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3702/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91298 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91298/testReport)**
for PR 21449 at commit
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
Thanks @daniel-shields, you're right. I am checking whether and how
this can be fixed. Thanks for catching this!
---
Github user daniel-shields commented on the issue:
https://github.com/apache/spark/pull/21449
This case can also occur when the datasets are different but share a common
lineage. Consider the following:
`df = spark.range(10)
df1 = df.groupby('id').count()
df2 =
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
@daniel-shields in that case you have 2 different datasets `df1` and `df2`.
So they are 2 distinct attributes and the check `a.sameRef(b)` would return
false. This is applied only in case you have
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91253/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91253 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91253/testReport)**
for PR 21449 at commit
Github user daniel-shields commented on the issue:
https://github.com/apache/spark/pull/21449
I'm not sure that this behavior should be applied to all binary
comparisons. It could result in unexpected behavior in some rare cases. For
example:
`df1.join(df2, df2['x'] < df1['x'])`
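
One possible reading of this concern, as a toy Python sketch (not Spark code): it assumes, hypothetically, that a rewrite like the one used for equality always places the left dataset's column on the left of the operator. For a symmetric operator that is harmless, but for `<` it would flip the meaning of what the user wrote. The `join` helper and the row data are illustrative only.

```python
import operator


def join(left_rows, right_rows, op, left_first):
    """Nested-loop join on column x. `left_first` says which side
    supplies the first operand of `op`."""
    out = []
    for l in left_rows:
        for r in right_rows:
            a, b = (l, r) if left_first else (r, l)
            if op(a["x"], b["x"]):
                out.append((l["x"], r["x"]))
    return out


df1 = [{"x": 1}, {"x": 2}]
df2 = [{"x": 1}, {"x": 2}]

# What the user wrote: df2.x < df1.x
intended = join(df1, df2, operator.lt, left_first=False)
# What a positional rewrite would evaluate instead: df1.x < df2.x
rewritten = join(df1, df2, operator.lt, left_first=True)

# For a symmetric operator such as equality the operand order does not matter.
eq_a = join(df1, df2, operator.eq, left_first=False)
eq_b = join(df1, df2, operator.eq, left_first=True)
```

Here `intended` and `rewritten` select different row pairs, while `eq_a` and `eq_b` agree, which is why extending the rewrite from equality to all binary comparisons needs extra care.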
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3666/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21449
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21449
**[Test build #91253 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91253/testReport)**
for PR 21449 at commit