[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-07-16 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
@cloud-fan do you have any further comments about this? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-21 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
ok so I created https://github.com/apache/spark/pull/21605 for the fix 
proposed by @daniel-shields. I'd like to leave this open in order to go on with 
the discussion for a long-term better fix.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-21 Thread WenboZhao
Github user WenboZhao commented on the issue:

https://github.com/apache/spark/pull/21449
  
I like the proposal by @daniel-shields. If we could get it fixed soon, we 
will be able to catch up the Spark 2.3.2 release.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
@daniel-shields do you want to open a PR for that? I'll leave this PR open 
as it is a more general fix so we can go on with the long-term discussion here 
in this PR. Do you agree with this approach @cloud-fan ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21449
  
> In the short term we should make the behavior of EqualTo and 
EqualNullSafe identical.

This seems pretty safe and reasonable to me


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-05 Thread daniel-shields
Github user daniel-shields commented on the issue:

https://github.com/apache/spark/pull/21449
  
In the short term we should make the behavior of EqualTo and EqualNullSafe 
identical. We could do that by adding a case for EqualNullSafe that mirrors 
that of EqualTo.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-04 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
Sure, thanks for your time.

PS `df.join(df, df("id") >= df("id"))` may be ambiguous, but in the example 
above
`df1.join(df2, df2['id'].eqNullSafe(df1['id'])).collect()` where `df1` and 
`df2` are created from the same dataframe it is not ambiguous IMHO.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21449
  
This will definitely not go into 2.3.1, so we have plenty of time. I'll 
think deeper into it after the spark summit.

IMO `df.join(df, df("id") >= df("id"))` is ambiguous, especially when it's 
not an inner join.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-03 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
I see what you mean. Honestly I have not thought of a full design for this 
problem (so I can't state what we should support and what not), but focusing on 
this specific case I think that:

 - at the moment we do support self-joins (at least in the case 
`df.join(df, df("id") >= df("id"))`) so considering this invalid would cause a 
big behavior change (potentially causing user workflows to break).
 - even though we might consider acceptable such a change in a major 
release, I think that we should support with the Dataframe API what we support 
in the SQL API, and SQL standard supports self joins (using aliases for the 
tables). So I do believe we should support this use case.
 - the case presented by @daniel-shields in 
https://github.com/apache/spark/pull/21449#issuecomment-392947474, I think is a 
valid one without any doubt. As of now we are not supporting it, though.

So I think that in the holistic approach we shouldn't change the current 
behavior/approach which is present now and will be (IMHO) improved by this 
patch.

What I do think we have to discuss in order not to have to change it - once 
we want to solve the more generic issue - is the way to track the dataset an 
attribute is coming from. Here I decided to use the metadata, since I thought 
this is the cleanest approach. Another approach might be to introduce a new 
`Option` in the `AttributeReference` a reference to the dataset it is coming 
from.
For the generic solution, this might have the advantage that having a 
reference to the provenance dataset, where we might want to store some kind of 
DAG of the datasets this one is coming from in order to take more complex 
decision about the validity of the syntax and/or the resolution of the 
attribute.

What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-02 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21449
  
My point is that, we may have a different design if we wanna solve this 
problem holistically, which may conflict with this patch. We should prove that 
this is in the right direction and future fix will not conflict with it, or 
come out with the final fix directly.

An example is, we may want to treat `df.join(df, df("id") >= df("id"))` as 
invalid in the final design.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-01 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
Thanks for your comment @cloud-fan. I understand your point. That is quite 
a tricky problem, since we should know probably also the "DAG" of the 
dataframes in order to take the right decision.

But despite this change is related to that problem, I think it is different 
and with a much smaller scope. Indeed, while we can use the metadata 
information in many places, actually in this patch is is used only in the 
self-join case when there is ambiguity in which column to take. The behavior in 
any other case in unchanged.

So after this patch, the situation in resolving column using `col` is 
unchanged. The only places where the dataset of provenance is checked is in 
self joins. The goal here is only to support cases which were throwing 
exceptions in resolving the right column.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91343/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91343 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91343/testReport)**
 for PR 21449 at commit 
[`8e6e5c0`](https://github.com/apache/spark/commit/8e6e5c0059574c1e171e589fcf533c6b5669499f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21449
  
This is a long-standing issue, I've seen many attempts to fix it (including 
myself) but no one success.

The major problem is, there is no clear definition of the expected 
behavior, i.e. what's the semantic of `Dataset.col`?

some examples
```
df.select(df.col("i")) // valid

val df1 = df.filter(...)
df1.select(df.col("i")) // still valid

df.join(otherDF, df.col("i") === otherDF.col("i")) // valid

df.join(otherDF).select(df.col("i"), otherDF("i"))  // valid

val df2 = df.select(df.col("i") + 1)
df2.select(df.col("i"))   // invalid
```

Sometime we can use an ancestor's column in a new Dataset but sometimes we 
can't. We should make the condition clear first.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
yes @daniel-shields, you are right with your analysis. The problem was that 
we were sometimes using `==`, sometimes `semanticEquals`. And `equals` has the 
problem you mentioned.

I think this is the only way for addressing the problem described here is 
to reference which dataset the column is coming from. I think adding a metadata 
for it is the cleanest way. We may also add a new attribute to the `Attribute` 
class instead of using metadata, but honestly this way seemed cleaner to me. 
What do you think? Do you have other suggestions?

cc @cloud-fan @hvanhovell @gatorsmile  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3732/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91343 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91343/testReport)**
 for PR 21449 at commit 
[`8e6e5c0`](https://github.com/apache/spark/commit/8e6e5c0059574c1e171e589fcf533c6b5669499f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread daniel-shields
Github user daniel-shields commented on the issue:

https://github.com/apache/spark/pull/21449
  
@mgaido91 I looked at the test failures and I think the changes to the 
Dataset,resolve method are causing havoc. Consider the Dataset.drop method with 
the following signature:
` def drop(col: Column): DataFrame`
There is a statement that may be comparing an AttributeReference with the 
new metadata to one without it:
```
val colsAfterDrop = attrs.filter { attr =>
  attr != expression
}.map(attr => Column(attr))```
This may be resulting in columns not getting dropped. I haven't verified 
but this is the first thing I would check. The change to resolve may be too 
drastic.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91303/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91303 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91303/testReport)**
 for PR 21449 at commit 
[`b8d5057`](https://github.com/apache/spark/commit/b8d50570b7b172ef310fdfb12b01be1598ff5481).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3704/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91303 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91303/testReport)**
 for PR 21449 at commit 
[`b8d5057`](https://github.com/apache/spark/commit/b8d50570b7b172ef310fdfb12b01be1598ff5481).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91298 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91298/testReport)**
 for PR 21449 at commit 
[`e8a5fa3`](https://github.com/apache/spark/commit/e8a5fa33d56187a6e30e81ba9439cd097fff5b2c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91298/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3702/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91298/testReport)**
 for PR 21449 at commit 
[`e8a5fa3`](https://github.com/apache/spark/commit/e8a5fa33d56187a6e30e81ba9439cd097fff5b2c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-30 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
thanks @daniel-shields , you're right. I am working to check if and how 
this can be fixed. Thanks for your catch!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread daniel-shields
Github user daniel-shields commented on the issue:

https://github.com/apache/spark/pull/21449
  
This case can also occur when the datasets are different but share a common 
lineage. Consider the following:
`df = spark.range(10)
df1 = df.groupby('id').count()
df2 = df.groupby('id').sum('id')
df1.join(df2, df2['id'].eqNullSafe(df1['id'])).collect()`
This currently fails with eqNullSafe, but works with ==.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
@daniel-shields in that case you have 2 different datasets `df1` and `df2`. 
So they are 2 distinct attributes and the check `a.sameRef(b)` would return 
false. This is applied only in case you have self-joins, ie. you have the same 
dataset on both sides.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91253/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91253/testReport)**
 for PR 21449 at commit 
[`92cb513`](https://github.com/apache/spark/commit/92cb513416c5dd0e9fa690c25cfae0565471a5e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread daniel-shields
Github user daniel-shields commented on the issue:

https://github.com/apache/spark/pull/21449
  
I'm not sure that this behavior should be applied to all binary 
comparisons. It could result in unexpected behavior in some rare cases. For 
example:
`df1.join(df2, df2['x'] < df1['x'])`
If 'x' is ambiguous, this would result in the conditional being flipped.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3666/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21449
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21449
  
**[Test build #91253 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91253/testReport)**
 for PR 21449 at commit 
[`92cb513`](https://github.com/apache/spark/commit/92cb513416c5dd0e9fa690c25cfae0565471a5e1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org