[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2018-07-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 It seems we can't reach an agreement on this topic, so I'll close this for now.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 > The order is different from the original one that is evaluated in the join conditions. I'm not sure what original order you meant. By pulling out to `Project`, they are evaluated by their

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 The order is different from the original one that is evaluated in the join conditions.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81054/

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Merged build finished. Test PASSed.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #81054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81054/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652

    Join [t1.a = rand(t2.b), t1.c = rand(t2.d)]
      Sort
        Project [t1.a, t1.c]
          TableScan t1
      Sort
        Project [rand(t2.b) as rand(t2.b),

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #81054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81054/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 We could add a `Sort` above the `Project` and the orders become different, right?

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @cloud-fan @gatorsmile Any more thoughts or comments on this change? Thanks.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 When we join two tables whose equi-join keys are non-deterministic, for example `t1.a = rand(t2.b)` and `t1.c = rand(t2.d)`, we pull them out into a downstream project:
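A toy illustration of the pull-out idea, in plain Python rather than Spark. Everything here is an assumption made for the sketch: `project_keys` is a hypothetical helper, `random.Random(seed)` stands in for Spark's `rand`, and a list of tuples stands in for `t2`. It only shows that once the non-deterministic key is materialized by a projection in scan order, a sort applied afterwards cannot change which value each row received.

```python
import random

# (row id, b) pairs in scan order; a stand-in for table t2.
rows = [("r1", 30), ("r2", 10), ("r3", 20)]


def project_keys(scanned_rows, seed=42):
    # Evaluate the non-deterministic key exactly once per row, in scan order,
    # and attach the result to the row (the "pulled out" projection).
    # The stand-in generator ignores the value of b.
    rng = random.Random(seed)
    return [(rid, b, rng.random()) for rid, b in scanned_rows]


projected = project_keys(rows)

# Key seen by a consumer that reads the projection as scanned.
as_scanned = {rid: key for rid, _, key in projected}

# Key seen by a consumer that sorts on the join key first.
after_sort = {rid: key for rid, _, key in sorted(projected, key=lambda r: r[2])}
```

In this model both consumers see identical per-row keys, because the sort only reorders rows whose keys were already computed.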

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 I did not get your point. Could you give an example of why the non-deterministic expressions are always evaluated in the same order no matter which join types are chosen during the physical

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Once we pull them out into a downstream project, should we still worry about the call order? They are evaluated before any sort or shuffle that is added later.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 You are talking about the number of calls. I am worried about the call order. We could add a `SORT`.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 > Why is equi-join free from these issues? Assume the equi-join predicates are of the form `t1.a = rand(t2.b) && t1.c = rand(t2.d)`. When we compare the equi-join keys `(t1.a, t1.c)` and

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Merged build finished. Test PASSed.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80376/

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #80376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80376/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-07 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 > As said in the previous discussion, we can't avoid a few issues regarding non-deterministic non-equi join conditions. We can simply allow it, but it faces inconsistency due to different join

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #80376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80376/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-08-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile @cloud-fan Do you have more comments or thoughts on this? Thanks.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @baibaichen When we do so, I think the result is not the same as Hive's join result. Is it still useful?

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-28 Thread baibaichen
Github user baibaichen commented on the issue: https://github.com/apache/spark/pull/18652 Can we add a flag, e.g. ignore-non-deterministic, so that we can treat non-deterministic expressions as deterministic? I believe this is what Hive does.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-28 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile Ok. No problem. Thanks.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-28 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 Let me talk with more people to get feedback. I will respond to you later. Thanks!

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile Actually, it is not rare for us to add a feature step by step in SparkSQL. This is not a reason that prevents us from adding this support. I think this change already helps a lot with this kind of

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 I think the goal is just to resolve the migration issues for Hive users. If we just provide very limited support, I do not think it can help the workload migration. If we really want

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Yea, for the case with non-deterministic non-equi join conditions, you'd face the issue of changing the number of calls. So I currently plan not to support it here.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 Yea, I know that. I'm wondering whether we need to change it to take the position into account.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 No, I don't think that's true. I think we don't consider the position of an equi-join condition.

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 I mean, `t1.a = t2.b` before a non-deterministic condition is an equi-join condition, but `t1.a = t2.b` after a non-deterministic condition is not. Is this true?

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 `t1.a = t2.b` is an equi-join condition. `t1.c > rand()` is not. They will be split and considered individually.
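The splitting described in this comment can be illustrated with a toy model. The sketch below is not Catalyst's code: `Pred`, `is_equi`, and `split_condition` are hypothetical names, and predicates are simplified to string triples. It only shows that when each conjunct is classified on its own, the result does not depend on where the conjunct appears in the `&&` chain.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Pred:
    """A single conjunct such as `t1.a = t2.b` (hypothetical toy type)."""
    left: str
    op: str
    right: str


def is_equi(p: Pred, left_cols: Set[str], right_cols: Set[str]) -> bool:
    # An equi-join conjunct is an equality whose two sides come from
    # opposite sides of the join.
    if p.op != "=":
        return False
    return (p.left in left_cols and p.right in right_cols) or (
        p.left in right_cols and p.right in left_cols
    )


def split_condition(
    conjuncts: List[Pred], left_cols: Set[str], right_cols: Set[str]
) -> Tuple[List[Pred], List[Pred]]:
    # Each conjunct is classified independently, so its position is irrelevant.
    equi = [p for p in conjuncts if is_equi(p, left_cols, right_cols)]
    other = [p for p in conjuncts if not is_equi(p, left_cols, right_cols)]
    return equi, other


eq = Pred("t1.a", "=", "t2.b")
non_det = Pred("t1.c", ">", "rand()")
t1_cols, t2_cols = {"t1.a", "t1.c"}, {"t2.b", "t2.d"}

# `t1.a = t2.b && t1.c > rand()` versus `t1.c > rand() && t1.a = t2.b`.
first = split_condition([eq, non_det], t1_cols, t2_cols)
second = split_condition([non_det, eq], t1_cols, t2_cols)
```

Both orderings yield the same pair of lists in this model: one equi-join key and one remaining predicate.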

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 Can we say that `t1.a = t2.b && t1.c > rand()` is an equi-join condition, but `t1.c > rand() && t1.a = t2.b` is not?

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Btw, I guess that is why we also pull non-deterministic grouping expressions for Aggregate?

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 If we simply allow it, the evaluation order of non-deterministic join conditions will differ across join implementations, e.g. sort-based and hash-based. Then we will get inconsistent
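A small plain-Python sketch of why evaluation order matters for a seeded generator. This is a simplification, not a description of Spark's physical operators: `eval_in_order` is a hypothetical helper, `random.Random(seed)` stands in for a non-deterministic expression, and the two row orders merely represent "evaluate as scanned" versus "evaluate after sorting".

```python
import random

# (row id, b) pairs in scan order.
rows = [("r1", 30), ("r2", 10), ("r3", 20)]


def eval_in_order(ordered_rows, seed=42):
    # The generator is stateful: the value a row receives depends on how many
    # rows were evaluated before it, not on the row itself.
    rng = random.Random(seed)
    return {rid: rng.random() for rid, _ in ordered_rows}


# Assumed model of an operator that evaluates rows as they are scanned.
scan_order = eval_in_order(rows)

# Assumed model of an operator that evaluates rows after sorting on b.
sorted_order = eval_in_order(sorted(rows, key=lambda r: r[1]))
```

The same seed produces the same sequence of values in both runs, but the values land on different rows, so the per-row results disagree.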

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18652 What if we simply allow non-deterministic join conditions? Since we allow non-deterministic filter conditions, should we do this for join conditions too?

[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...

2017-07-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 ping @cloud-fan Do you have time to review this? Thanks.