[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2018-01-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18692
  
> The point of this issue is not performance improvement, but that some (in 
our case automatically generated) queries do not work at all with SPARK, 
whereas there is no problem with these queries in PostgreSQL and MySQL.

I'm surprised to hear this, did you turn on CROSS JOIN via 
`spark.sql.crossJoin.enabled`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2018-01-22 Thread Aklakan
Github user Aklakan commented on the issue:

https://github.com/apache/spark/pull/18692
  
Hi, its unfortunate to see this PR having gotten reverted

@gatorsmile 
> After rethinking about it, we might need to revert this PR. Although it 
converts a CROSS Join to an Inner join, it does not improve the performance.

The point of this issue is not performance improvement, but that (in our 
case *automatically generated queries*) *do not work at all* with SPARK, 
whereas there is no problem with these queries in PostgreSQL and MySQL.

@aokolnychyi 
```IN``` are not equals-constraints (even though they could be expanded: `a 
in (x, y, z) -> a == x OR a == y OR a == z`), so your example does not apply to 
join inference of literals.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
Yeah. That is a wrong case. Let us revisit it if we can find any useful 
case here. Thank you!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18692
  
you are right, then I don't know if there is any valid use case for 
inferring join condition from literals...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-14 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
I am not sure we can infer ``a == b`` if ``a in (0, 2, 3, 4)`` and ``b in 
(0, 2, 3, 4)``. 

table 'a'
```
a1 a2
1  2
3  3
4  5
```

table 'b'
```
b1 b2
1  -1
2  -2
3  -4
```

```
SELECT * FROM a, b WHERE a1 in (1, 2) AND b1 in (1, 2)
// 1 2 1 -1
// 1 2 2 -2
```
```
SELECT * FROM a JOIN b ON a.a1 = b.b1 WHERE a1 in (1, 2) AND b1 in (1, 2)
// 1 2 1 -1
```

Do I miss anything?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
@aokolnychyi Could you rethink about it by using some cases like `a in (0, 
2, 3, 4)` and `b in (0, 2, 3, 4)`? and then refer to `a = b`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18692
  
Hi, All.
Since the commit is reverted from the master branch, can we update the 
status of JIRA issue?
- https://issues.apache.org/jira/browse/SPARK-21417


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
Done. Reverted.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
Will do it. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
Yeah, correct. So, we should revert then.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
Even if we use `BroadcastHashJoin` or `ShuffledHashJoin`, it does not help 
because the identical values on keys just cause the unnecessary work in both, 
right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
I took a look at ``JoinSelection``. It seems we will not get 
``BroadcastHashJoin`` or ``ShuffledHashJoin`` if we revert this rule.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
Sure, if you guys think it does not give any performance benefits, then 
let's revert it.

I also had similar concerns but my understanding was that having an inner 
join with some equality condition can be beneficial during the generation of a 
physical plan. In other words, Spark should be able to select a more efficient 
join implementation. I am not sure how it is right now but previously you could 
have only ``BroadcastNestedLoopJoin`` or ``CartesianProduct`` without any 
equality condition. Again, that was my assumption based on what I remember. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18692
  
Yea I have the same feeling. If the left side has a `a = 1` constraint, and 
the right side has a `b = 1` constraint, adding a `a = b` join condition does 
not help as it always evaluate to true.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
@aokolnychyi After rethinking about it, we might need to revert this PR. 
Although it converts a CROSS Join to an Inner join, it does not improve the 
performance. What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
LGTM 

Thanks for your patience! It looks much good now. Really appreciate for 
your contributions! Welcome to make more contributions!

Thanks! Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84351/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)**
 for PR 18692 at commit 
[`9ab91a1`](https://github.com/apache/spark/commit/9ab91a19cefd63b7d28674992b68da8164d487ae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84351 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)**
 for PR 18692 at commit 
[`9ab91a1`](https://github.com/apache/spark/commit/9ab91a19cefd63b7d28674992b68da8164d487ae).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84349 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84349/testReport)**
 for PR 18692 at commit 
[`0e5a9f2`](https://github.com/apache/spark/commit/0e5a9f20372b9687538b55dd2915af122afe7cb6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class EquivalentExpressionMap `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84349/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84349 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84349/testReport)**
 for PR 18692 at commit 
[`0e5a9f2`](https://github.com/apache/spark/commit/0e5a9f20372b9687538b55dd2915af122afe7cb6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
Also add a test case for non-deterministic cases.  For example, given the 
left child has `a = rand()` and the right child has `b = rand()`, we should not 
get `a = b`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84177/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84177/testReport)**
 for PR 18692 at commit 
[`3e090f9`](https://github.com/apache/spark/commit/3e090f9e4e6be8efb814e2e809a0718a0c670718).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84176/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84176 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84176/testReport)**
 for PR 18692 at commit 
[`ab7d966`](https://github.com/apache/spark/commit/ab7d966b9fa42c8852da4f3a196ab891b845c1b8).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84177 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84177/testReport)**
 for PR 18692 at commit 
[`3e090f9`](https://github.com/apache/spark/commit/3e090f9e4e6be8efb814e2e809a0718a0c670718).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #84176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84176/testReport)**
 for PR 18692 at commit 
[`ab7d966`](https://github.com/apache/spark/commit/ab7d966b9fa42c8852da4f3a196ab891b845c1b8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-23 Thread Aklakan
Github user Aklakan commented on the issue:

https://github.com/apache/spark/pull/18692
  
Hi @aokolnychyi, a on note on @SimonBin 's comment (I am his colleague):

> The initial solution handled your case but then there was a decision to 
restrict the proposed rule to cross joins only.

SimonBin's example appears to work when in addition enabling 
`spark.conf.set("spark.sql.crossJoin.enabled", "true")`

We will investigate further how far this gets us.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-23 Thread SimonBin
Github user SimonBin commented on the issue:

https://github.com/apache/spark/pull/18692
  
@aokolnychyi thank you for the clarification, I see now


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-21 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
@SimonBin The initial solution handled your case but then there was a 
decision to restrict the proposed rule to cross joins only. You can find the 
reason in this 
[comment](https://github.com/apache/spark/pull/18692#issuecomment-326694822) or 
in the code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-11-20 Thread SimonBin
Github user SimonBin commented on the issue:

https://github.com/apache/spark/pull/18692
  
Hi, we are very interested in this patch. I wonder if it could detect this 
code automatically, without needing to write the explicit join:

```scala
package net.sansa_stack.spark.playground

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalatest._

class TestSparkSqlJoin extends FlatSpec {

  "SPARK SQL processor" should "be capable of handling transitive join 
conditions" in {

val spark = SparkSession
  .builder()
  .master("local[1]")
  .getOrCreate()

val schema = new StructType()
  .add("s", IntegerType, nullable = true)
  .add("p", IntegerType, nullable = true)
  .add("o", IntegerType, nullable = true)

val data = List((1, 2, 3))
val dataRDD = spark.sparkContext.parallelize(data).map(attributes => 
Row(attributes._1, attributes._2, attributes._3))
spark.createDataFrame(dataRDD, schema).createOrReplaceTempView("T")

spark.sql("SELECT A.s FROM T A, T B WHERE A.s = 1 AND B.s = 
1").explain(true)
  }

}
```


I built this Pull request locally but it still gives me the same issue -->

```
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for 
INNER join between logical plans
Project [s#3]
+- Filter (isnotnull(s#3) && (s#3 = 1))
   +- LogicalRDD [s#3, p#4, o#5], false
and
Project
+- Filter (isnotnull(s#25) && (s#25 = 1))
   +- LogicalRDD [s#25, p#26, o#27], false
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these 
relations.;
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82892/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #82892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82892/testReport)**
 for PR 18692 at commit 
[`b69185c`](https://github.com/apache/spark/commit/b69185c2a7466b69a3f244a257449dbf1dd0ee21).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82891/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #82891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82891/testReport)**
 for PR 18692 at commit 
[`e0e6ad3`](https://github.com/apache/spark/commit/e0e6ad381fc6f9b0b957e91e7c7df35207190021).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #82892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82892/testReport)**
 for PR 18692 at commit 
[`b69185c`](https://github.com/apache/spark/commit/b69185c2a7466b69a3f244a257449dbf1dd0ee21).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #82891 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82891/testReport)**
 for PR 18692 at commit 
[`e0e6ad3`](https://github.com/apache/spark/commit/e0e6ad381fc6f9b0b957e91e7c7df35207190021).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-10-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
cc @gengliangwang Review this? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-05 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/18692
  
@cloud-fan : In event when the (set of join keys) is a superset of (child 
node's partitioning keys), its possible to avoid shuffle : 
https://github.com/apache/spark/pull/19054 ... this can help with 2 cases - 
when users unknowingly join over extra columns in addition to bucket columns
- the one you mentioned (ie. inferred conditions).



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18692
  
> After adding the inferred join conditions, it might lead to the child 
node's partitioning NOT satisfying the JOIN node's requirements which otherwise 
could have.

Isn't it an existing problem? the current constraint propagation framework 
infers as many predicates as possible, so we may already hit this problem. I 
think we should revisit the constraint propagation framework to think about how 
to avoid adding more shuffles, instead of stopping improving this framework to 
infer more predicates.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81390/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #81390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)**
 for PR 18692 at commit 
[`cfeae46`](https://github.com/apache/spark/commit/cfeae46766a6ccb1b1a0113fe41cdb52b16897f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #81390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81390/testReport)**
 for PR 18692 at commit 
[`cfeae46`](https://github.com/apache/spark/commit/cfeae46766a6ccb1b1a0113fe41cdb52b16897f3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
In this PR, we should limit it to `cartesian product` now. In the future, 
we need to perform smarter when extracting equi-join keys.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-01 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/18692
  
Can we restrict this to cartesian product ONLY ? One clear downside of 
doing this for other joins is that it will potentially add shuffle in case of 
(bucketing queries) and (subqueries in general). After adding the inferred join 
conditions, it might lead to the child node's partitioning NOT satisfying the 
JOIN node's requirements which otherwise could have.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-31 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
Sorry for the delay. @jiangxb1987 will submit a simple fix for the issue 
you mentioned. That will not be a perfect fix but it partially resolve the 
issue. In the future, we need to move the filter removal to a separate batch 
for cost-based optimization instead of doing it with filter inference in the 
same RBO batch. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-31 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
@gatorsmile what is our decision here? Shall we wait until SPARK-21652 is 
resolved? In the meantime, I can add some tests and see how the proposed rule 
works together with all others. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18692
  
@aokolnychyi Thanks for finding the non-convergent case! Let me see how to 
fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-08 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/18692
  
@gatorsmile I updated the rule to cover cross join cases. Regarding the 
case with the redundant condition mentioned by you, I opened 
[SPARK-21652](https://issues.apache.org/jira/browse/SPARK-21652). It is an 
existing issue and is not caused by the proposed rule. BTW, I can try to fix it 
once we agree on a solution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80362/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18692
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #80362 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80362/testReport)**
 for PR 18692 at commit 
[`281f955`](https://github.com/apache/spark/commit/281f9555226f62e99d1143295f5bd364fff55fc0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18692
  
**[Test build #80362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80362/testReport)**
 for PR 18692 at commit 
[`281f955`](https://github.com/apache/spark/commit/281f9555226f62e99d1143295f5bd364fff55fc0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org