subject:"\[GitHub\] spark pull request\: \[SPARK\-9372\] \[SQL\] For joins, insert IS NOT NU..."

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2016-05-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10209


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2016-05-06 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-217503882
  
Is this already fixed by https://github.com/apache/spark/pull/7768 ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163710755
  
**[Test build #47531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47531/consoleFull)**
 for PR 10209 at commit 
[`fb562fb`](https://github.com/apache/spark/commit/fb562fb67a761276456b14a81513f3fc69a6ead8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47267922
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

This is semi-public API cause I think some advanced projects do dig into 
catalyst and we've never changed the signature of something as basic as `Join` 
before.  Could we do this instead by fixing nullablity propagation and only 
inserting the filter if the attribute is `nullable`? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47268378
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -99,6 +99,13 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
*/
   lazy val resolved: Boolean = expressions.forall(_.resolved) && 
childrenResolved
 
+  /**
+   * Returns true if the two plans are semantically equal. This should 
ignore state generated
+   * during planning to help the planning process.
+   * TODO: implement this as a pass that canonicalizes the plan tree 
instead?
+   */
+  def semanticEquals(other: LogicalPlan): Boolean = this == other
--- End diff --

Oh, this is a new semantic equals.  How is this different than 
`sameResult`?  Maybe we should unify the naming between Expression and 
LogicalPlan for this concept.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47268048
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala 
---
@@ -43,7 +43,7 @@ abstract class PlanTest extends SparkFunSuite {
   protected def comparePlans(plan1: LogicalPlan, plan2: LogicalPlan) {
 val normalized1 = normalizeExprIds(plan1)
 val normalized2 = normalizeExprIds(plan2)
-if (normalized1 != normalized2) {
+if (!normalized1.semanticEquals(normalized2)) {
--- End diff --

Existing: do we need this hacky normalization logic above anymore?  I don't 
think `semanticEquals` existed when I wrote this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163732061
  
**[Test build #47531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47531/consoleFull)**
 for PR 10209 at commit 
[`fb562fb`](https://github.com/apache/spark/commit/fb562fb67a761276456b14a81513f3fc69a6ead8).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163732157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47531/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163732156
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47291821
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

In an earlier version of catalyst we also had equivalence classes propagate 
up the logical plans.  Would that give you enough information?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47291050
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

I started down that path but can you think of a way to make that handle the 
more general case of predicate propagation?

t1.key join t2.key where t1.key = t2.key and t1.key = 5.

How do we generate the predicate t2.key = 5? how do we make this more 
general? 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47293719
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

Equivalence classes is one thing, we can compute that no problem I think. 
The issue is how to remember that t2.key = 5 was generated and not to generate 
it again. The trick of setting nullable doesn't work here. We could maintain 
value constraints (where nullability is a subset).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47294889
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

Couldn't `Literal(5)` be in the equivalence class and we could check for 
that?

That said, I also like the idea more general value constraints.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47309024
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

Literal(5) would work for equivalence but we want to track more than 
equality. If it was t1.key join t2.key where t1.key = t2.key and t1.key > 5, 
we'd similarly want to add t2.key > 5. 

Are you suggesting we don't change the operator and walk the tree bottom up 
to collect these constraints? This seems extremely expensive to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47314107
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }
 
+object Join {
+  def apply(
+left: LogicalPlan,
+right: LogicalPlan,
+joinType: JoinType,
+condition: Option[Expression]): Join = {
+Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
   left: LogicalPlan,
   right: LogicalPlan,
   joinType: JoinType,
-  condition: Option[Expression]) extends BinaryNode {
+  condition: Option[Expression],
+  generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --

I was thinking operators would propagate the set of constraints up from 
their children (possibly augmenting or clearing as appropriate) and we'd save 
it in a `lazy val`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47226377
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -190,14 +190,15 @@ trait CheckAnalysis {
  | 
${exprs.map(_.prettyString).mkString(",")}""".stripMargin)
 
   // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if 
left.outputSet.intersect(right.outputSet).nonEmpty =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-failAnalysis(
-  s"""
- |Failure when resolving conflicting references in Join:
- |$plan
- |Conflicting attributes: 
${conflictingAttributes.mkString(",")}
- |""".stripMargin)
+  case j @ Join(left, right, _, _, _)
+if left.outputSet.intersect(right.outputSet).nonEmpty =>
--- End diff --

nit: we can use `if !j.selfJoinResolved`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47227527
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -356,6 +360,51 @@ object LikeSimplification extends Rule[LogicalPlan] {
 }
 
 /**
+ * This rule adds IsNotNull predicates based on join keys. If the join 
contains a condition
+ * `a` binaryOp *, a is non-nullable. This adds filters for those 
attributes to the children.
+ *
+ * To avoid the problem of repeatedly generating the IsNotNull predicates, 
the join operator
+ * remembers all the expressions it generated.
+ */
+object AddJoinKeyNullabilityFilters extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case j @ Join(left, right, Inner, Some(condition), generated) => {
+  val nonNullableKeys = ExtractNonNullableAttributes.unapply(condition)
--- End diff --

nit: `val ExtractNonNullableAttributes(nonNullableKeys) = condition`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10209#discussion_r47228437
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -356,6 +360,51 @@ object LikeSimplification extends Rule[LogicalPlan] {
 }
 
 /**
+ * This rule adds IsNotNull predicates based on join keys. If the join 
contains a condition
+ * `a` binaryOp *, a is non-nullable. This adds filters for those 
attributes to the children.
+ *
+ * To avoid the problem of repeatedly generating the IsNotNull predicates, 
the join operator
+ * remembers all the expressions it generated.
+ */
+object AddJoinKeyNullabilityFilters extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case j @ Join(left, right, Inner, Some(condition), generated) => {
+  val nonNullableKeys = ExtractNonNullableAttributes.unapply(condition)
--- End diff --

oh nvm, I think your version looks more clear.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-10 Thread cloud-fan

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163627935
  
This looks very cool!
But I'm a little worried about making the `Join` stateful, how about adding 
a new batch in `Optimizer` and only execute it `Once`? So that we can avoid 
having the join regenerate the filter as your rule will be applied after all 
other optimization rules.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163514567
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163514549
  
**[Test build #47488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47488/consoleFull)**
 for PR 10209 at commit 
[`00e957c`](https://github.com/apache/spark/commit/00e957c83ca2983941170e503e926535f2c8a91c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163514568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47488/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163507134
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163507570
  
**[Test build #47488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47488/consoleFull)**
 for PR 10209 at commit 
[`00e957c`](https://github.com/apache/spark/commit/00e957c83ca2983941170e503e926535f2c8a91c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163507143
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47487/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163065568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47375/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-08 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163065565
  
**[Test build #47375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47375/consoleFull)**
 for PR 10209 at commit 
[`565d15e`](https://github.com/apache/spark/commit/565d15ec8f0c319a97370b133317c11140da49b4).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-08 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163065567
  
Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-08 Thread nongli

GitHub user nongli opened a pull request:

https://github.com/apache/spark/pull/10209

[SPARK-9372] [SQL] For joins, insert IS NOT NULL filters to children.

Some join types and conditions imply that the join keys cannot be NULL and
can be filtered out by the children. This patch does this for inner joins
and introduces a mechanism to generate predicates. The complex part of doing
this is to make sure the transformation is stable. The problem that we want
to avoid is generating a filter in the join, having that pushed down and 
then
having the join regenerate the filter.

This patch solves this by having the join remember predicates that it has
generated. This mechanism should be general enough that we can infer other
predicates, for example "a join b where a.id = b.id AND a.id = 10" could
also use this mechanism to generate the predicate "b.id = 10".

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nongli/spark spark-9372

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10209


commit 565d15ec8f0c319a97370b133317c11140da49b4
Author: Nong Li 
Date:   2015-12-08T23:50:25Z

[SPARK-9372] [SQL] For joins, insert IS NOT NULL filters to children.

Some join types and conditions imply that the join keys cannot be NULL and
can be filtered out by the children. This patch does this for inner joins
and introduces a mechanism to generate predicates. The complex part of doing
this is to make sure the transformation is stable. The problem that we want
to avoid is generating a filter in the join, having that pushed down and 
then
having the join regenerate the filter.

This patch solves this by having the join remember predicates that it has
generated. This mechanism should be general enough that we can infer other
predicates, for example "a join b where a.id = b.id AND a.id = 10" could
also use this mechanism to generate the predicate "b.id = 10".




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

2015-12-08 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10209#issuecomment-163062943
  
**[Test build #47375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47375/consoleFull)**
 for PR 10209 at commit 
[`565d15e`](https://github.com/apache/spark/commit/565d15ec8f0c319a97370b133317c11140da49b4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...

30 matches

Site Navigation

Mail list logo

Footer information