[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16067


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90177599
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1697,6 +1697,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       expr = "cast((_1 + _2) as boolean)", expectedNonNullableColumns = Seq("_1", "_2"))
   }
 
+  test("SPARK-17897: Fixed IsNotNull Constraint Inference Rule") {
+    val data = Seq[java.lang.Integer](1, null).toDF("key")
+    checkAnswer(data.filter("not key is not null"), Row(null))
--- End diff --

sure. 





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90177515
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
--- End diff --

Yeah, my original idea was to stop early here. After rethinking it, it might 
be fine to drop this case. 





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90177567
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
+      // When the root is IsNotNull, we can push IsNotNull through the child null intolerant
+      // expressions
+      case IsNotNull(expr) => scanNullIntolerantExpr(expr).map(IsNotNull(_))
+      // Constraints always return true for all the inputs. That means, null will never be returned.
+      // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull through the child
+      // null intolerant expressions.
+      case _ => scanNullIntolerantExpr(constraint).map(IsNotNull(_))
+    }
+
+  /**
    * Recursively explores the expressions which are null intolerant and returns all attributes
    * in these expressions.
    */
   private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
--- End diff --

Sure





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90177275
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1697,6 +1697,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       expr = "cast((_1 + _2) as boolean)", expectedNonNullableColumns = Seq("_1", "_2"))
   }
 
+  test("SPARK-17897: Fixed IsNotNull Constraint Inference Rule") {
+    val data = Seq[java.lang.Integer](1, null).toDF("key")
+    checkAnswer(data.filter("not key is not null"), Row(null))
--- End diff --

Shall we use the DataFrame API here, i.e. `data.filter(!$"key".isNotNull)`? 
The string version looks weird...





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90176972
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
+      // When the root is IsNotNull, we can push IsNotNull through the child null intolerant
+      // expressions
+      case IsNotNull(expr) => scanNullIntolerantExpr(expr).map(IsNotNull(_))
+      // Constraints always return true for all the inputs. That means, null will never be returned.
+      // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull through the child
+      // null intolerant expressions.
+      case _ => scanNullIntolerantExpr(constraint).map(IsNotNull(_))
+    }
+
+  /**
    * Recursively explores the expressions which are null intolerant and returns all attributes
    * in these expressions.
    */
   private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
--- End diff --

Shall we rename it to `scanNullIntolerantAttribute`?





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90176867
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
--- End diff --

We don't need this case; I think it can be covered by the next case?





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90169798
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

Yeah.





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90168753
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

Yes. After this PR, we no longer infer it in that case. This is a pretty 
rare case, right? 





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90167277
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

This change simply ignores all `IsNotNull`s which are not the top 
expression. The above case works because `Filter` splits it. But if the 
constraint looks like `Cast(IsNotNull(a), Integer) == 1`, we won't infer 
`IsNotNull(a)` from it, right?
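
To make the concern concrete, here is a minimal sketch of the inference rule against a toy expression tree rather than Catalyst. The `Expr` hierarchy, `Attr`, `Lit`, the string-typed `Cast` target, and the object name are assumptions made up for illustration; only the shape of `inferIsNotNullConstraints` and the scanning helper follows the diff in this review. The helper uses the name `scanNullIntolerantAttribute` proposed elsewhere in this thread, and the leading `IsNotNull(_: Attribute)` case is left out on the assumption that the next case covers it.

```scala
object IsNotNullInferenceSketch {
  // Toy expression tree. These are NOT Spark's Catalyst classes.
  sealed trait Expr { def children: Seq[Expr] }

  // Marker for expressions that evaluate to null whenever any input is null.
  sealed trait NullIntolerant extends Expr

  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
  }
  // IsNotNull never returns null itself, so it is deliberately not NullIntolerant.
  final case class IsNotNull(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }
  final case class Not(child: Expr) extends NullIntolerant {
    def children: Seq[Expr] = Seq(child)
  }
  final case class Cast(child: Expr, to: String) extends NullIntolerant {
    def children: Seq[Expr] = Seq(child)
  }
  final case class EqualTo(left: Expr, right: Expr) extends NullIntolerant {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Collect attributes reachable only through null-intolerant expressions.
  def scanNullIntolerantAttribute(expr: Expr): Seq[Attr] = expr match {
    case a: Attr           => Seq(a)
    case e: NullIntolerant => e.children.flatMap(scanNullIntolerantAttribute)
    case _                 => Seq.empty
  }

  // A top-level IsNotNull is unwrapped once; a nested IsNotNull stops the scan.
  def inferIsNotNullConstraints(constraint: Expr): Seq[Expr] = constraint match {
    case IsNotNull(expr) => scanNullIntolerantAttribute(expr).map(IsNotNull(_))
    case _               => scanNullIntolerantAttribute(constraint).map(IsNotNull(_))
  }
}
```

Under these assumptions, the toy equivalent of `Cast(IsNotNull(a), Integer) == 1` yields no inferred constraint, because the scan stops at the nested `IsNotNull`. The same holds for `Not(IsNotNull(key))`, which corresponds to the `not key is not null` filter exercised by the new DataFrameSuite test.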





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90164208
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala ---
@@ -351,6 +351,15 @@ class ConstraintPropagationSuite extends SparkFunSuite {
         IsNotNull(IsNotNull(resolveColumn(tr, "b"))),
         IsNotNull(resolveColumn(tr, "a")),
         IsNotNull(resolveColumn(tr, "c")
+
+    verifyConstraints(
+      tr.where('a.attr === 1 && IsNotNull(resolveColumn(tr, "b")) &&
+        IsNotNull(resolveColumn(tr, "c"))).analyze.constraints,
+      ExpressionSet(Seq(
+        resolveColumn(tr, "a") === 1,
+        IsNotNull(resolveColumn(tr, "c")),
+        IsNotNull(resolveColumn(tr, "a")),
+        IsNotNull(resolveColumn(tr, "b")
--- End diff --

The test case is added.





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90154852
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
 
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

Can this infer `IsNotNull(a)`, `IsNotNull(b)` from `IsNotNull(a) && 
IsNotNull(b)`?
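
For context, another comment in this thread notes that this case works because `Filter` splits the conjunction before the inference runs. A minimal sketch of that splitting step on a toy expression tree is below. The `Expr`, `Attr`, and `And` classes, the `splitConjuncts` helper, and the object name are hypothetical names for illustration only, not Catalyst's API.

```scala
object ConjunctionSplitSketch {
  // Toy expression tree. These are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class IsNotNull(child: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Flatten nested conjunctions into their individual conjuncts, left to right.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }
}
```

After splitting, each conjunct such as `IsNotNull(a)` is a top-level constraint in its own right, so the inference rule sees it as the root expression rather than as a child of `&&`.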





[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...

2016-11-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16067#discussion_r90143591
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1697,6 +1697,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       expr = "cast((_1 + _2) as boolean)", expectedNonNullableColumns = Seq("_1", "_2"))
   }
 
+  test("SPARK-17897: Attribute is not NullIntolerant") {
--- End diff --

New test case name?

