[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16719


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98363002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,13 +78,11 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
+// Note in the following `NOT` cases, `l.hashCode() <= r.hashCode()` 
holds. The reason is that
+// canonicalization is conducted bottom-up -- see 
[[Expression.canonicalized]].
--- End diff --

To the other reviewers, this PR added test cases in 
`ExpressionSetSuite.scala` to ensure it. Thus, it is safe to clean the codes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98360513
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
--- End diff --

Great!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-29 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98347229
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
-case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, 
l)
-case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
-case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
-case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+case Not(GreaterThan(l, r)) =>
+  assert(l.hashCode() <= r.hashCode())
--- End diff --

sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98347203
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
-case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, 
l)
-case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
-case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
-case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+case Not(GreaterThan(l, r)) =>
+  assert(l.hashCode() <= r.hashCode())
--- End diff --

I think we can remove `assert`, because the test cases already cover the 
scenario. You can add a comment to explain. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346999
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
+  // We are implementing this equals() only because the 
style-checking rule "you should
+  // implement equals and hashCode together" requires us to
+  override def equals(obj: Any): Boolean = super.equals(obj)
+})).asInstanceOf[AttributeReference]
+  assert(maxHash.hashCode() == Int.MaxValue - 1)
+
+  // An [AttributeReference] with almost the minimum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val minHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("minHash", IntegerType)(exprId =
+new ExprId(5, NamedExpression.jvmId) {
+  // minHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
minHash's hashcode is almost
+  // `Int.MinValue`
+  override def hashCode: Int = 826929707
--- End diff --

updated, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346981
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
--- End diff --

ah, `-1030353449` works great! let me push a commit updating this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346741
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
+  // We are implementing this equals() only because the 
style-checking rule "you should
+  // implement equals and hashCode together" requires us to
+  override def equals(obj: Any): Boolean = super.equals(obj)
+})).asInstanceOf[AttributeReference]
+  assert(maxHash.hashCode() == Int.MaxValue - 1)
+
+  // An [AttributeReference] with almost the minimum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val minHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("minHash", IntegerType)(exprId =
+new ExprId(5, NamedExpression.jvmId) {
+  // minHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
minHash's hashcode is almost
+  // `Int.MinValue`
+  override def hashCode: Int = 826929707
--- End diff --

To make `minHash.hashCode()` equal to `Int.MinValue`, you can set it to 
`1407330692`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346688
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
--- End diff --

uh, I did not read the comment carefully. Thanks for the explanation. 
You can set it to `-1030353449`. Then, `maxHash.hashCode()` will be equal 
to `Int.MaxValue`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346453
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
--- End diff --

thanks.

the reason is in 
[Canonicalize.scala#ignoreNamesTypes](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala#L41),
 we're making copies of `e` (`maxHash` in this case):
```scala
  private def ignoreNamesTypes(e: Expression): Expression = e match {
case a: AttributeReference =>
  AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
case _ => e
  }
```
so, even if we `override def hashCode: Int = Int.MaxValue` on `maxHash`, it 
has nothing to do with the copy's hashcode.

then i took a step back -- defined `exprId`'s hashcode to a specific value 
(as provided in this patch), which would further affect the copied 
attribute-reference's hashcode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346104
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
--- End diff --

uh. Just saw the above comment from @dongjoon-hyun . Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346062
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
--- End diff --

Why not `override def hashCode: Int = Int.MaxValue`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98346066
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
 
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
 
+  // An [AttributeReference] with almost the maximum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val maxHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("maxHash", IntegerType)(exprId =
+new ExprId(4, NamedExpression.jvmId) {
+  // maxHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
maxHash's hashcode is almost
+  // `Int.MaxValue`
+  override def hashCode: Int = 826929706
+  // We are implementing this equals() only because the 
style-checking rule "you should
+  // implement equals and hashCode together" requires us to
+  override def equals(obj: Any): Boolean = super.equals(obj)
+})).asInstanceOf[AttributeReference]
+  assert(maxHash.hashCode() == Int.MaxValue - 1)
+
+  // An [AttributeReference] with almost the minimum hashcode, to make 
testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => 
GreaterThan(r, l)` easier
+  val minHash =
+Canonicalize.ignoreNamesTypes(
+  AttributeReference("minHash", IntegerType)(exprId =
+new ExprId(5, NamedExpression.jvmId) {
+  // minHash's hashcode is calculated based on this exprId's 
hashcode, so we set this
+  // exprId's hashCode to this specific value to make sure 
minHash's hashcode is almost
+  // `Int.MinValue`
+  override def hashCode: Int = 826929707
--- End diff --

Why not `override def hashCode: Int = Int.MinValue`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98345869
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
-case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, 
l)
-case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
-case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
-case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+case Not(GreaterThan(l, r)) =>
+  assert(l.hashCode() <= r.hashCode())
--- End diff --

It should be fine to get rid of `assert`, as long as we add the code 
comments and the needed test cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98345847
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
--- End diff --

This is a dead code, because our canonicalization order is bottom up, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98322977
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
-case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, 
l)
-case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
-case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
-case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+case Not(GreaterThan(l, r)) =>
+  assert(l.hashCode() <= r.hashCode())
--- End diff --

thanks! maybe an alternative way is to add comments saying it's guaranteed 
that `l.hashcode <= r.hashcode`, otherwise people might wonder why there is no 
`case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode()` at their first 
glance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98322886
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
   setTest(1, aUpper >= bUpper, bUpper <= aUpper)
 
   // `Not` canonicalization
-  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), 
Literal(1) >= aUpper)
-  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), 
Literal(1) <= aUpper)
-  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), 
Literal(1) > aUpper)
-  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), 
Literal(1) < aUpper)
+  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), 
Literal(1) >= maxHash)
+  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), 
Literal(1) >= minHash)
+  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), 
Literal(1) <= maxHash)
+  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), 
Literal(1) <= minHash)
+  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), 
Literal(1) > maxHash)
+  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), 
Literal(1) > minHash)
+  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), 
Literal(1) < maxHash)
+  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), 
Literal(1) < minHash)
--- End diff --

yea sure they are covered correctly even prior to this patch's changes!

the previous `aUpper`'hashcode is either greater than or less than `1`'s 
hashcode but can not be both, while this change aims to test both cases -- but 
I'm quite open to revert the changes if they are considered unnecessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98280054
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
 case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
 case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
 
-case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThan(r, l)
-case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, 
l)
-case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
GreaterThanOrEqual(r, l)
-case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => 
LessThanOrEqual(r, l)
-case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+case Not(GreaterThan(l, r)) =>
+  assert(l.hashCode() <= r.hashCode())
--- End diff --

Can we remove these `assert`s? It seems to be verified with your test cases 
now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16719#discussion_r98278557
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala
 ---
@@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
   setTest(1, aUpper >= bUpper, bUpper <= aUpper)
 
   // `Not` canonicalization
-  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), 
Literal(1) >= aUpper)
-  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), 
Literal(1) <= aUpper)
-  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), 
Literal(1) > aUpper)
-  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), 
Literal(1) < aUpper)
+  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), 
Literal(1) >= maxHash)
+  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), 
Literal(1) >= minHash)
+  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), 
Literal(1) <= maxHash)
+  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), 
Literal(1) <= minHash)
+  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), 
Literal(1) > maxHash)
+  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), 
Literal(1) > minHash)
+  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), 
Literal(1) < maxHash)
+  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), 
Literal(1) < minHash)
--- End diff --

These test cases are covered previously correctly. Actually, this PR 
simplifies the logics only. Am I right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

2017-01-27 Thread lw-lin
GitHub user lw-lin opened a pull request:

https://github.com/apache/spark/pull/16719

[SPARK-19385][SQL] During canonicalization, `NOT(l, r)` should not expect 
such cases that l.hashcode > r.hashcode

## What changes were proposed in this pull request?

During canonicalization, `NOT(l, r)` should not expect such cases that 
`l.hashcode > r.hashcode`.

Take the rule `case NOT(Greater(l, r)) if l.hashcode > r.hashcode` for 
example, it should never be matched since `Greater(l, r)` itself would be 
re-written as `Greater(r, l)` after canonicalization given `l.hashcode > 
r.hashcode`.

## How was this patch tested?

This patch expanded the `NOT` test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark canonicalize

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16719.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16719


commit da0b98c7b255819ca7c24d32209ff5803ce46eab
Author: Liwei Lin 
Date:   2017-01-22T09:05:39Z

Fix




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org