[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16719 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98363002

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,13 +78,11 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
+    // Note in the following `NOT` cases, `l.hashCode() <= r.hashCode()` holds. The reason is that
+    // canonicalization is conducted bottom-up -- see [[Expression.canonicalized]].

--- End diff --

To the other reviewers: this PR added test cases in `ExpressionSetSuite.scala` to ensure this invariant, so it is safe to clean up this code.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98360513

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706

--- End diff --

Great!
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98347229

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())

--- End diff --

Sure!
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98347203

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())

--- End diff --

I think we can remove `assert`, because the test cases already cover the scenario. You can add a comment to explain.
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346999

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706
+          // We are implementing this equals() only because the style-checking rule "you should
+          // implement equals and hashCode together" requires us to
+          override def equals(obj: Any): Boolean = super.equals(obj)
+        })).asInstanceOf[AttributeReference]
+  assert(maxHash.hashCode() == Int.MaxValue - 1)
+
+  // An [AttributeReference] with almost the minimum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val minHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("minHash", IntegerType)(exprId =
+        new ExprId(5, NamedExpression.jvmId) {
+          // minHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure minHash's hashcode is almost
+          // `Int.MinValue`
+          override def hashCode: Int = 826929707

--- End diff --

Updated, thanks!
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346981

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706

--- End diff --

Ah, `-1030353449` works great! Let me push a commit updating this.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346741

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706
+          // We are implementing this equals() only because the style-checking rule "you should
+          // implement equals and hashCode together" requires us to
+          override def equals(obj: Any): Boolean = super.equals(obj)
+        })).asInstanceOf[AttributeReference]
+  assert(maxHash.hashCode() == Int.MaxValue - 1)
+
+  // An [AttributeReference] with almost the minimum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val minHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("minHash", IntegerType)(exprId =
+        new ExprId(5, NamedExpression.jvmId) {
+          // minHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure minHash's hashcode is almost
+          // `Int.MinValue`
+          override def hashCode: Int = 826929707

--- End diff --

To make `minHash.hashCode()` equal to `Int.MinValue`, you can set it to `1407330692`.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346688

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706

--- End diff --

Uh, I did not read the comment carefully. Thanks for the explanation. You can set it to `-1030353449`. Then, `maxHash.hashCode()` will be equal to `Int.MaxValue`.
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346453

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706

--- End diff --

Thanks. The reason is that in [Canonicalize.scala#ignoreNamesTypes](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala#L41), we're making copies of `e` (`maxHash` in this case):

```scala
private def ignoreNamesTypes(e: Expression): Expression = e match {
  case a: AttributeReference =>
    AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
  case _ => e
}
```

So even if we `override def hashCode: Int = Int.MaxValue` on `maxHash`, that has nothing to do with the copy's hashcode. So I took a step back and defined the `exprId`'s hashcode as a specific value (as provided in this patch), which in turn affects the copied attribute reference's hashcode.
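The mechanics lw-lin describes can be sketched with plain case classes. Below, `Id` and `Ref` are hypothetical stand-ins for Spark's `ExprId` and `AttributeReference` (not the real classes), and `anonymize` models the copy made by `ignoreNamesTypes`:

```scala
// Hypothetical stand-ins for ExprId / AttributeReference (NOT Spark's classes).
case class Id(n: Long)
case class Ref(name: String, id: Id)

// ignoreNamesTypes-style copy: a brand-new Ref that keeps only the id.
def anonymize(r: Ref): Ref = Ref("none", r.id)

// Overriding hashCode on the Ref itself is lost in the copy, because the copy is a
// fresh case-class instance whose hashCode is recomputed from its fields:
val special = new Ref("maxHash", Id(4)) { override def hashCode: Int = Int.MaxValue }

// Overriding hashCode on the shared Id, however, flows into the copy, because a
// case class's generated hashCode is derived from its fields' hashCodes:
val oddId = new Id(4) { override def hashCode: Int = 826929706 }
```

For instance, `anonymize(special).hashCode` equals the field-derived `Ref("none", Id(4)).hashCode` (the `Int.MaxValue` override is gone), while `anonymize(Ref("x", oddId)).hashCode` equals `Ref("none", oddId).hashCode`, so the id's overridden hashCode still influences the copy.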
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346104

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)

--- End diff --

Uh, just saw the above comment from @dongjoon-hyun. Thanks!
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346062

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706

--- End diff --

Why not `override def hashCode: Int = Int.MaxValue`?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98346066

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
   val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)

+  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val maxHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("maxHash", IntegerType)(exprId =
+        new ExprId(4, NamedExpression.jvmId) {
+          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
+          // `Int.MaxValue`
+          override def hashCode: Int = 826929706
+          // We are implementing this equals() only because the style-checking rule "you should
+          // implement equals and hashCode together" requires us to
+          override def equals(obj: Any): Boolean = super.equals(obj)
+        })).asInstanceOf[AttributeReference]
+  assert(maxHash.hashCode() == Int.MaxValue - 1)
+
+  // An [AttributeReference] with almost the minimum hashcode, to make testing canonicalize rules
+  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
+  val minHash =
+    Canonicalize.ignoreNamesTypes(
+      AttributeReference("minHash", IntegerType)(exprId =
+        new ExprId(5, NamedExpression.jvmId) {
+          // minHash's hashcode is calculated based on this exprId's hashcode, so we set this
+          // exprId's hashCode to this specific value to make sure minHash's hashcode is almost
+          // `Int.MinValue`
+          override def hashCode: Int = 826929707

--- End diff --

Why not `override def hashCode: Int = Int.MinValue`?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98345869

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())

--- End diff --

It should be fine to get rid of `assert`, as long as we add the code comments and the needed test cases.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98345847

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)

--- End diff --

This is dead code, because our canonicalization order is bottom-up, right?
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98322977

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())

--- End diff --

Thanks! Maybe an alternative is to add a comment saying it's guaranteed that `l.hashcode <= r.hashcode`; otherwise people might wonder at first glance why there is no `case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode()`.
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98322886

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
   setTest(1, aUpper >= bUpper, bUpper <= aUpper)

   // `Not` canonicalization
-  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper)
-  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper)
-  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
-  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)
+  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash)
+  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash)
+  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash)
+  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash)
+  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash)
+  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash)
+  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash)
+  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash)

--- End diff --

Yea, sure -- they were covered correctly even prior to this patch's changes! The previous `aUpper`'s hashcode is either greater than or less than `1`'s hashcode, but cannot be both, while this change aims to test both cases. But I'm quite open to reverting the changes if they are considered unnecessary.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98280054

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---

@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())

--- End diff --

Can we remove these `assert`s? They seem to be verified by your test cases now.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98278557

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---

@@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
   setTest(1, aUpper >= bUpper, bUpper <= aUpper)

   // `Not` canonicalization
-  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper)
-  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper)
-  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
-  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)
+  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash)
+  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash)
+  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash)
+  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash)
+  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash)
+  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash)
+  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash)
+  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash)

--- End diff --

These cases were already covered correctly before this change. Actually, this PR only simplifies the logic. Am I right?
GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/16719

[SPARK-19385][SQL] During canonicalization, `NOT(l, r)` should not expect such cases that l.hashcode > r.hashcode

## What changes were proposed in this pull request?

During canonicalization, `NOT(l, r)` should not expect cases where `l.hashcode > r.hashcode`. Take the rule `case NOT(Greater(l, r)) if l.hashcode > r.hashcode` for example: it can never be matched, because given `l.hashcode > r.hashcode`, `Greater(l, r)` itself would already have been rewritten as `Greater(r, l)` by canonicalization.

## How was this patch tested?

This patch expanded the `NOT` test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark canonicalize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16719.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16719

commit da0b98c7b255819ca7c24d32209ff5803ce46eab
Author: Liwei Lin
Date: 2017-01-22T09:05:39Z

    Fix
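The dead-pattern argument above can be illustrated with a tiny bottom-up canonicalization model. Everything below is hypothetical (`Expr`, `Leaf`, and `h` standing in for `hashCode()`), a sketch of the idea rather than Spark's actual `Canonicalize`, and only the operand-ordering rules are modeled, not the `Not`-elimination rewrites:

```scala
// Minimal model: `h` plays the role of hashCode() in the ordering rules.
sealed trait Expr { def h: Int }
case class Leaf(h: Int) extends Expr
case class GreaterThan(l: Expr, r: Expr) extends Expr { def h = l.h ^ r.h }
case class LessThan(l: Expr, r: Expr) extends Expr { def h = l.h ^ r.h }
case class Not(c: Expr) extends Expr { def h = c.h }

// Children are canonicalized before their parent (bottom-up), so by the time any
// `Not(...)` is inspected, its comparison child already satisfies l.h <= r.h --
// a guard like `case Not(GreaterThan(l, r)) if l.h > r.h` could never fire.
def canonicalize(e: Expr): Expr = e match {
  case GreaterThan(l, r) =>
    val (cl, cr) = (canonicalize(l), canonicalize(r))
    if (cl.h > cr.h) LessThan(cr, cl) else GreaterThan(cl, cr)
  case LessThan(l, r) =>
    val (cl, cr) = (canonicalize(l), canonicalize(r))
    if (cl.h > cr.h) GreaterThan(cr, cl) else LessThan(cl, cr)
  case Not(c) => Not(canonicalize(c))
  case leaf   => leaf
}
```

For example, `canonicalize(Not(GreaterThan(Leaf(9), Leaf(1))))` yields `Not(LessThan(Leaf(1), Leaf(9)))`: the inner comparison is reordered before the `Not` case ever sees it, which is exactly why the `if l.hashCode() > r.hashCode()` guards inside the `Not` cases were dead code.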