[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68203451 [Test build #24850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24850/consoleFull) for PR 3778 at commit [`37022d1`](https://github.com/apache/spark/commit/37022d11cfcb9b316cf5cbe48db2ba7fd4b1f918). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68203564 [Test build #24851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24851/consoleFull) for PR 3778 at commit [`527e6ce`](https://github.com/apache/spark/commit/527e6cee23dca18c2f035c772f659394c3c700d5). * This patch merges cleanly.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22297306

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -316,7 +340,29 @@ object BooleanSimplification extends Rule[LogicalPlan] {
           case (_, Literal(true, BooleanType)) => Literal(true)
           case (Literal(false, BooleanType), r) => r
           case (l, Literal(false, BooleanType)) => l
-          case (_, _) => or
+          // a || a => a
+          case (l, r) if l fastEquals r => l
+          case (_, _) =>
+            val lhsSet = splitConjunctivePredicates(left).toSet
+            val rhsSet = splitConjunctivePredicates(right).toSet
+            val common = lhsSet.intersect(rhsSet)
+            val ldiff = lhsSet.diff(common)
+            val rdiff = rhsSet.diff(common)
+            if (common.size == 0) {
--- End diff --

We can remove this `if`.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22297307

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -307,7 +309,29 @@ object BooleanSimplification extends Rule[LogicalPlan] {
           case (l, Literal(true, BooleanType)) => l
           case (Literal(false, BooleanType), _) => Literal(false)
           case (_, Literal(false, BooleanType)) => Literal(false)
-          case (_, _) => and
+          // a && a => a
+          case (l, r) if l fastEquals r => l
+          case (_, _) =>
+            val lhsSet = splitDisjunctivePredicates(left).toSet
+            val rhsSet = splitDisjunctivePredicates(right).toSet
+            val common = lhsSet.intersect(rhsSet)
+            val ldiff = lhsSet.diff(common)
+            val rdiff = rhsSet.diff(common)
+            if (common.size == 0) {
--- End diff --

We can remove this `if`.
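The factoring step in the two diffs above follows one pattern: split each side of the And/Or into a predicate set, intersect to find the common conditions, then recombine the common part with the leftover differences. A minimal, language-neutral sketch of that idea (in Python rather than Catalyst; the names here are illustrative, not Spark APIs) for the And-of-Ors direction:

```python
# Sketch of the common-factor extraction used by BooleanSimplification.
# An Or-clause is modeled as a set of atomic predicate labels, e.g.
# {'a', 'b', 'c'} stands for (a || b || c). Illustrative only.

def factor_and(left, right):
    """And of two Or-clauses: pull out the shared disjuncts.

    Returns (common, ldiff, rdiff) such that
      (a || b || c) && (a || b || d)  ==  (a || b) || (c && d)
    i.e. common joined by Or with the And of the two leftovers.
    """
    lhs_set = set(left)                 # splitDisjunctivePredicates(left).toSet
    rhs_set = set(right)                # splitDisjunctivePredicates(right).toSet
    common = lhs_set & rhs_set          # lhsSet.intersect(rhsSet)
    ldiff = lhs_set - common            # lhsSet.diff(common)
    rdiff = rhs_set - common            # rhsSet.diff(common)
    return common, ldiff, rdiff

# Example: (a || b || c) && (a || b || d) factors as (a || b) || (c && d)
print(factor_and({'a', 'b', 'c'}, {'a', 'b', 'd'}))
```

The Or branch in the first diff is the exact dual: it splits with the conjunctive splitter so that `(a && b && c) || (a && b && d)` factors as `(a && b) && (c || d)`.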
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204050 Updated. /cc @liancheng, any comments here?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204877 [Test build #24850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24850/consoleFull) for PR 3778 at commit [`37022d1`](https://github.com/apache/spark/commit/37022d11cfcb9b316cf5cbe48db2ba7fd4b1f918). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24850/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204978 [Test build #24851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24851/consoleFull) for PR 3778 at commit [`527e6ce`](https://github.com/apache/spark/commit/527e6cee23dca18c2f035c772f659394c3c700d5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24851/
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68172996 Hi @liancheng, @chenghao-intel, I refactored this into a cleaner version; can you take a look? I think it is cleaner and more readable. https://github.com/scwf/spark/compare/apache:master...scwf:refactory-filter?expand=1 If it looks OK, I will push it to this branch.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68173060 And @liancheng, can you give me a floating-point precision test case? I want to make sure this works correctly with floats.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68178485 @scwf Two comments about your new branch:

1. `ConditionSimplification` is actually a synonym of `BooleanSimplification`. The latter also covers simple cases of `And` / `Or` expressions, since `And` / `Or` *are* themselves boolean expressions. Thus these two can be naturally merged together.
2. Would like to add that, compared to optimizing a cartesian product into an equi-join, the benefits brought by numeric comparison optimizations are very limited. Even in the best cases, they can only optimize out some O(1) comparisons, which are practically very cheap. Personally I don't think this is worth the effort to either introduce an extra library dependency (which increases the risk of dependency hell) or write complicated code that implements them (which increases maintenance costs). The low benefit-cost ratio makes me hesitant to include them.

(As for the floating-point precision part, after rethinking this, I feel it's generally safe since we are only casting literal numeric values to double without further computations.)
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68180010 OK, got it. After discussing with @liancheng offline, I will change this to cover the 4th optimization and some simple numeric comparisons, without introducing the extra library Spire.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68180176 [Test build #24845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24845/consoleFull) for PR 3778 at commit [`546a82b`](https://github.com/apache/spark/commit/546a82bfe463ad7f9ed15f6e026a81373735fa9a). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68183126 [Test build #24845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24845/consoleFull) for PR 3778 at commit [`546a82b`](https://github.com/apache/spark/commit/546a82bfe463ad7f9ed15f6e026a81373735fa9a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68183128 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24845/
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68140510 Not only that; this PR actually covers the following optimizations:
```
And/Or with the same condition
a && a => a,  a && a && a ... => a
a || a => a,  a || a || a ... => a

one And/Or with conditions that can be merged
a < 2 && a > 2 => false,   a > 3 && a > 5 => a > 5
a < 2 || a >= 2 => true,   a > 3 || a > 5 => a > 3

two And/Or with conditions that can be merged
(a < 3 && b > 5) || a > 2 => b > 5 || a > 2
(a < 3 || b > 5) || a > 2 => true
(a < 2 && b > 5) && a > 3 => false
(a < 2 || b > 5) && a > 3 => b > 5 && a > 3

more than two And/Or with common conditions
(a && b && c ...) || (a && b && d ...) || (a && b && e ...) ... => a && b && ((c ...) || (d ...) || (e ...) || ...)
(a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... => (a || b) || ((c || ...) && (d || ...) && (e || ...) && ...)
```
Hi @liancheng, do you mind if I refactor this, referring to your PR, to cover all the cases above?
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68169507 Actually I'd highly suggest breaking this PR into at least two self-contained PRs, which would be much easier to review and merge. Rule sets 1 and 4 can be merged into one PR, rule sets 2 and 3 into another. Maybe we can remove rule sets 2 and 3 from this PR after your refactoring and get rule sets 1 and 4 merged first (I realized #3784 doesn't cover all rules in set 4, because the second rule in set 4 doesn't help optimizing cartesian products). The reason I'm hesitant to include rule sets 2 and 3 is that, for now, I don't see a sound yet concise implementation without introducing extra dependencies. Although I proposed the Spire `Interval` solution, I'd rather not introduce Spire. On the other hand, rule sets 1 and 4 have proven to be both useful and easy to implement.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68169633 Would like to add that the solution based on Spire `Interval` I posted above may suffer from floating-point precision issues. Thus we might want to cast all integral comparisons to `Interval[Long]` and all fractional comparisons to `Interval[Double]` to fix it.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22274369

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
+    def find(goal: Expression): Boolean = {
+      def delegate(child: Expression): Boolean = (child, goal) match {
+        case (combine: CombinePredicate, _) =>
+          isSameCombinePredicate(source, combine) && combine.find(goal)
+
+        // if the left child is a literal,
+        // LiteralComparison's unapply method swaps the literal and attribute positions
+        case (LiteralComparison(childComparison), LiteralComparison(goalComparison)) =>
+          isSame(childComparison, goalComparison)
+
+        case other =>
+          isSame(child, goal)
+      }
+
+      // use thunks to avoid evaluating the right side if the left side is already true
+      val leftResult = () => delegate(source.left)
+      val rightResult = () => delegate(source.right)
+      leftResult() || rightResult()
+    }
+
+    @inline
+    def isOrPredicate: Boolean = {
+      source.isInstanceOf[Or]
+    }
+
+    // create a new combine predicate with the same combine operator as this one
+    @inline
+    def build(left: Expression, right: Expression): CombinePredicate = {
+      CombinePredicate(left, right, isOrPredicate)
+    }
+
+    // swap the left and right children
+    @inline
+    def swap: CombinePredicate = {
+      source.build(source.right, source.left)
+    }
+
+    def toOptimized: Expression = source match {
+      // one CombinePredicate whose left equals right: drop right, keep left
+      // examples: a && a => a, a || a => a
+      case CombinePredicate(left, right) if left.fastEquals(right) =>
+        left
+
+      // one CombinePredicate whose left and right are both binary comparisons
+      // example: a < 2 && a > 2 => false
+      case origin @ CombinePredicate(LiteralComparison(left), LiteralComparison(right)) =>
+        // left or right may have swapped child positions, so rebuild one
+        val changed = origin.build(left, right)
+        val optimized = changed.mergeComparison
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      case origin @ CombinePredicate(left @ CombinePredicate(ll, lr), right)
+          if isNotCombinePredicate(ll, lr, right) =>
+        val leftOptimized = left.toOptimized
+        if (isSame(left, leftOptimized)) {
+          if (isSame(ll, right) || isSame(lr, right)) {
+            if (isSameCombinePredicate(origin, left)) leftOptimized else right
+          } else {
+            val llRight = origin.build(ll, right)
+            val lrRight = origin.build(lr, right)
+            val llRightOptimized = llRight.toOptimized
+            val lrRightOptimized = lrRight.toOptimized
+            if (isSame(llRight, llRightOptimized) && isSame(lrRight, lrRightOptimized)) {
+              origin
+            } else if (isNotCombinePredicate(llRightOptimized, lrRightOptimized)
+                || isSameCombinePredicate(origin, left)) {
+              left.build(llRightOptimized, lrRightOptimized).toOptimized
+            } else if (llRightOptimized.isLiteral || lrRightOptimized.isLiteral) {
+              left.build(llRightOptimized, lrRightOptimized)
+            } else {
+              origin
+            }
+          }
+        } else if (isNotCombinePredicate(leftOptimized)) {
+          origin.build(leftOptimized, right).toOptimized
+        } else {
+          origin
+        }
+
+      case origin @ CombinePredicate(left, right @ CombinePredicate(left2, right2))
+          if isNotCombinePredicate(left, left2, right2) =>
+        val changed = origin.swap
+        val optimized = changed.toOptimized
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      // optimize cases like (a || b || c) && a => a, where a, b, c are conditions
+      case origin @ CombinePredicate(left:
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22276984

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
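The absorption rule discussed in this thread, `(a || b || c) && a => a`, can likewise be checked exhaustively over all boolean assignments. A two-line sketch (Python, illustrative only):

```python
# Exhaustive truth-table check of the absorption law (a || b || c) && a == a.
from itertools import product

assert all(((a or b or c) and a) == a
           for a, b, c in product([False, True], repeat=3))
```

The same loop verifies the dual form `(a && b && c) || a => a` by swapping the connectives.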
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277093

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -112,7 +112,30 @@ case class InSet(value: Expression, hset: Set[Any])
-case class And(left: Expression, right: Expression) extends BinaryPredicate {
+abstract class CombinePredicate extends BinaryPredicate {
--- End diff --

Since we already have the `implicit` class in `Optimizer`, I don't really think we should make the existing expressions more complicated. And probably even `CombinePredicate` is not necessary.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277105

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -160,6 +183,49 @@ abstract class BinaryComparison extends BinaryPredicate { self: Product =>
+object BinaryComparison {
--- End diff --

The same as with `CombinePredicate`: we'd better not couple the existing expressions with the `Optimizer`.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277118

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
--- End diff --

Right here, where we have the rules, we'd probably better not use `CombinePredicate` any more, since it is clearer to me if we list every possible type explicitly.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277164

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
--- End diff --

Can we refactor the code in a more functional style? For example: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L43. Keeping to the same code patterns makes the rules easier to maintain.
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68119210 This is a really useful optimization, particularly for SQL generated by machines. It would make more sense if we added a unit test showing that the expression optimization eventually changes the behavior of logical plan optimization (Cartesian join ==> hash join), e.g. in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala#L62. In the meantime, I think the code can be polished into a more functional style (e.g. remove the implicit classes and avoid the unnecessary trait, etc.); as you know, most of the optimizer rules in Catalyst are straightforward and brief, which makes them easier for others to maintain and extend. Sorry that I didn't dive into the details.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277333

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---

```diff
@@ -112,7 +112,30 @@ case class InSet(value: Expression, hset: Set[Any])
   }
 }

-case class And(left: Expression, right: Expression) extends BinaryPredicate {
+abstract class CombinePredicate extends BinaryPredicate {
```

--- End diff --

You mean we can do this with an implicit class in the Optimizer, right? My initial idea in adding this was to unify the simplification of And/Or and avoid duplicated code.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277360

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```diff
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
   }
 }

+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
```

--- End diff --

Definitely should refactor here :)
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277573

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---

```diff
@@ -112,7 +112,30 @@ case class InSet(value: Expression, hset: Set[Any])
   }
 }

-case class And(left: Expression, right: Expression) extends BinaryPredicate {
+abstract class CombinePredicate extends BinaryPredicate {
```

--- End diff --

Sorry for the confusion, it was just a general comment! Making a general parent type here probably saves some code in `Optimizer`; however, we'd better not change any existing expression JUST for the `Optimizer`. Also, it's more important to list all of the possible type combinations in `Optimizer`, so people can get a straightforward feel for how to maintain / extend it in the future.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68121946 @scwf Would you mind listing all the optimizations in the PR description first? A few concise examples coupled with each optimization would be really helpful. Then we can help decouple and polish each optimization rule individually. It's really hard to grasp your intention from the provided complex query.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68123618 For the numeric comparison optimizations, I did some experiments along the lines of my earlier double-interval comparison idea and came up with the following snippet. I haven't even compiled it yet, but it shows the general idea:

```scala
private implicit class NumericLiteral(e: Literal) {
  def toDouble = Cast(e, DoubleType).eval().asInstanceOf[Double]
}

object LiteralBinaryComparison {
  def unapply(e: Expression): Option[(NamedExpression, Interval[Double])] = e match {
    case LessThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.below(l.toDouble))
    case LessThan(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some(n, Interval.atOrAbove(l.toDouble))
    case GreaterThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.above(l.toDouble))
    case GreaterThan(l @ Literal(_, dt: NumericType), n: NamedExpression) =>
      Some(n, Interval.atOrBelow(l.toDouble))
    case LessThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.atOrBelow(l.toDouble))
    case LessThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some(n, Interval.above(l.toDouble))
    case GreaterThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.atOrAbove(l.toDouble))
    case GreaterThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some(n, Interval.below(l.toDouble))
    case EqualTo(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.point(l.toDouble))
  }
}

def simplify(e: Expression): Expression = e transform {
  case and @ And(
      e1 @ LiteralBinaryComparison(n1, i1),
      e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
    if (i1.intersect(i2).isEmpty) Literal(false)
    else if (i1.isSubsetOf(i2)) e1
    else if (i1.isSupersetOf(i2)) e2
    else and

  case or @ Or(
      e1 @ LiteralBinaryComparison(n1, i1),
      e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
    if (i1.union(i2) == Interval.all[Double]) Literal(true)
    else if (i1.isSubsetOf(i2)) e2
    else if (i1.isSupersetOf(i2)) e1
    else or
}
```

The interval utility comes from Spire, which would need to be added as a compile-time dependency of Catalyst. The `simplify` method can be merged into `BooleanSimplification`. I guess this together with #3784 captures all optimizations introduced in this PR?
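To see what the interval reasoning above buys, here is a tiny self-contained sketch (not the PR's code; `SimpleInterval` is a hypothetical stand-in for Spire's `Interval`, treating all bounds as open for brevity):

```scala
// Hypothetical stand-in for Spire's Interval[Double]; all bounds are
// treated as open, which is enough to illustrate the decision logic.
case class SimpleInterval(lo: Double, hi: Double) {
  def isEmpty: Boolean = lo >= hi
  def intersect(that: SimpleInterval): SimpleInterval =
    SimpleInterval(math.max(lo, that.lo), math.min(hi, that.hi))
  def isSubsetOf(that: SimpleInterval): Boolean =
    that.lo <= lo && hi <= that.hi
}

object SimpleInterval {
  def below(x: Double) = SimpleInterval(Double.NegativeInfinity, x) // a < x
  def above(x: Double) = SimpleInterval(x, Double.PositiveInfinity) // a > x
}

// `a < 10 AND a < 100`: the first interval is a subset, so keep only `a < 10`
assert(SimpleInterval.below(10).isSubsetOf(SimpleInterval.below(100)))

// `a < 10 AND a > 20`: the intersection is empty, so fold the conjunction to `false`
assert(SimpleInterval.below(10).intersect(SimpleInterval.above(20)).isEmpty)
```

The ±infinity endpoints are exactly why casting to `Double` is convenient here: a one-sided comparison still maps onto a two-ended interval.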
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68035013 [Test build #24773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24773/consoleFull) for PR 3778 at commit [`8c0316f`](https://github.com/apache/spark/commit/8c0316f8454f0ac8268f98d9a4c9cc29baedbf5b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class CombinePredicate extends BinaryPredicate ` * `case class And(left: Expression, right: Expression) extends CombinePredicate ` * `case class Or(left: Expression, right: Expression) extends CombinePredicate ` * ` implicit class CombinePredicateExtension(source: CombinePredicate) ` * ` implicit class ExpressionCookies(expression: Expression) `
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68035016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24773/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68084460 [Test build #24804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24804/consoleFull) for PR 3778 at commit [`8733027`](https://github.com/apache/spark/commit/8733027a544c9a89da123c37d9eaba98b62199ad). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68086250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24804/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68086248 [Test build #24804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24804/consoleFull) for PR 3778 at commit [`8733027`](https://github.com/apache/spark/commit/8733027a544c9a89da123c37d9eaba98b62199ad). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class CombinePredicate extends BinaryPredicate ` * `case class And(left: Expression, right: Expression) extends CombinePredicate ` * `case class Or(left: Expression, right: Expression) extends CombinePredicate ` * ` implicit class CombinePredicateExtension(source: CombinePredicate) ` * ` implicit class ExpressionCookies(expression: Expression) `
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68029380 @scwf These optimizations are useful, particularly the one that eliminates common predicates. Thanks for bringing them up! However, the implementation in this PR is really overcomplicated... I opened #3784 for a simpler version. For the numeric comparisons, I think we can first cast all numeric comparisons to `Double` type, derive a `Double` range from each predicate, and then eliminate unnecessary comparisons by comparing ranges. `Double` is preferred because `Double.PositiveInfinity` and `Double.NegativeInfinity` come in handy when dealing with predicates like `a > 10` and `a < 100`.
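The common-predicate elimination mentioned here can be pictured with a minimal sketch (illustrative only; conjuncts are modeled as plain strings rather than Catalyst expressions, and `factorOr` is a made-up helper name): split each side of an `Or` into its conjuncts, factor out the shared ones, and rebuild the condition as `common && (ldiff || rdiff)`:

```scala
// Illustrative only: each side of an Or is modeled as its set of conjuncts.
// Returns (shared conjuncts, leftovers on the left, leftovers on the right).
def factorOr(lhs: Set[String], rhs: Set[String]): (Set[String], Set[String], Set[String]) = {
  val common = lhs intersect rhs
  (common, lhs diff common, rhs diff common)
}

// (a && b) || (a && c) rewrites to a && (b || c): `a` is factored out
val (common, ldiff, rdiff) = factorOr(Set("a", "b"), Set("a", "c"))
assert(common == Set("a"))
assert(ldiff == Set("b"))
assert(rdiff == Set("c"))
```

When `common` is empty there is nothing to factor and the original `Or` should be kept unchanged.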
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68031794 [Test build #24768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24768/consoleFull) for PR 3778 at commit [`f1a487f`](https://github.com/apache/spark/commit/f1a487f55797086f39a70536547c963854183858). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68032736 [Test build #24773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24773/consoleFull) for PR 3778 at commit [`8c0316f`](https://github.com/apache/spark/commit/8c0316f8454f0ac8268f98d9a4c9cc29baedbf5b). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68033864 [Test build #24768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24768/consoleFull) for PR 3778 at commit [`f1a487f`](https://github.com/apache/spark/commit/f1a487f55797086f39a70536547c963854183858). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class CombinePredicate extends BinaryPredicate ` * `case class And(left: Expression, right: Expression) extends CombinePredicate ` * `case class Or(left: Expression, right: Expression) extends CombinePredicate ` * ` implicit class CombinePredicateExtension(source: CombinePredicate) ` * ` implicit class ExpressionCookies(expression: Expression) `
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68033865 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24768/ Test FAILed.