[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68203451 [Test build #24850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24850/consoleFull) for PR 3778 at commit [`37022d1`](https://github.com/apache/spark/commit/37022d11cfcb9b316cf5cbe48db2ba7fd4b1f918). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68203564 [Test build #24851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24851/consoleFull) for PR 3778 at commit [`527e6ce`](https://github.com/apache/spark/commit/527e6cee23dca18c2f035c772f659394c3c700d5). * This patch merges cleanly.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22297306

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -316,7 +340,29 @@ object BooleanSimplification extends Rule[LogicalPlan] {
           case (_, Literal(true, BooleanType)) => Literal(true)
           case (Literal(false, BooleanType), r) => r
           case (l, Literal(false, BooleanType)) => l
-          case (_, _) => or
+          // a || a => a
+          case (l, r) if l fastEquals r => l
+          case (_, _) =>
+            val lhsSet = splitConjunctivePredicates(left).toSet
+            val rhsSet = splitConjunctivePredicates(right).toSet
+            val common = lhsSet.intersect(rhsSet)
+            val ldiff = lhsSet.diff(common)
+            val rdiff = rhsSet.diff(common)
+            if (common.size == 0) {
--- End diff --

We can remove this `if`.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22297307

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -307,7 +309,29 @@ object BooleanSimplification extends Rule[LogicalPlan] {
           case (l, Literal(true, BooleanType)) => l
           case (Literal(false, BooleanType), _) => Literal(false)
           case (_, Literal(false, BooleanType)) => Literal(false)
-          case (_, _) => and
+          // a && a => a
+          case (l, r) if l fastEquals r => l
+          case (_, _) =>
+            val lhsSet = splitDisjunctivePredicates(left).toSet
+            val rhsSet = splitDisjunctivePredicates(right).toSet
+            val common = lhsSet.intersect(rhsSet)
+            val ldiff = lhsSet.diff(common)
+            val rdiff = rhsSet.diff(common)
+            if (common.size == 0) {
--- End diff --

We can remove this `if`.
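The factoring step in the two diffs above follows one pattern: split each side of the And/Or into a predicate set, intersect to find the common conditions, then recombine the common part with the leftover differences. A minimal, language-neutral sketch of that idea (in Python rather than Catalyst; the names here are illustrative, not Spark APIs) for the And-of-Ors direction:

```python
# Sketch of the common-factor extraction used by BooleanSimplification.
# An Or-clause is modeled as a set of atomic predicate labels, e.g.
# {'a', 'b', 'c'} stands for (a || b || c). Illustrative only.

def factor_and(left, right):
    """And of two Or-clauses: pull out the shared disjuncts.

    Returns (common, ldiff, rdiff) such that
      (a || b || c) && (a || b || d)  ==  (a || b) || (c && d)
    i.e. common joined by Or with the And of the two leftovers.
    """
    lhs_set = set(left)                 # splitDisjunctivePredicates(left).toSet
    rhs_set = set(right)                # splitDisjunctivePredicates(right).toSet
    common = lhs_set & rhs_set          # lhsSet.intersect(rhsSet)
    ldiff = lhs_set - common            # lhsSet.diff(common)
    rdiff = rhs_set - common            # rhsSet.diff(common)
    return common, ldiff, rdiff

# Example: (a || b || c) && (a || b || d) factors as (a || b) || (c && d)
print(factor_and({'a', 'b', 'c'}, {'a', 'b', 'd'}))
```

The Or branch in the first diff is the exact dual: it splits with the conjunctive splitter so that `(a && b && c) || (a && b && d)` factors as `(a && b) && (c || d)`.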
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204050 Updated. /cc @liancheng, any comments here?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204877 [Test build #24850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24850/consoleFull) for PR 3778 at commit [`37022d1`](https://github.com/apache/spark/commit/37022d11cfcb9b316cf5cbe48db2ba7fd4b1f918). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24850/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204978 [Test build #24851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24851/consoleFull) for PR 3778 at commit [`527e6ce`](https://github.com/apache/spark/commit/527e6cee23dca18c2f035c772f659394c3c700d5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68204981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24851/
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68172996 Hi @liancheng, @chenghao-intel, I refactored this into a cleaner version; can you take a look? I think it is cleaner and more readable. https://github.com/scwf/spark/compare/apache:master...scwf:refactory-filter?expand=1 If it looks OK, I will push it to this branch.
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68173060 And @liancheng, can you give me a floating-point precision test case? I want to make sure this works correctly with floats.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68178485 @scwf Two comments about your new branch:

1. `ConditionSimplification` is actually a synonym of `BooleanSimplification`. The latter also covers simple cases of `And` / `Or` expressions, since `And` / `Or` *are* themselves boolean expressions. Thus these two can be naturally merged together.
2. Would like to add that, compared to optimizing a cartesian product into an equi-join, the benefits brought by numeric comparison optimizations are very limited. Even in the best cases, they can only optimize out some O(1) comparisons, which are practically very cheap. Personally I don't think this is worth the effort to either introduce an extra library dependency (which increases the risk of dependency hell) or write complicated code that implements them (which increases maintenance costs). The low benefit-cost ratio makes me hesitant to include them.

(As for the floating-point precision part, after rethinking this, I feel it's generally safe since we are only casting literal numeric values to double without further computations.)
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68180010 OK, got it. After discussing with @liancheng offline, I will change this to cover the 4th optimization and some simple numeric comparisons, without introducing the extra library Spire.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68180176 [Test build #24845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24845/consoleFull) for PR 3778 at commit [`546a82b`](https://github.com/apache/spark/commit/546a82bfe463ad7f9ed15f6e026a81373735fa9a). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68183126 [Test build #24845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24845/consoleFull) for PR 3778 at commit [`546a82b`](https://github.com/apache/spark/commit/546a82bfe463ad7f9ed15f6e026a81373735fa9a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68183128 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24845/
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68140510 Not only that; this PR actually covers the following optimizations:
```
And/Or with the same condition
a && a => a,  a && a && a ... => a
a || a => a,  a || a || a ... => a

one And/Or with conditions that can be merged
a < 2 && a > 2 => false,   a > 3 && a > 5 => a > 5
a < 2 || a >= 2 => true,   a > 3 || a > 5 => a > 3

two And/Or with conditions that can be merged
(a < 3 && b > 5) || a > 2 => b > 5 || a > 2
(a < 3 || b > 5) || a > 2 => true
(a < 2 && b > 5) && a > 3 => false
(a < 2 || b > 5) && a > 3 => b > 5 && a > 3

more than two And/Or with common conditions
(a && b && c ...) || (a && b && d ...) || (a && b && e ...) ... => a && b && ((c ...) || (d ...) || (e ...) || ...)
(a || b || c || ...) && (a || b || d || ...) && (a || b || e || ...) ... => (a || b) || ((c || ...) && (d || ...) && (e || ...) && ...)
```
Hi @liancheng, do you mind if I refactor this, referring to your PR, to cover all the cases above?
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68169507 Actually I'd highly suggest breaking this PR into at least two self-contained PRs, which would be much easier to review and merge. Rule sets 1 and 4 can be merged into one PR, rule sets 2 and 3 into another. Maybe we can remove rule sets 2 and 3 from this PR after your refactoring and get rule sets 1 and 4 merged first (I realized #3784 doesn't cover all rules in set 4, because the second rule in set 4 doesn't help optimizing cartesian products). The reason I'm hesitant to include rule sets 2 and 3 is that, for now, I don't see a sound yet concise implementation without introducing extra dependencies. Although I proposed the Spire `Interval` solution, I'd rather not introduce Spire. On the other hand, rule sets 1 and 4 have proven to be both useful and easy to implement.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68169633 Would like to add that the solution based on Spire `Interval` I posted above may suffer from floating-point precision issues. Thus we might want to cast all integral comparisons to `Interval[Long]` and all fractional comparisons to `Interval[Double]` to fix it.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22274369

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
+    def find(goal: Expression): Boolean = {
+      def delegate(child: Expression): Boolean = (child, goal) match {
+        case (combine: CombinePredicate, _) =>
+          isSameCombinePredicate(source, combine) && combine.find(goal)
+
+        // if the left child is a literal,
+        // LiteralComparison's unapply method swaps the literal and attribute positions
+        case (LiteralComparison(childComparison), LiteralComparison(goalComparison)) =>
+          isSame(childComparison, goalComparison)
+
+        case other =>
+          isSame(child, goal)
+      }
+
+      // use thunks to avoid evaluating the right side if the left side is already true
+      val leftResult = () => delegate(source.left)
+      val rightResult = () => delegate(source.right)
+      leftResult() || rightResult()
+    }
+
+    @inline
+    def isOrPredicate: Boolean = {
+      source.isInstanceOf[Or]
+    }
+
+    // create a new combine predicate with the same combine operator as this one
+    @inline
+    def build(left: Expression, right: Expression): CombinePredicate = {
+      CombinePredicate(left, right, isOrPredicate)
+    }
+
+    // swap the left and right children
+    @inline
+    def swap: CombinePredicate = {
+      source.build(source.right, source.left)
+    }
+
+    def toOptimized: Expression = source match {
+      // one CombinePredicate whose left equals right: drop right, keep left
+      // examples: a && a => a, a || a => a
+      case CombinePredicate(left, right) if left.fastEquals(right) =>
+        left
+
+      // one CombinePredicate whose left and right are both binary comparisons
+      // example: a < 2 && a > 2 => false
+      case origin @ CombinePredicate(LiteralComparison(left), LiteralComparison(right)) =>
+        // left or right may have swapped child positions, so rebuild one
+        val changed = origin.build(left, right)
+        val optimized = changed.mergeComparison
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      case origin @ CombinePredicate(left @ CombinePredicate(ll, lr), right)
+          if isNotCombinePredicate(ll, lr, right) =>
+        val leftOptimized = left.toOptimized
+        if (isSame(left, leftOptimized)) {
+          if (isSame(ll, right) || isSame(lr, right)) {
+            if (isSameCombinePredicate(origin, left)) leftOptimized else right
+          } else {
+            val llRight = origin.build(ll, right)
+            val lrRight = origin.build(lr, right)
+            val llRightOptimized = llRight.toOptimized
+            val lrRightOptimized = lrRight.toOptimized
+            if (isSame(llRight, llRightOptimized) && isSame(lrRight, lrRightOptimized)) {
+              origin
+            } else if (isNotCombinePredicate(llRightOptimized, lrRightOptimized)
+                || isSameCombinePredicate(origin, left)) {
+              left.build(llRightOptimized, lrRightOptimized).toOptimized
+            } else if (llRightOptimized.isLiteral || lrRightOptimized.isLiteral) {
+              left.build(llRightOptimized, lrRightOptimized)
+            } else {
+              origin
+            }
+          }
+        } else if (isNotCombinePredicate(leftOptimized)) {
+          origin.build(leftOptimized, right).toOptimized
+        } else {
+          origin
+        }
+
+      case origin @ CombinePredicate(left, right @ CombinePredicate(left2, right2))
+          if isNotCombinePredicate(left, left2, right2) =>
+        val changed = origin.swap
+        val optimized = changed.toOptimized
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      // optimize cases like (a || b || c) && a => a, where a, b, c are conditions
+      case origin @ CombinePredicate(left:
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22276984

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
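The absorption rule discussed in this thread, `(a || b || c) && a => a`, can likewise be checked exhaustively over all boolean assignments. A two-line sketch (Python, illustrative only):

```python
# Exhaustive truth-table check of the absorption law (a || b || c) && a == a.
from itertools import product

assert all(((a or b or c) and a) == a
           for a, b, c in product([False, True], repeat=3))
```

The same loop verifies the dual form `(a && b && c) || a => a` by swapping the connectives.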
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277093

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -112,7 +112,30 @@ case class InSet(value: Expression, hset: Set[Any])
-case class And(left: Expression, right: Expression) extends BinaryPredicate {
+abstract class CombinePredicate extends BinaryPredicate {
--- End diff --

Since we already have the `implicit` class in `Optimizer`, I don't really think we should make the existing expressions more complicated. And probably even `CombinePredicate` is not necessary.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277105

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -160,6 +183,49 @@ abstract class BinaryComparison extends BinaryPredicate { self: Product =>
+object BinaryComparison {
--- End diff --

The same as with `CombinePredicate`: we'd better not couple the existing expressions with the `Optimizer`.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277118

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
--- End diff --

Right here, where we have the rules, we'd probably better not use `CombinePredicate` any more, since it is clearer to me if we list every possible type explicitly.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277164

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
--- End diff --

Can we refactor the code in a more functional style? For example: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L43. Keeping to the same code patterns makes the rules easier to maintain.
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68119210 This is a really useful optimization, particularly for SQL generated by machines. It would make more sense if we added a unit test showing that the expression optimization eventually changes the behavior of logical plan optimization (Cartesian join ==> hash join), e.g. in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala#L62. In the meantime, I think the code can be polished into a more functional style (e.g. remove the implicit classes and avoid the unnecessary trait, etc.); as you know, most of the optimizer rules in Catalyst are straightforward and brief, which makes them easier for others to maintain and extend. Sorry that I didn't dive into the details.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277333

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---

```diff
@@ -112,7 +112,30 @@ case class InSet(value: Expression, hset: Set[Any])
   }
 }

-case class And(left: Expression, right: Expression) extends BinaryPredicate {
+abstract class CombinePredicate extends BinaryPredicate {
```

--- End diff --

You mean we can do this with an implicit class in the Optimizer, right? My initial idea in adding this was to unify the simplification of And/Or and avoid duplicated code.
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277360

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```diff
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
   }
 }

+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
```

--- End diff --

Definitely should refactor here :)
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22277573

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---

```diff
@@ -112,7 +112,30 @@ case class InSet(value: Expression, hset: Set[Any])
   }
 }

-case class And(left: Expression, right: Expression) extends BinaryPredicate {
+abstract class CombinePredicate extends BinaryPredicate {
```

--- End diff --

Sorry for the confusion, it was just a general comment! Making a general parent type here probably saves some code in `Optimizer`; however, we'd better not change any existing expression JUST for the `Optimizer`. Also, it's more important to list all of the possible type combinations in `Optimizer`, so people can get a straightforward feel for how to maintain / extend it in the future.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68121946 @scwf Would you mind listing all the optimizations in the PR description first? A few concise examples coupled with each optimization would be really helpful. Then we can help decouple and polish each optimization rule individually. It's really hard to grasp your intention from the provided complex query.
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68123618 For the numeric comparison optimizations, I did some experiments along the lines of my earlier double-interval comparison idea and came up with the following snippet. I haven't even compiled it yet, but it shows the general idea:

```scala
private implicit class NumericLiteral(e: Literal) {
  def toDouble = Cast(e, DoubleType).eval().asInstanceOf[Double]
}

object LiteralBinaryComparison {
  def unapply(e: Expression): Option[(NamedExpression, Interval[Double])] = e match {
    case LessThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.below(l.toDouble))
    case LessThan(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some(n, Interval.atOrAbove(l.toDouble))
    case GreaterThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.above(l.toDouble))
    case GreaterThan(l @ Literal(_, dt: NumericType), n: NamedExpression) =>
      Some(n, Interval.atOrBelow(l.toDouble))
    case LessThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.atOrBelow(l.toDouble))
    case LessThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some(n, Interval.above(l.toDouble))
    case GreaterThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.atOrAbove(l.toDouble))
    case GreaterThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some(n, Interval.below(l.toDouble))
    case EqualTo(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some(n, Interval.point(l.toDouble))
  }
}

def simplify(e: Expression): Expression = e transform {
  case and @ And(
      e1 @ LiteralBinaryComparison(n1, i1),
      e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
    if (i1.intersect(i2).isEmpty) Literal(false)
    else if (i1.isSubsetOf(i2)) e1
    else if (i1.isSupersetOf(i2)) e2
    else and

  case or @ Or(
      e1 @ LiteralBinaryComparison(n1, i1),
      e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
    if (i1.union(i2) == Interval.all[Double]) Literal(true)
    else if (i1.isSubsetOf(i2)) e2
    else if (i1.isSupersetOf(i2)) e1
    else or
}
```

The interval utility comes from Spire, which would need to be added as a compile-time dependency of Catalyst. The `simplify` method can be merged into `BooleanSimplification`. I guess this together with #3784 captures all optimizations introduced in this PR?
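To see what the interval reasoning above buys, here is a tiny self-contained sketch (not the PR's code; `SimpleInterval` is a hypothetical stand-in for Spire's `Interval`, treating all bounds as open for brevity):

```scala
// Hypothetical stand-in for Spire's Interval[Double]; all bounds are
// treated as open, which is enough to illustrate the decision logic.
case class SimpleInterval(lo: Double, hi: Double) {
  def isEmpty: Boolean = lo >= hi
  def intersect(that: SimpleInterval): SimpleInterval =
    SimpleInterval(math.max(lo, that.lo), math.min(hi, that.hi))
  def isSubsetOf(that: SimpleInterval): Boolean =
    that.lo <= lo && hi <= that.hi
}

object SimpleInterval {
  def below(x: Double) = SimpleInterval(Double.NegativeInfinity, x) // a < x
  def above(x: Double) = SimpleInterval(x, Double.PositiveInfinity) // a > x
}

// `a < 10 AND a < 100`: the first interval is a subset, so keep only `a < 10`
assert(SimpleInterval.below(10).isSubsetOf(SimpleInterval.below(100)))

// `a < 10 AND a > 20`: the intersection is empty, so fold the conjunction to `false`
assert(SimpleInterval.below(10).intersect(SimpleInterval.above(20)).isEmpty)
```

The ±infinity endpoints are exactly why casting to `Double` is convenient here: a one-sided comparison still maps onto a two-ended interval.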
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68035013 [Test build #24773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24773/consoleFull) for PR 3778 at commit [`8c0316f`](https://github.com/apache/spark/commit/8c0316f8454f0ac8268f98d9a4c9cc29baedbf5b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class CombinePredicate extends BinaryPredicate ` * `case class And(left: Expression, right: Expression) extends CombinePredicate ` * `case class Or(left: Expression, right: Expression) extends CombinePredicate ` * ` implicit class CombinePredicateExtension(source: CombinePredicate) ` * ` implicit class ExpressionCookies(expression: Expression) `
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68035016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24773/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68084460 [Test build #24804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24804/consoleFull) for PR 3778 at commit [`8733027`](https://github.com/apache/spark/commit/8733027a544c9a89da123c37d9eaba98b62199ad). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68086250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24804/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68086248 [Test build #24804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24804/consoleFull) for PR 3778 at commit [`8733027`](https://github.com/apache/spark/commit/8733027a544c9a89da123c37d9eaba98b62199ad). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class CombinePredicate extends BinaryPredicate ` * `case class And(left: Expression, right: Expression) extends CombinePredicate ` * `case class Or(left: Expression, right: Expression) extends CombinePredicate ` * ` implicit class CombinePredicateExtension(source: CombinePredicate) ` * ` implicit class ExpressionCookies(expression: Expression) `
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68029380 @scwf These optimizations are useful, particularly the one that eliminates common predicates. Thanks for bringing them up! However, the implementation in this PR is really overcomplicated... I opened #3784 for a simpler version. For the numeric comparisons, I think we can first cast all numeric comparisons to `Double` type, derive a `Double` range from each predicate, and then eliminate unnecessary comparisons by comparing ranges. `Double` is preferred because `Double.PositiveInfinity` and `Double.NegativeInfinity` come in handy when dealing with predicates like `a > 10` and `a < 100`.
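The common-predicate elimination mentioned here can be pictured with a minimal sketch (illustrative only; conjuncts are modeled as plain strings rather than Catalyst expressions, and `factorOr` is a made-up helper name): split each side of an `Or` into its conjuncts, factor out the shared ones, and rebuild the condition as `common && (ldiff || rdiff)`:

```scala
// Illustrative only: each side of an Or is modeled as its set of conjuncts.
// Returns (shared conjuncts, leftovers on the left, leftovers on the right).
def factorOr(lhs: Set[String], rhs: Set[String]): (Set[String], Set[String], Set[String]) = {
  val common = lhs intersect rhs
  (common, lhs diff common, rhs diff common)
}

// (a && b) || (a && c) rewrites to a && (b || c): `a` is factored out
val (common, ldiff, rdiff) = factorOr(Set("a", "b"), Set("a", "c"))
assert(common == Set("a"))
assert(ldiff == Set("b"))
assert(rdiff == Set("c"))
```

When `common` is empty there is nothing to factor and the original `Or` should be kept unchanged.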
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68031794 [Test build #24768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24768/consoleFull) for PR 3778 at commit [`f1a487f`](https://github.com/apache/spark/commit/f1a487f55797086f39a70536547c963854183858). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68032736 [Test build #24773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24773/consoleFull) for PR 3778 at commit [`8c0316f`](https://github.com/apache/spark/commit/8c0316f8454f0ac8268f98d9a4c9cc29baedbf5b). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68033864 [Test build #24768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24768/consoleFull) for PR 3778 at commit [`f1a487f`](https://github.com/apache/spark/commit/f1a487f55797086f39a70536547c963854183858). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class CombinePredicate extends BinaryPredicate ` * `case class And(left: Expression, right: Expression) extends CombinePredicate ` * `case class Or(left: Expression, right: Expression) extends CombinePredicate ` * ` implicit class CombinePredicateExtension(source: CombinePredicate) ` * ` implicit class ExpressionCookies(expression: Expression) `
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3778#issuecomment-68033865 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24768/ Test FAILed.