[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

liancheng Thu, 25 Dec 2014 21:12:38 -0800

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/3778#issuecomment-68123618
  
    For the numeric comparison optimizations, I experimented with my earlier 
double-interval comparison idea and came up with the following snippet. I 
haven't even compiled it yet, but it shows the general idea:
    
    ```scala
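      // Spire imports this snippet is assumed to need (interval type and Order[Double]).
      import spire.implicits._
      import spire.math.Interval

      // Evaluates a numeric literal as a Double so all numeric types share one interval type.
      private implicit class NumericLiteral(e: Literal) {
        def toDouble: Double = Cast(e, DoubleType).eval().asInstanceOf[Double]
      }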
    
      // Extracts (attribute, interval of values satisfying the predicate) from a
      // comparison between a named expression and a numeric literal.
      object LiteralBinaryComparison {
        def unapply(e: Expression): Option[(NamedExpression, Interval[Double])] = e match {
          // n < l, and l < n (that is, n > l)
          case LessThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
            Some((n, Interval.below(l.toDouble)))
          case LessThan(l @ Literal(_, _: NumericType), n: NamedExpression) =>
            Some((n, Interval.above(l.toDouble)))
    
          // n > l, and l > n (that is, n < l)
          case GreaterThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
            Some((n, Interval.above(l.toDouble)))
          case GreaterThan(l @ Literal(_, _: NumericType), n: NamedExpression) =>
            Some((n, Interval.below(l.toDouble)))
    
          // n <= l, and l <= n (that is, n >= l)
          case LessThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
            Some((n, Interval.atOrBelow(l.toDouble)))
          case LessThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
            Some((n, Interval.atOrAbove(l.toDouble)))
    
          // n >= l, and l >= n (that is, n <= l)
          case GreaterThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
            Some((n, Interval.atOrAbove(l.toDouble)))
          case GreaterThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
            Some((n, Interval.atOrBelow(l.toDouble)))
    
          // n = l, in either operand order
          case EqualTo(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
            Some((n, Interval.point(l.toDouble)))
          case EqualTo(l @ Literal(_, _: NumericType), n: NamedExpression) =>
            Some((n, Interval.point(l.toDouble)))
    
          // Anything else is not a literal comparison; without this case the match would throw.
          case _ => None
        }
      }
    
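      def simplify(e: Expression): Expression = e transform {
        // Conjunction of two comparisons on the same attribute: contradictory ranges
        // fold to false, otherwise keep the narrower range.
        case and @ And(
            e1 @ LiteralBinaryComparison(n1, i1),
            e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
          if (i1.intersect(i2).isEmpty) Literal(false)
          else if (i1.isSubsetOf(i2)) e1
          else if (i1.isSupersetOf(i2)) e2
          else and
    
        // Disjunction of two comparisons on the same attribute: ranges that cover
        // everything fold to true, otherwise keep the wider range.
        // NOTE: this assumes `union` is the exact set union. If it returns the smallest
        // enclosing interval instead, disjoint ranges such as `a < 1 OR a > 5` would be
        // wrongly folded to true, so this needs checking.
        case or @ Or(
            e1 @ LiteralBinaryComparison(n1, i1),
            e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
          if (i1.union(i2) == Interval.all[Double]) Literal(true)
          else if (i1.isSubsetOf(i2)) e2
          else if (i1.isSupersetOf(i2)) e1
          else or
      }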
    ```
    
    The interval utility comes from Spire, which would need to be added as a 
compile-time dependency of Catalyst. The `simplify` method can be merged into 
`BooleanSimplification`.
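    
    To make the idea easier to try without Catalyst or Spire on the classpath, 
here is a minimal self-contained sketch of the same interval reasoning. 
Everything in it (`IntervalSketch`, `Expr`, `Span`, `Comparison`, `coversAll`) 
is a hypothetical stand-in, not Catalyst or Spire API. It only handles 
one-sided comparisons, and it checks "covers everything" explicitly rather 
than comparing a union against the full line:
    
    ```scala
    object IntervalSketch {
      // Hypothetical stand-ins for Catalyst expressions: an attribute name compared with a literal.
      sealed trait Expr
      case class Lt(name: String, v: Double) extends Expr   // name < v
      case class LtEq(name: String, v: Double) extends Expr // name <= v
      case class Gt(name: String, v: Double) extends Expr   // name > v
      case class GtEq(name: String, v: Double) extends Expr // name >= v
      case class And(left: Expr, right: Expr) extends Expr
      case class Or(left: Expr, right: Expr) extends Expr
      case class Bool(value: Boolean) extends Expr
    
      private val NegInf = Double.NegativeInfinity
      private val PosInf = Double.PositiveInfinity
    
      // Hypothetical stand-in for an interval type; infinite bounds are always open.
      case class Span(lo: Double, loClosed: Boolean, hi: Double, hiClosed: Boolean) {
        def isEmpty: Boolean = lo > hi || (lo == hi && !(loClosed && hiClosed))
    
        // Larger lower bound and smaller upper bound win; on ties, closed only if both are.
        def intersect(that: Span): Span = {
          val (l, lc) =
            if (lo > that.lo) (lo, loClosed)
            else if (lo < that.lo) (that.lo, that.loClosed)
            else (lo, loClosed && that.loClosed)
          val (h, hc) =
            if (hi < that.hi) (hi, hiClosed)
            else if (hi > that.hi) (that.hi, that.hiClosed)
            else (hi, hiClosed && that.hiClosed)
          Span(l, lc, h, hc)
        }
    
        def isSubsetOf(that: Span): Boolean = isEmpty || intersect(that) == this
      }
    
      // True when the two spans together cover the whole real line with no gap.
      def coversAll(a: Span, b: Span): Boolean = {
        def joins(low: Span, high: Span): Boolean =
          low.lo == NegInf && high.hi == PosInf &&
            (high.lo < low.hi || (high.lo == low.hi && (high.loClosed || low.hiClosed)))
        joins(a, b) || joins(b, a)
      }
    
      // Maps a comparison to (attribute name, span of values that satisfy it).
      object Comparison {
        def unapply(e: Expr): Option[(String, Span)] = e match {
          case Lt(n, v)   => Some((n, Span(NegInf, false, v, false)))
          case LtEq(n, v) => Some((n, Span(NegInf, false, v, true)))
          case Gt(n, v)   => Some((n, Span(v, false, PosInf, false)))
          case GtEq(n, v) => Some((n, Span(v, true, PosInf, false)))
          case _          => None
        }
      }
    
      // Simplifies a single And/Or of two comparisons on the same attribute.
      def simplify(e: Expr): Expr = e match {
        case And(e1 @ Comparison(n1, i1), e2 @ Comparison(n2, i2)) if n1 == n2 =>
          if (i1.intersect(i2).isEmpty) Bool(false)
          else if (i1.isSubsetOf(i2)) e1
          else if (i2.isSubsetOf(i1)) e2
          else e
        case Or(e1 @ Comparison(n1, i1), e2 @ Comparison(n2, i2)) if n1 == n2 =>
          if (coversAll(i1, i2)) Bool(true)
          else if (i1.isSubsetOf(i2)) e2
          else if (i2.isSubsetOf(i1)) e1
          else e
        case other => other
      }
    }
    ```
    
    With this sketch, `simplify(And(Lt("a", 3.0), Gt("a", 5.0)))` folds to 
`Bool(false)` and `simplify(Or(Lt("a", 1.0), GtEq("a", 1.0)))` folds to 
`Bool(true)`, while `Or(Lt("a", 1.0), Gt("a", 5.0))` is left unchanged because 
the two ranges leave a gap.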
    
    I guess this, together with #3784, covers all the optimizations introduced 
in this PR?


