Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/3778#issuecomment-68123618
For the numeric comparison optimizations, I experimented with my earlier
double-interval comparison idea and came up with the following snippet. I
haven't even compiled it yet, but it shows the general idea:
```scala
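// Catalyst expression and type imports omitted; Spire provides Interval.
import spire.implicits._ // Order[Double] instance needed by Interval operations
import spire.math.Interval

private implicit class NumericLiteral(e: Literal) {
  // Evaluate the literal as a Double so every numeric type shares one interval domain.
  def toDouble: Double = Cast(e, DoubleType).eval().asInstanceOf[Double]
}

// Matches `attr <op> literal` or `literal <op> attr` and yields the interval
// of attribute values that satisfy the comparison.
object LiteralBinaryComparison {
  def unapply(e: Expression): Option[(NamedExpression, Interval[Double])] = e match {
    case LessThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some((n, Interval.below(l.toDouble)))
    // l < n is the same as n > l
    case LessThan(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some((n, Interval.above(l.toDouble)))
    case GreaterThan(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some((n, Interval.above(l.toDouble)))
    // l > n is the same as n < l
    case GreaterThan(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some((n, Interval.below(l.toDouble)))
    case LessThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some((n, Interval.atOrBelow(l.toDouble)))
    // l <= n is the same as n >= l
    case LessThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some((n, Interval.atOrAbove(l.toDouble)))
    case GreaterThanOrEqual(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some((n, Interval.atOrAbove(l.toDouble)))
    // l >= n is the same as n <= l
    case GreaterThanOrEqual(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some((n, Interval.atOrBelow(l.toDouble)))
    case EqualTo(n: NamedExpression, l @ Literal(_, _: NumericType)) =>
      Some((n, Interval.point(l.toDouble)))
    case EqualTo(l @ Literal(_, _: NumericType), n: NamedExpression) =>
      Some((n, Interval.point(l.toDouble)))
    // Anything else is not a literal comparison; avoid a MatchError.
    case _ => None
  }
}

def simplify(e: Expression): Expression = e transform {
  case and @ And(
      e1 @ LiteralBinaryComparison(n1, i1),
      e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
    if (i1.intersect(i2).isEmpty) Literal(false) // no value satisfies both
    else if (i1.isSubsetOf(i2)) e1               // e1 already implies e2
    else if (i1.isSupersetOf(i2)) e2             // e2 already implies e1
    else and

  case or @ Or(
      e1 @ LiteralBinaryComparison(n1, i1),
      e2 @ LiteralBinaryComparison(n2, i2)) if n1 == n2 =>
    // Assumption: `union` returns a single enclosing interval, so disjoint
    // inputs such as `x < 1 OR x > 5` would also give `all`. Requiring an
    // overlap keeps the rewrite conservative.
    if (i1.union(i2) == Interval.all[Double] && !i1.intersect(i2).isEmpty) Literal(true)
    else if (i1.isSubsetOf(i2)) e2               // e2 covers e1
    else if (i1.isSupersetOf(i2)) e1             // e1 covers e2
    else or
}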
```
The interval utility comes from Spire, which needs to be added as a
compile-time dependency of Catalyst. The `simplify` method can be merged into
`BooleanSimplification`.
I guess this, together with #3784, covers all the optimizations introduced in
this PR?
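
To make the interval reasoning concrete without pulling in Spire or Catalyst, here is a minimal self-contained sketch. Everything in it (`IntervalSketch`, `Span`, `Bound`, `Pred`, `simplifyAnd`, `simplifyOr`, `coversAll`) is a hypothetical stand-in written for illustration, not Spire's or Catalyst's API. It models comparisons of a single column against a literal and applies the same AND/OR rules as the snippet above:

```scala
object IntervalSketch {
  // One end of an interval: the endpoint value and whether it is included.
  final case class Bound(value: Double, closed: Boolean)

  // Hypothetical stand-in for an interval of Doubles.
  // `None` on either side means unbounded in that direction.
  final case class Span(lower: Option[Bound], upper: Option[Bound]) {
    def isAll: Boolean = lower.isEmpty && upper.isEmpty

    def isEmpty: Boolean = (lower, upper) match {
      case (Some(lo), Some(hi)) =>
        lo.value > hi.value || (lo.value == hi.value && !(lo.closed && hi.closed))
      case _ => false
    }

    // Keep the tighter bound on each side.
    def intersect(that: Span): Span =
      Span(tighter(lower, that.lower, isLower = true),
           tighter(upper, that.upper, isLower = false))

    def isSubsetOf(that: Span): Boolean = isEmpty || intersect(that) == this
  }

  private def tighter(a: Option[Bound], b: Option[Bound], isLower: Boolean): Option[Bound] =
    (a, b) match {
      case (None, _) => b
      case (_, None) => a
      case (Some(x), Some(y)) =>
        if (x.value == y.value) { if (x.closed) b else a } // same endpoint: open is tighter
        else if ((x.value > y.value) == isLower) a         // larger lower / smaller upper wins
        else b
    }

  // Comparisons of one fixed column against a numeric literal.
  sealed trait Pred
  final case class Lt(v: Double) extends Pred   // col <  v
  final case class Le(v: Double) extends Pred   // col <= v
  final case class Gt(v: Double) extends Pred   // col >  v
  final case class Ge(v: Double) extends Pred   // col >= v
  final case class EqTo(v: Double) extends Pred // col =  v

  def toSpan(p: Pred): Span = p match {
    case Lt(v)   => Span(None, Some(Bound(v, closed = false)))
    case Le(v)   => Span(None, Some(Bound(v, closed = true)))
    case Gt(v)   => Span(Some(Bound(v, closed = false)), None)
    case Ge(v)   => Span(Some(Bound(v, closed = true)), None)
    case EqTo(v) => Span(Some(Bound(v, closed = true)), Some(Bound(v, closed = true)))
  }

  // True when the two spans together cover every real number.
  def coversAll(i1: Span, i2: Span): Boolean = {
    def noGap(up: Option[Bound], lo: Option[Bound]): Boolean = (up, lo) match {
      case (Some(u), Some(l)) =>
        u.value > l.value || (u.value == l.value && (u.closed || l.closed))
      case _ => true
    }
    i1.isAll || i2.isAll ||
    (i1.lower.isEmpty && i2.upper.isEmpty && noGap(i1.upper, i2.lower)) ||
    (i2.lower.isEmpty && i1.upper.isEmpty && noGap(i2.upper, i1.lower))
  }

  // Left(false): `a AND b` can never hold. Right(ps): the comparisons to keep.
  def simplifyAnd(a: Pred, b: Pred): Either[Boolean, List[Pred]] = {
    val (i1, i2) = (toSpan(a), toSpan(b))
    if (i1.intersect(i2).isEmpty) Left(false)
    else if (i1.isSubsetOf(i2)) Right(List(a))
    else if (i2.isSubsetOf(i1)) Right(List(b))
    else Right(List(a, b))
  }

  // Left(true): `a OR b` always holds. Right(ps): the comparisons to keep.
  def simplifyOr(a: Pred, b: Pred): Either[Boolean, List[Pred]] = {
    val (i1, i2) = (toSpan(a), toSpan(b))
    if (coversAll(i1, i2)) Left(true)
    else if (i1.isSubsetOf(i2)) Right(List(b))
    else if (i2.isSubsetOf(i1)) Right(List(a))
    else Right(List(a, b))
  }
}
```

In this sketch `simplifyAnd(Lt(1), Gt(5))` folds to `Left(false)` and `simplifyAnd(Lt(5), Lt(3))` keeps only `Lt(3)`. For OR, `simplifyOr(Lt(1), Ge(1))` folds to `Left(true)`, while `simplifyOr(Lt(1), Gt(5))` is left unchanged because the two half-lines leave a gap. That gap check is the reason a tautology test has to look at more than the enclosing interval of the two operands.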