GitHub user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/8922#issuecomment-146103912
The point I'm trying to make is that we should generalize this, so that we
don't have to special-case every possible `udf(attr) <some comparison> literal`
but can instead push down any predicate that involves only a single
attribute reference.
Your implementation, for example, doesn't handle `a + 1 = 1` or
`udf(a) = udf(a + 1)`, and adding each of these cases individually is not going
to scale. Since we are already resorting to pushing down a function, why not
leverage the existing expression evaluation framework?
Here is a very rough sketch:
```scala
import org.apache.spark.sql.catalyst.expressions.{BindReferences, Expression, GenericInternalRow}
import org.apache.spark.sql.sources.Filter

// A filter that wraps an arbitrary single-attribute predicate as a function.
case class FilterFunction(func: Any => Boolean) extends Filter

protected[sql] def selectFilters(filters: Seq[Expression]) = {
  filters.flatMap {
    // ... existing translations for the simple cases ...
    // Any remaining predicate over exactly one attribute can be pushed
    // down wholesale by evaluating it through the expression framework.
    case e: Expression if e.references.size == 1 =>
      // Bind the expression so its single attribute is read at ordinal 0.
      val boundExpression = BindReferences.bindReference(e, e.references.toSeq)
      Some(FilterFunction { (a: Any) =>
        // Evaluate the predicate against a one-column row holding the value.
        val inputRow = new GenericInternalRow(Array[Any](a))
        boundExpression.eval(inputRow).asInstanceOf[Boolean]
      })
  }
}
```
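To make the consumer side concrete: a data source that receives one of these
would just apply the function to the referenced column's value for each row.
Here `scanWithFilter` and the `Map`-based row are hypothetical stand-ins for
illustration, not a real Spark API:

```scala
// Hypothetical data-source side of the contract: apply the pushed-down
// FilterFunction to each row while scanning. The row representation here
// is made up; real sources would use their own record type.
def scanWithFilter(
    rows: Iterator[Map[String, Any]],
    column: String,
    filter: FilterFunction): Iterator[Map[String, Any]] = {
  // Keep only the rows for which the pushed-down predicate holds.
  rows.filter(row => filter.func(row(column)))
}
```

Note that the value handed to `func` would already need to be in Catalyst's
internal representation, which is exactly the data type conversion work called
out below.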
There are a bunch of things that would need to be done before we could
commit this, though:
- data type conversions
- evaluate the cost of boxing and decide if we should specialize
- consider codegen for this function (see the sketch after this list)
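
On the codegen point, here is a minimal sketch of what compiling the
pushed-down predicate could look like. It assumes `GeneratePredicate.generate`
compiles a bound expression into an `InternalRow => Boolean`, and
`compiledFilterFunction` is just an illustrative name:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate
import org.apache.spark.sql.catalyst.expressions.{BindReferences, Expression, GenericInternalRow}

// Sketch only: compile the bound predicate once, instead of interpreting
// it via eval() for every row.
def compiledFilterFunction(e: Expression): Any => Boolean = {
  val bound = BindReferences.bindReference(e, e.references.toSeq)
  // Assumed signature: generate a compiled InternalRow => Boolean.
  val predicate: InternalRow => Boolean = GeneratePredicate.generate(bound)
  (a: Any) => predicate(new GenericInternalRow(Array[Any](a)))
}
```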