GitHub user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/8922#issuecomment-146103912
The point I'm trying to make is that we should generalize this, so that we
don't have to special-case every possible `udf(attr) <some comparison> literal`
but can instead push down any predicate that involves only a single
attribute reference.
Your implementation, for example, doesn't handle `a + 1 = 1` or
`udf(a) = udf(a + 1)`, and adding each of these cases individually is not going
to scale. Since we are already resorting to pushing down a function, why not
leverage the existing expression evaluation framework?
Here is a very rough sketch:
```scala
import org.apache.spark.sql.catalyst.expressions.{BindReferences, Expression, GenericInternalRow}
import org.apache.spark.sql.sources.Filter

// A filter that wraps an arbitrary single-attribute predicate as a function.
case class FilterFunction(func: Any => Boolean) extends Filter

protected[sql] def selectFilters(filters: Seq[Expression]) = {
  filters.flatMap {
    // ... existing translations for the simple cases ...
    // Any remaining predicate over exactly one attribute can be pushed
    // down wholesale by evaluating it through the expression framework.
    case e: Expression if e.references.size == 1 =>
      // Bind the expression so its single attribute is read at ordinal 0.
      val boundExpression = BindReferences.bindReference(e, e.references.toSeq)
      Some(FilterFunction { (a: Any) =>
        // Evaluate the predicate against a one-column row holding the value.
        val inputRow = new GenericInternalRow(Array[Any](a))
        boundExpression.eval(inputRow).asInstanceOf[Boolean]
      })
  }
}
```
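To make the consumer side concrete: a data source that receives one of these
would just apply the function to the referenced column's value for each row.
Here `scanWithFilter` and the `Map`-based row are hypothetical stand-ins for
illustration, not a real Spark API:

```scala
// Hypothetical data-source side of the contract: apply the pushed-down
// FilterFunction to each row while scanning. The row representation here
// is made up; real sources would use their own record type.
def scanWithFilter(
    rows: Iterator[Map[String, Any]],
    column: String,
    filter: FilterFunction): Iterator[Map[String, Any]] = {
  // Keep only the rows for which the pushed-down predicate holds.
  rows.filter(row => filter.func(row(column)))
}
```

Note that the value handed to `func` would already need to be in Catalyst's
internal representation, which is exactly the data type conversion work called
out below.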
There are a bunch of things that would need to be done before we could
commit this, though:
- data type conversions
- evaluate the cost of boxing and decide if we should specialize
- consider codegen for this function (see the sketch after this list)
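
On the codegen point, here is a minimal sketch of what compiling the
pushed-down predicate could look like. It assumes `GeneratePredicate.generate`
compiles a bound expression into an `InternalRow => Boolean`, and
`compiledFilterFunction` is just an illustrative name:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate
import org.apache.spark.sql.catalyst.expressions.{BindReferences, Expression, GenericInternalRow}

// Sketch only: compile the bound predicate once, instead of interpreting
// it via eval() for every row.
def compiledFilterFunction(e: Expression): Any => Boolean = {
  val bound = BindReferences.bindReference(e, e.references.toSeq)
  // Assumed signature: generate a compiled InternalRow => Boolean.
  val predicate: InternalRow => Boolean = GeneratePredicate.generate(bound)
  (a: Any) => predicate(new GenericInternalRow(Array[Any](a)))
}
```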