Hi, I want to write a UDF/UDAF which provides native processing performance. Currently, when creating a UDF/UDAF in a normal manner the performance is hit because it breaks optimizations. For a simple example I wanted to create a UDF which tests whether the value is smaller than 10. I tried something like this :
import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan import org.apache.spark.sql.catalyst.util.TypeUtils import org.apache.spark.sql.types._ import org.apache.spark.util.Utils import org.apache.spark.sql.catalyst.expressions._ case class genf(child: Expression) extends UnaryExpression with Predicate with ImplicitCastInputTypes { override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType) override def toString: String = s"$child < 10" override def eval(input: InternalRow): Any = { val value = child.eval(input) if (value == null) { false } else { child.dataType match { case IntegerType => value.asInstanceOf[Int] < 10 } } } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { defineCodeGen(ctx, ev, c => s"($c) < 10") } } However, this doesn't work as some of the underlying classes/traits are private (e.g. AbstractDataType is private) making it problematic to create a new case class. Is there a way to do it? The idea is to provide a couple of jars with a bunch of functions our team needs. Thanks, Assaf. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/implement-UDF-UDAF-supporting-whole-stage-codegen-tp18874.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.