rdblue commented on a change in pull request #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
URL: https://github.com/apache/spark/pull/25024#discussion_r336668483
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala
##########
@@ -450,3 +452,165 @@ case class ScalaUDAF(
   override def nodeName: String = udaf.getClass.getSimpleName
 }
+
+/**
+ * The internal wrapper used to hook a [[UserDefinedImperativeAggregator]] `udia` in the
+ * internal aggregation code path.
+ */
+case class ScalaUDIA[T](
+    children: Seq[Expression],
+    udia: UserDefinedImperativeAggregator[T],
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[T]
+  with NonSQLExpression
+  with UserDefinedExpression
+  with ImplicitCastInputTypes
+  with Logging {
+
+  def dataType: DataType = udia.resultType
+
+  val inputTypes: Seq[DataType] = udia.inputSchema.map(_.dataType)

Review comment:
   As you said, I think this is important for performance. It also affects the output type. One valid way to handle null input values is to make the output null. For example, `avg(1, null, 3)` is `null`. If input values are guaranteed to be non-null, then functions like this would guarantee a non-null result and wouldn't need to check for null values.

   I think it's important that this be a full struct type, not just the individual data types.
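   For illustration only (not code from this PR): a minimal sketch of why the full struct type matters, assuming an aggregator roughly shaped like the `UserDefinedImperativeAggregator` in the diff above. A `StructType` keeps the per-field nullability that `inputSchema.map(_.dataType)` discards, and the aggregator can use that flag either to skip per-row null checks or to decide that a null input makes the result null. The object and method names below are hypothetical.

   ```scala
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.types._

   // Hypothetical sketch, not part of the proposed API: shows how nullability
   // carried by a full StructType could drive the aggregator's null handling.
   object NullAwareSumSketch {
     // Full struct type: keeps the nullable flag, unlike a bare Seq[DataType].
     val inputSchema: StructType =
       StructType(StructField("value", DoubleType, nullable = true) :: Nil)

     private val inputMayBeNull: Boolean = inputSchema.head.nullable

     // None represents a null aggregate result.
     def initial: Option[Double] = Some(0.0)

     def update(agg: Option[Double], input: Row): Option[Double] = {
       if (inputMayBeNull && input.isNullAt(0)) {
         // One valid policy: a null input makes the whole aggregate null,
         // mirroring the avg(1, null, 3) == null example above.
         None
       } else {
         agg.map(_ + input.getDouble(0))
       }
     }
   }
   ```

   If the field were declared `nullable = false`, the `isNullAt` branch could be dropped and the result would be known to be non-null, which is the performance and output-type point above.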