erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL]
User Defined Aggregators that do not ser/de on each input row
URL: https://github.com/apache/spark/pull/25024#discussion_r307836766
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/expressions/udaf.scala
##########
@@ -165,3 +165,140 @@ abstract class MutableAggregationBuffer extends Row {
/** Update the ith value of this buffer. */
def update(i: Int, value: Any): Unit
}
+
+/**
+ * The base class for implementing user-defined imperative aggregator (UDIA).
+ *
+ * @tparam A the aggregating type: implements the aggregation logic.
+ *
+ * @since 3.0.0
+ */
+@Experimental
+abstract class UserDefinedImperativeAggregator[A] extends Serializable {
Review comment:
Certainly agreed regarding `Dataset[Row]`, as far as that goes.
There are some other boxes to check. Off the top of my head:
* Can an `Encoder` be built around a `UserDefinedType`? (Maybe.) If so, will
pyspark be able to use it in the same way?
* `Aggregator` has `toColumn`, which returns a `TypedColumn`, while UDAF has
`apply` and `distinct`, which return `Column` - what are the implications of
those differences?
* As you say, `register`. Getting that to work with UDIA was pretty easy,
so I'd guess it is feasible.
To me, it is very important that any solution operates via pyspark in the way
that UDAF does.
I would also prefer something that "can pass all the same unit tests as
UDAF", which I believe I was able to do with UDIA. I don't think `Aggregator`
exactly does that, although it might be reasonably close for most purposes.
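For context on the ser/de concern behind this thread, here is a minimal, self-contained sketch (plain Scala, not the real Spark API; the `TypedAgg` trait and `Demo` object are hypothetical names) of the typed-aggregation contract: the buffer stays an ordinary JVM object across input rows, rather than being read from and written back to a generic row buffer on every update, which is the behavior SPARK-27296 is trying to avoid.

```scala
// Hypothetical, simplified sketch of an Aggregator/UDIA-style typed fold.
// The buffer BUF is held as a plain JVM value between rows, so no per-row
// serialization/deserialization is required.
trait TypedAgg[IN, BUF, OUT] {
  def zero: BUF                         // initial buffer for a partition
  def reduce(b: BUF, a: IN): BUF        // fold one input value into the buffer
  def merge(b1: BUF, b2: BUF): BUF      // combine buffers across partitions
  def finish(b: BUF): OUT               // produce the final result
}

object Demo {
  // Example instance: sum of Longs, with the buffer kept as a bare Long.
  val sum = new TypedAgg[Long, Long, Long] {
    def zero = 0L
    def reduce(b: Long, a: Long) = b + a
    def merge(b1: Long, b2: Long) = b1 + b2
    def finish(b: Long) = b
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1L, 2L, 3L, 4L)
    val result = sum.finish(data.foldLeft(sum.zero)(sum.reduce))
    println(result) // prints 10
  }
}
```

A UDAF-style aggregator, by contrast, would convert the buffer to and from a `Row` inside every `update` call, which is exactly the overhead both UDIA and an `Aggregator`-based solution aim to eliminate.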