[GitHub] [spark] erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row

GitBox Thu, 12 Dec 2019 10:07:26 -0800

erikerlandson commented on a change in pull request #25024: [SPARK-27296][SQL] 
User Defined Aggregators that do not ser/de on each input row
URL: https://github.com/apache/spark/pull/25024#discussion_r357291403


 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala
 ##########
 @@ -79,6 +80,24 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
     udaf
   }
 
+  /**
+   * Registers a typed [[Aggregator]] for use with untyped Data Frames
+   *
+   * @param name the name to register under
+   * @param agg the typed Aggregator
+   * @return a UserDefinedAggregator that can be used as an aggregating UDF
+   *
+   * @since 3.0.0
+   */
+  def registerAggregator[IN: TypeTag, BUF, OUT](
 
 Review comment:
   @cloud-fan @rdblue this iteration still has a `TypeTag` implicit creeping 
in, both here and in `case class UserDefinedAggregator`. The primary reason 
is that `Aggregator` has no input encoder associated with it, other than what 
can be inferred from `IN`.
   
   I see two ways to square that circle:
   1. Make the input encoder a parameter of `registerAggregator` and 
`UserDefinedAggregator` (or provide an alternative overload for Java users).
   2. Add an `inputEncoder` method to `Aggregator`.
   
   At the moment I am partial to adding `inputEncoder` to `Aggregator`: the 
transition to 3.0 allows for that kind of API change, and it would be symmetric 
with the two existing encoder methods (`bufferEncoder` and `outputEncoder`). 
The other approach is also reasonable.
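   
   To make the two options concrete, here is a rough sketch of what each 
could look like. This is not code from this PR: `AggregatorWithInput`, 
`LongSum`, `RegistrationSketch`, and both `registerAggregator` signatures 
below are hypothetical stand-ins, and the methods only return a description 
string instead of registering anything. It assumes `spark-sql` is on the 
classpath.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Option 2 (hypothetical): an Aggregator that also carries its input encoder,
// mirroring the existing bufferEncoder and outputEncoder methods.
abstract class AggregatorWithInput[IN, BUF, OUT] extends Aggregator[IN, BUF, OUT] {
  def inputEncoder: Encoder[IN]
}

// A trivial aggregator used only to exercise the sketch.
object LongSum extends AggregatorWithInput[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Long): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(reduction: Long): Long = reduction
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
  def inputEncoder: Encoder[Long] = Encoders.scalaLong
}

object RegistrationSketch {
  // Option 1 (hypothetical signature): the caller passes the input encoder
  // explicitly, so no TypeTag context bound is needed on IN.
  def registerAggregator[IN, BUF, OUT](
      name: String,
      agg: Aggregator[IN, BUF, OUT],
      inputEncoder: Encoder[IN]): String =
    s"$name: input schema ${inputEncoder.schema.simpleString}"

  // With option 2, the encoder comes from the aggregator itself and the
  // call site does not have to mention it.
  def registerAggregator[IN, BUF, OUT](
      name: String,
      agg: AggregatorWithInput[IN, BUF, OUT]): String =
    registerAggregator(name, agg, agg.inputEncoder)
}
```

   Under option 1 a call would look like 
`RegistrationSketch.registerAggregator("long_sum", LongSum, Encoders.scalaLong)`, 
which is also usable from Java; under option 2 it shrinks to 
`RegistrationSketch.registerAggregator("long_sum", LongSum)`, at the cost of 
every `Aggregator` implementation having to supply the extra method.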

