erikerlandson commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
URL: https://github.com/apache/spark/pull/25024#issuecomment-545054005

@rdblue @cloud-fan, I hesitate to propose this because it would be a fourth iteration on this PR, but: if we are willing to alter the class signature of `Aggregator` for the 3.0 release, there is an opportunity to simplify things:

```scala
// note: IN is no longer contravariant
// potentially, all encoder info becomes implicit
abstract class Aggregator[IN, BUF, OUT] extends Serializable {
  def zero: BUF
  def reduce(b: BUF, a: IN): BUF
  def merge(b1: BUF, b2: BUF): BUF
  def finish(reduction: BUF): OUT

  // untyped aggregator
  def apply(exprs: Column*)(implicit eIN: Encoder[IN], eBUF: Encoder[BUF], eOUT: Encoder[OUT]): Column = ???

  // typed aggregator
  def toColumn(implicit eBUF: Encoder[BUF], eOUT: Encoder[OUT]): TypedColumn[IN, OUT] = ???
}
```

The above should allow us to get rid of the intermediate `UserDefinedAggregator`. It could be combined with a modernizing refactor that moves the `Encoder` implicits into the `Encoder` companion object.

Another option would be to default `bufferEncoder` and `outputEncoder` to implicits, and allow them to be overridden for cases that implicits might not cover (like TDigest). This might be safer from an "escape hatch" perspective.
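To make the ergonomics concrete, here is a hypothetical sketch of what a user-defined aggregator would look like if the proposed signature were adopted. `LongSum` is an illustrative name, not anything in the PR; the point is that the subclass only implements the four reduction methods, while encoder resolution moves to the call site:

```scala
// Hypothetical example, assuming the proposed Aggregator signature above.
// The user no longer overrides bufferEncoder/outputEncoder; the implicit
// Encoder instances are resolved where the aggregator is applied.
object LongSum extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Long): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(reduction: Long): Long = reduction
}

// Untyped use on a DataFrame (encoders supplied implicitly at the call site):
//   df.agg(LongSum($"value"))
// Typed use on a Dataset[Long]:
//   ds.select(LongSum.toColumn)
```

For an aggregator like TDigest, whose buffer type has no implicit `Encoder`, this shape would force the encoder to be supplied (or imported) at every call site, which is what motivates the alternative of keeping overridable `bufferEncoder`/`outputEncoder` defaults.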
