
GitBox Thu, 25 Apr 2019 06:44:14 -0700

cloud-fan opened a new pull request #24459: [SPARK-24935][SQL][followup] support INIT -> UPDATE -> MERGE -> FINISH in Hive UDAF adapter
URL: https://github.com/apache/spark/pull/24459
 
 
   ## What changes were proposed in this pull request?
   
   This is a follow-up of #24144 (https://github.com/apache/spark/pull/24144), which missed one case: when hash aggregation falls back to sort-based aggregation, the life cycle of a UDAF is INIT -> UPDATE -> MERGE -> FINISH.
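This is a simplified Python model of the two life cycles. It assumes the usual two-stage plan, in which a partial-stage buffer only sees original rows and a final-stage buffer only sees other buffers. `TracingSum`, `two_stage`, and `sort_fallback` are hypothetical names and do not reproduce Spark's aggregation operators. The point is that in the fallback path, a single buffer receives both UPDATE and MERGE calls.

```python
class TracingSum:
    """Hypothetical sum buffer that records the life-cycle calls it receives."""

    def __init__(self) -> None:
        self.total = 0
        self.calls = ["INIT"]

    def update(self, row: int) -> None:
        # UPDATE: consume one original input row.
        self.calls.append("UPDATE")
        self.total += row

    def merge(self, other: "TracingSum") -> None:
        # MERGE: consume another aggregation buffer.
        self.calls.append("MERGE")
        self.total += other.total

    def finish(self) -> int:
        self.calls.append("FINISH")
        return self.total


def two_stage(rows):
    """Usual plan: the partial buffer only sees UPDATE, the final buffer only MERGE."""
    partial = TracingSum()
    for r in rows:
        partial.update(r)
    partial.finish()  # FINISH: emit the partial result
    final = TracingSum()
    final.merge(partial)
    return final.finish(), partial.calls, final.calls


def sort_fallback(new_rows, spilled):
    """Simplified fallback: one buffer sees raw rows, then other buffers."""
    buf = TracingSum()
    for r in new_rows:
        buf.update(r)
    for s in spilled:
        buf.merge(s)
    return buf.finish(), buf.calls


print(two_stage([1, 2]))
# (3, ['INIT', 'UPDATE', 'UPDATE', 'FINISH'], ['INIT', 'MERGE', 'FINISH'])
other = TracingSum()
other.update(4)
print(sort_fallback([1, 2], [other]))
# (7, ['INIT', 'UPDATE', 'UPDATE', 'MERGE', 'FINISH'])
```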
   
   However, not all Hive UDAFs support this life cycle. A Hive UDAF knows the aggregation mode when it creates the aggregation buffer, so it can create different buffers for different inputs: original data or aggregation buffers. See an example in the [sketches library](https://github.com/DataSketches/sketches-hive/blob/7f9e76e9e03807277146291beb2c7bec40e8672b/src/main/java/com/yahoo/sketches/hive/cpc/DataToSketchUDAF.java#L107). A buffer created for UPDATE may not support MERGE.
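The following is a toy illustration of a mode-dependent buffer. It is written in Python rather than against Hive's Java API. The mode names mirror Hive's `GenericUDAFEvaluator.Mode` values: PARTIAL1 and COMPLETE read original rows, while PARTIAL2 and FINAL read partial results. `RawBuffer`, `MergeBuffer`, and `new_buffer` are hypothetical, and the "average" UDAF is an assumption chosen to keep the example small. The buffer created for raw input has no way to absorb a partial result.

```python
from dataclasses import dataclass, field


@dataclass
class RawBuffer:
    """Buffer for modes that read original rows (Hive's PARTIAL1/COMPLETE)."""
    values: list = field(default_factory=list)

    def update(self, row: float) -> None:
        # Keeps raw values; there is deliberately no merge() method.
        self.values.append(row)

    def partial(self) -> tuple:
        # Partial result in the (sum, count) format MERGE-mode buffers accept.
        return (sum(self.values), len(self.values))


@dataclass
class MergeBuffer:
    """Buffer for modes that read partial results (Hive's PARTIAL2/FINAL)."""
    total: float = 0.0
    count: int = 0

    def merge(self, partial: tuple) -> None:
        s, c = partial
        self.total += s
        self.count += c

    def partial(self) -> tuple:
        return (self.total, self.count)


def new_buffer(mode: str):
    # The evaluator knows its mode when creating the buffer, so it picks the type here.
    if mode in ("PARTIAL1", "COMPLETE"):
        return RawBuffer()
    if mode in ("PARTIAL2", "FINAL"):
        return MergeBuffer()
    raise ValueError(f"unknown mode: {mode}")


buf = new_buffer("PARTIAL1")
for v in (1.0, 2.0, 3.0):
    buf.update(v)
print(buf.partial())          # (6.0, 3)
print(hasattr(buf, "merge"))  # False: this UPDATE buffer cannot accept MERGE
```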
   
   This PR updates the Hive UDAF adapter in Spark to support INIT -> UPDATE -> MERGE -> FINISH by turning it into INIT -> UPDATE -> FINISH + INIT -> MERGE -> FINISH: the partial result produced from the UPDATE buffer is merged into a new buffer created for MERGE.
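Below is a minimal Python sketch of this splitting idea, using a toy "average" UDAF whose UPDATE buffer cannot MERGE. It is not Spark's actual `HiveUDAFFunction` code. `AdapterBuffer`, `can_do_merge`, `update`, `merge`, and `finish` are hypothetical names. The sketch also assumes that UPDATE never follows MERGE on the same buffer. When MERGE reaches a buffer that was created for UPDATE, the adapter first takes that buffer's partial result (the INIT -> UPDATE -> FINISH half). It then seeds a fresh MERGE-mode buffer with that result (the INIT -> MERGE -> FINISH half).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawBuffer:
    """Toy UPDATE-mode buffer: accepts raw rows only."""
    values: list = field(default_factory=list)

    def update(self, row: float) -> None:
        self.values.append(row)

    def partial(self) -> tuple:
        return (sum(self.values), len(self.values))


@dataclass
class MergeBuffer:
    """Toy MERGE-mode buffer: accepts (sum, count) partial results only."""
    total: float = 0.0
    count: int = 0

    def merge(self, partial: tuple) -> None:
        self.total += partial[0]
        self.count += partial[1]

    def partial(self) -> tuple:
        return (self.total, self.count)


@dataclass
class AdapterBuffer:
    """Hypothetical adapter wrapper remembering whether `buf` can accept MERGE."""
    buf: object
    can_do_merge: bool


def update(b: Optional[AdapterBuffer], row: float) -> AdapterBuffer:
    # INIT is lazy: the first UPDATE creates an UPDATE-mode buffer.
    if b is None:
        b = AdapterBuffer(RawBuffer(), can_do_merge=False)
    if b.can_do_merge:
        raise RuntimeError("this sketch does not support UPDATE after MERGE")
    b.buf.update(row)
    return b


def merge(b: Optional[AdapterBuffer], other: AdapterBuffer) -> AdapterBuffer:
    if b is None:
        # Plain INIT -> MERGE: create a MERGE-mode buffer directly.
        b = AdapterBuffer(MergeBuffer(), can_do_merge=True)
    elif not b.can_do_merge:
        # INIT -> UPDATE -> FINISH: take the UPDATE buffer's partial result,
        # then INIT -> MERGE: seed a fresh MERGE-mode buffer with it.
        fresh = MergeBuffer()
        fresh.merge(b.buf.partial())
        b = AdapterBuffer(fresh, can_do_merge=True)
    b.buf.merge(other.buf.partial())
    return b


def finish(b: AdapterBuffer) -> float:
    total, count = b.buf.partial()
    return total / count if count else float("nan")


# One group under sort-based fallback: INIT -> UPDATE -> MERGE -> FINISH.
b = None
for v in (1.0, 2.0):
    b = update(b, v)
spilled = update(None, 6.0)   # a buffer produced by another partial aggregation
b = merge(b, spilled)         # the UPDATE buffer is converted before merging
print(finish(b))              # 3.0
```

Tracking convertibility with a flag on the wrapper keeps the conversion lazy. The extra FINISH/INIT step only happens for buffers that actually receive both UPDATE and MERGE.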
   
   ## How was this patch tested?
   
   A new test case.

