pgandhi999 edited a comment on issue #23778: [SPARK-24935][SQL] : Problem with 
Executing Hive UDF's from Spark 2.2 Onwards
URL: https://github.com/apache/spark/pull/23778#issuecomment-466176136
 
 
   Sure @cloud-fan. Thank you for your response. As far as my understanding of 
Hive UDAFs is concerned, I can roughly classify them into two types: those that 
support partial aggregation (Mode PARTIAL and FINAL) and those that do not (Mode 
COMPLETE). For the Hive UDAFs that support partial aggregation, there are five 
phases:
   - **Initialize:** The aggregation buffers for PARTIAL1 Mode and PARTIAL2 
Mode are created in this phase.
   - **Iterate (Update):** This phase processes a new row of data into the 
aggregation buffer created for PARTIAL1.
   - **TerminatePartial:** Returns the contents of the aggregation buffer.
   - **Merge:** Merges a partial aggregation, returned by calling 
terminatePartial() on the PARTIAL1 aggregation buffer, into the aggregation 
in progress on the PARTIAL2 aggregation buffer.
   - **Terminate:** Returns the final result of the aggregation stored in the 
PARTIAL2 buffer to Hive.  
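
To make the five phases concrete, here is a minimal sketch of how I picture them fitting together for a simple sum. It is a toy model only: `SumSketch`, `Buffer` and `aggregate` are hypothetical names made up for illustration, and it deliberately does not use Hive's real evaluator classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a UDAF that supports partial aggregation.
// All names here are hypothetical; this is not Hive's actual API.
class SumSketch {
    // Hypothetical aggregation buffer holding a running sum.
    static final class Buffer {
        long sum;
    }

    // Initialize: create an empty aggregation buffer.
    static Buffer init() { return new Buffer(); }

    // Iterate (Update): fold one input row into a PARTIAL1 buffer.
    static void iterate(Buffer buf, long row) { buf.sum += row; }

    // TerminatePartial: return the contents of the PARTIAL1 buffer.
    static long terminatePartial(Buffer buf) { return buf.sum; }

    // Merge: combine one partial result into the PARTIAL2 buffer.
    static void merge(Buffer buf, long partial) { buf.sum += partial; }

    // Terminate: return the final result held in the PARTIAL2 buffer.
    static long terminate(Buffer buf) { return buf.sum; }

    // Runs the five phases over rows that are already split into partitions.
    static long aggregate(long[][] partitions) {
        List<Long> partials = new ArrayList<>();
        for (long[] partition : partitions) {
            Buffer partial1 = init();
            for (long row : partition) {
                iterate(partial1, row);
            }
            partials.add(terminatePartial(partial1));
        }
        Buffer partial2 = init();
        for (long partial : partials) {
            merge(partial2, partial);
        }
        return terminate(partial2);
    }

    public static void main(String[] args) {
        long[][] partitions = { {1, 2, 3}, {4, 5} };
        System.out.println(aggregate(partitions)); // prints 15
    }
}
```

Each partition gets its own PARTIAL1 buffer, and only the partial results cross over to the single PARTIAL2 buffer, which is the separation the phases above describe.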
   
   For the Hive UDAFs that do not support partial aggregation, I have seen the 
following three phases:
   - **Initialize:** Initialize the aggregation buffer.
   - **Iterate (Update):** Process the rows into the buffer.
   - **Terminate:** Return the final result.
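
For comparison, a similarly hedged sketch of the three-phase case, again as a toy sum with hypothetical names (`CompleteSumSketch`, `Buffer`, `aggregate`) rather than Hive's real classes. There is a single buffer and no partial result is ever handed off.

```java
// Toy stand-in for a UDAF that runs only in COMPLETE mode.
// All names here are hypothetical; this is not Hive's actual API.
class CompleteSumSketch {
    // Hypothetical aggregation buffer holding a running sum.
    static final class Buffer {
        long sum;
    }

    // Initialize: create the single aggregation buffer.
    static Buffer init() { return new Buffer(); }

    // Iterate (Update): process one row into the buffer.
    static void iterate(Buffer buf, long row) { buf.sum += row; }

    // Terminate: return the final result.
    static long terminate(Buffer buf) { return buf.sum; }

    // Runs the three phases over all rows with one buffer.
    static long aggregate(long[] rows) {
        Buffer buf = init();
        for (long row : rows) {
            iterate(buf, row);
        }
        return terminate(buf);
    }

    public static void main(String[] args) {
        System.out.println(aggregate(new long[] {1, 2, 3, 4, 5})); // prints 15
    }
}
```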
   
   For more information, you may find this link helpful: 
https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy
   
   This information is based on what I have found out during my tests and 
from reading through the docs, and it is on this basis that I have modeled the 
behaviour of the class `HiveTypedImperativeAggregate`. I am by no means an 
expert on Hive, so if you feel that my summary of Hive UDAFs is incorrect or 
is missing something, please let me know. 
