[GitHub] [spark] pgandhi999 opened a new pull request #24149: [SPARK-27207] : Ensure aggregate buffers are initialized again for So…

GitBox Tue, 19 Mar 2019 14:34:14 -0700

pgandhi999 opened a new pull request #24149: [SPARK-27207] : Ensure aggregate 
buffers are initialized again for So…
URL: https://github.com/apache/spark/pull/24149
 
 
   …rtBasedAggregate
   
   Normally, the operations invoked on an aggregation buffer for a User 
Defined Aggregate Function (UDAF) follow the order initialize(), update(), 
eval() or initialize(), merge(), eval(). However, once the threshold 
configured by spark.sql.objectHashAggregate.sortBased.fallbackThreshold is 
reached, ObjectHashAggregate falls back to SortBasedAggregator, which invokes 
merge or update without first calling initialize() on the aggregate buffer.
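
To illustrate why the missing call matters, here is a minimal Python sketch of the buffer lifecycle. It is a toy model rather than Spark's UDAF API: `ToySum`, `aggregate`, and the `call_initialize` flag are hypothetical names introduced only for this example.

```python
class ToySum:
    """Toy aggregate function (hypothetical, not Spark's API)."""

    def initialize(self, buffer):
        # Put the buffer into a known starting state.
        buffer["sum"] = 0

    def update(self, buffer, value):
        # Assumes initialize() has already created the "sum" slot.
        buffer["sum"] += value

    def merge(self, buffer, other):
        buffer["sum"] += other["sum"]

    def eval(self, buffer):
        return buffer["sum"]


def aggregate(func, values, call_initialize=True):
    """Run initialize(), update()..., eval() over one group of values.

    Passing call_initialize=False mimics a code path that skips
    initialize() before the first update().
    """
    buffer = {}
    if call_initialize:
        func.initialize(buffer)
    for value in values:
        func.update(buffer, value)
    return func.eval(buffer)
```

With `call_initialize=True` the group `[1, 2, 3]` aggregates to 6. With `call_initialize=False` the first `update()` raises `KeyError`, because the buffer was never put into its starting state.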
   
   ## What changes were proposed in this pull request?
   
   The fix is to initialize the aggregate buffers again when the fallback to 
the SortBasedAggregate operator happens.
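
The idea can be sketched roughly as follows. This is a simplified Python model, not the code changed in this pull request: `ToySum`, `aggregate_with_fallback`, and `fallback_threshold` are hypothetical names, and the way the hash-based and sort-based paths are combined here is an assumption made for illustration.

```python
from itertools import groupby


class ToySum:
    """Toy aggregate function (hypothetical, not Spark's API)."""

    def initialize(self, buffer):
        buffer["sum"] = 0

    def update(self, buffer, value):
        buffer["sum"] += value

    def merge(self, buffer, other):
        buffer["sum"] += other["sum"]

    def eval(self, buffer):
        return buffer["sum"]


def aggregate_with_fallback(func, rows, fallback_threshold):
    """Aggregate (key, value) rows, switching to a sort-based path
    once the hash map would hold more than fallback_threshold keys."""
    rows = list(rows)
    hash_buffers = {}
    remaining = []

    # Hash-based path: one initialized buffer per key.
    for i, (key, value) in enumerate(rows):
        if key not in hash_buffers:
            if len(hash_buffers) >= fallback_threshold:
                remaining = rows[i:]  # hand the rest to the sort-based path
                break
            buf = {}
            func.initialize(buf)
            hash_buffers[key] = buf
        func.update(hash_buffers[key], value)

    # Sort-based path: each group gets a fresh buffer that is
    # initialized again before any update() or merge() touches it.
    results = {}
    by_key = lambda row: row[0]
    for key, group in groupby(sorted(remaining, key=by_key), key=by_key):
        buf = {}
        func.initialize(buf)
        for _, value in group:
            func.update(buf, value)
        if key in hash_buffers:
            func.merge(buf, hash_buffers.pop(key))
        results[key] = func.eval(buf)

    # Keys that never reached the sort-based path.
    for key, buf in hash_buffers.items():
        results[key] = func.eval(buf)
    return results
```

In this model the result is the same whether or not the fallback triggers, which is the property the re-initialization is meant to preserve.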
   
   ## How was this patch tested?
   
   The patch was tested as part of 
[SPARK-24935](https://issues.apache.org/jira/browse/SPARK-24935) as documented 
in PR https://github.com/apache/spark/pull/23778.


