[GitHub] [spark] EnricoMi commented on issue #27377: [SPARK-30666][Core] Reliable single-stage accumulators

GitBox Tue, 17 Mar 2020 09:35:02 -0700

EnricoMi commented on issue #27377: [SPARK-30666][Core] Reliable single-stage 
accumulators
URL: https://github.com/apache/spark/pull/27377#issuecomment-600170844
 
 
   I had a quick chat with @holdenk and we found two use cases where this 
approach will not work:
   
   1. A partially computed partition will lock in the partial value of the 
accumulator, and a subsequent complete computation will not update that 
partition's accumulator value.
   2. Building a Dataset on top of another one that contains an accumulator may 
produce two query plans in which the accumulator is computed with different 
partitionings.
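   A minimal, Spark-free Scala sketch of both failure modes. It assumes the 
accumulator keeps exactly one value per partition id and ignores any later 
update for an id it has already recorded. `PerPartitionSum` and 
`AccumulatorPitfalls` are hypothetical names, not Spark APIs, and the order in 
which the two plans run in case 2 is an illustrative assumption.

```scala
import scala.collection.mutable

// Hypothetical model (not a Spark API): an accumulator that stores one value
// per partition id and ignores any later update for an id it has already seen.
final class PerPartitionSum {
  private val perPartition = mutable.LinkedHashMap.empty[Int, Long]

  def update(partitionId: Int, partial: Long): Unit =
    if (!perPartition.contains(partitionId)) perPartition(partitionId) = partial

  def value: Long = perPartition.values.sum
}

object AccumulatorPitfalls {
  // Case 1: a partial pass over partition 0 (as take() might do) locks in its value.
  def partialComputation(): Long = {
    val acc = new PerPartitionSum
    val partition0 = Vector(1L, 2L, 3L, 4L)
    acc.update(0, partition0.take(2).sum) // partial run records 1 + 2 = 3
    acc.update(0, partition0.sum)         // complete run (10) is ignored
    acc.value                             // 3 instead of 10
  }

  // Case 2: two plans over the same rows use different partitionings.
  def differingPartitioning(): Long = {
    val acc = new PerPartitionSum
    val rows = (1L to 8L).toVector       // true total is 36
    val planA = rows.grouped(4).toVector // 2 partitions: sums 10, 26
    val planB = rows.grouped(2).toVector // 4 partitions: sums 3, 7, 11, 15
    for ((part, id) <- planA.zipWithIndex) acc.update(id, part.sum)
    for ((part, id) <- planB.zipWithIndex) acc.update(id, part.sum)
    acc.value                            // 10 + 26 + 11 + 15 = 62
  }

  def main(args: Array[String]): Unit = {
    println(partialComputation())    // 3
    println(differingPartitioning()) // 62
  }
}
```

   Under this model, case 1 reports 3 instead of 10, and case 2 reports 62 
instead of 36: the second plan's partitions 0 and 1 are dropped as already 
seen, while its partitions 2 and 3 are added on top of the first plan's 
complete total.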
   
   I will look into these, so I am changing this PR to WIP. More feedback is welcome.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

