EnricoMi commented on issue #27377: [SPARK-30666][Core] Reliable single-stage accumulators
URL: https://github.com/apache/spark/pull/27377#issuecomment-600170844

I have had a quick chat with @holdenk and we found two use cases where this approach will not work:

1. Partially computed partitions will lock the partial value of the aggregator; a subsequent complete computation will not update that partition's aggregator value.
2. Building a Dataset on top of another one that contains an accumulator may produce two query plans where the aggregator is computed with differing partitioning.

I will look into these, so changing this to WIP. More feedback welcome.
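To make the first use case concrete, here is a toy model in plain Python. It is not Spark code and not the implementation in this PR: the class `FirstWinsAccumulator`, the helper `run_task`, and the rule that the first value reported for a partition is kept are all assumptions made for illustration, inferred from the word "lock" in the comment.

```python
# Toy model only: not Spark code and not the PR's implementation.
# Assumption: the accumulator keeps one value per partition, and the
# first value reported for a partition is locked in.


class FirstWinsAccumulator:
    """Hypothetical per-partition accumulator with first-wins semantics."""

    def __init__(self):
        self.per_partition = {}

    def report(self, partition_id, value):
        # Later reports for an already-seen partition are ignored.
        self.per_partition.setdefault(partition_id, value)

    @property
    def value(self):
        return sum(self.per_partition.values())


def run_task(rows, limit=None):
    """Return how many rows a task consumed, optionally stopping early."""
    consumed = rows if limit is None else rows[:limit]
    return len(consumed)


partitions = {0: [1, 2, 3, 4], 1: [5, 6, 7]}
acc = FirstWinsAccumulator()

# An action that stops early reads only one row of partition 0.
acc.report(0, run_task(partitions[0], limit=1))

# A later complete pass over every partition.
for pid, rows in partitions.items():
    acc.report(pid, run_task(rows))

# Partition 0 stays locked at its partial count, so the total is
# lower than the 7 rows that were actually processed in the full pass.
total = acc.value
```

In this sketch the partial count for partition 0 survives the complete pass, which is the kind of undercount the first item describes. Whether the real code behaves exactly this way depends on details of the PR that are not stated in this comment.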
