eejbyfeldt commented on PR #37206: URL: https://github.com/apache/spark/pull/37206#issuecomment-1496124996
> @eejbyfeldt shuffle write processor is an instance variable, and so is a part of the dependency created at driver

While that statement is true, it is not relevant to the point I was trying to make. @JoshRosen commented in https://github.com/apache/spark/pull/37206#issuecomment-1189930626 that we should not need to worry about concurrent access: accumulators are deserialized, and therefore registered, during `Task` deserialization, so there should be no race with the heartbeat thread. My claim is that this is not true, because accumulators can also be deserialized as part of the `taskBinary`, which is only deserialized later, during task execution. This can happen either because client code uses an accumulator (I created a spec for this case in https://github.com/apache/spark/pull/40663, which is also linked in the Jira) or because Spark serializes accumulators as part of the `ShuffleDependency`.
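To make the race concrete, here is a minimal, hypothetical Java sketch. It is not Spark code, and the class and method names are invented for illustration. It deterministically simulates the interleaving described above. A "heartbeat" iterator is opened over a registry of accumulators. The "task thread" then registers one more, as would happen when `taskBinary` is deserialized mid-execution. Finally, the heartbeat resumes. A plain `ArrayList` fails fast with `ConcurrentModificationException`, while a `CopyOnWriteArrayList` iterates over a snapshot. Using a copy-on-write list is only one possible mitigation, assumed here for illustration. It is not necessarily the fix proposed in this PR.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for per-task accumulator registration; NOT Spark's API.
class AccumulatorRegistryDemo {

    // Simulates, on a single thread, the interleaving between the heartbeat
    // thread (reading registered accumulators) and the task thread
    // (registering an accumulator late, while deserializing taskBinary).
    // Returns true if the heartbeat's iteration blew up.
    static boolean interleavingFails(List<String> registered) {
        // Registered during Task deserialization (the "safe" case).
        registered.add("acc-from-task-deserialization");

        // Heartbeat thread starts reading the registered accumulators.
        Iterator<String> heartbeatView = registered.iterator();
        heartbeatView.next();

        // Task thread registers another accumulator mid-execution.
        registered.add("acc-from-taskBinary");

        // Heartbeat thread resumes its iteration.
        try {
            while (heartbeatView.hasNext()) {
                heartbeatView.next();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayList fails: "
            + interleavingFails(new ArrayList<>()));
        System.out.println("CopyOnWriteArrayList fails: "
            + interleavingFails(new CopyOnWriteArrayList<>()));
    }
}
```

Running `main` should print `ArrayList fails: true` followed by `CopyOnWriteArrayList fails: false`. In a real multi-threaded race the failure is nondeterministic, and `ConcurrentModificationException` is only thrown on a best-effort basis, so an unsynchronized `ArrayList` may also be corrupted silently rather than failing loudly.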
