eejbyfeldt commented on PR #37206:
URL: https://github.com/apache/spark/pull/37206#issuecomment-1496124996

   > @eejbyfeldt shuffle write processor is an instance variable, and so is a part of the dependency created at driver
   
   While that statement is true, it is not relevant to the point I was trying to make. @JoshRosen commented in https://github.com/apache/spark/pull/37206#issuecomment-1189930626 that there should be no concurrent access, because accumulators are deserialized, and therefore registered, during `Task` deserialization, so there should be no race with the heartbeat thread. My claim is that this is not true: accumulators can also be deserialized as part of the `taskBinary`, which is only deserialized later, during task execution. This can happen either because client code uses an accumulator (I created a spec reproducing this in https://github.com/apache/spark/pull/40663, also linked in the Jira ticket) or because Spark serializes accumulators as part of the `ShuffleDependency`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.