Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1056#issuecomment-50173304
As far as I can tell, you're right - I don't see why updateShuffleMetrics
needs to be synchronized.
Uploading a patch that (see the sketch after this list):
* Adds comments to TaskMetrics that explain which threads access it and
what not to do.
* Removes the race with hostname
* Sets shuffleFinishTime to -1
* Addresses your earlier style comments
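To make the intent concrete, here's a rough sketch of the shape of those
changes. It is not the actual code: the class name is made up, and only the
fields named in this thread (hostname, shuffleFinishTime, shuffleReadBytes,
shuffleFetchWaitTime) come from the discussion.

```scala
/**
 * Illustrative sketch only -- not the real TaskMetrics. It mirrors the shape
 * of the changes above: a class-level comment saying which threads touch the
 * metrics, and shuffleFinishTime defaulting to -1 until the shuffle finishes.
 *
 * Fields are written by the task's executor thread and read by the heartbeat
 * thread that ships metrics to the driver, so readers may see partially
 * updated values; don't add fields that need multi-field consistency.
 */
class TaskMetricsSketch extends Serializable {
  // Host the task ran on; set once before the task body starts, which avoids
  // the hostname race mentioned above.
  @volatile var hostname: String = ""

  // Completion time of the shuffle fetch; stays at -1 until the shuffle is done.
  @volatile var shuffleFinishTime: Long = -1L

  @volatile var shuffleReadBytes: Long = 0L
  @volatile var shuffleFetchWaitTime: Long = 0L
}
```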
I verified that I could click around the UI without exceptions. I also
looked in detail through the fields in TaskMetrics for other races we could
hit. One small issue is that the metrics might be shipped off after
shuffleFetchWaitTime is updated but before shuffleReadBytes is, so the
reported wait time would be slightly out of sync with the number of bytes it
corresponds to (see the sketch below). My opinion is that this is ok.
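To illustrate that race concretely, here's a hypothetical sketch (again, not
the real implementation; the object and method names are made up) of how the
heartbeat thread can observe the wait time ahead of the bytes:

```scala
// Sketch of the benign race: the heartbeat thread can snapshot the metrics
// between the two unsynchronized writes, so the reported wait time briefly
// runs ahead of the bytes it corresponds to.
object ShuffleRaceSketch {
  @volatile private var shuffleFetchWaitTime: Long = 0L
  @volatile private var shuffleReadBytes: Long = 0L

  // Called from the task's executor thread after each fetched block.
  def onBlockFetched(waitMs: Long, bytes: Long): Unit = {
    shuffleFetchWaitTime += waitMs
    // <-- the heartbeat thread may serialize metrics at this point: the wait
    //     time already includes this block, the byte count does not yet.
    shuffleReadBytes += bytes
  }

  // Called from the heartbeat thread when shipping metrics to the driver.
  def snapshot(): (Long, Long) = (shuffleFetchWaitTime, shuffleReadBytes)
}
```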
Let me know what you think. If you'd still rather go with a PartialTaskMetrics
approach, would you consider moving that to a follow-up patch? This patch
touches a bunch of stuff already, and I've had to upmerge with non-trivial
conflicts several times now.
Agreed that moving TaskMetrics and accumulators under a common framework
would be best.