tgravescs commented on a change in pull request #35085:
URL: https://github.com/apache/spark/pull/35085#discussion_r830149464
##########
File path: core/src/main/scala/org/apache/spark/ContextCleaner.scala
##########
@@ -235,8 +235,10 @@ private[spark] class ContextCleaner(
     try {
       if (mapOutputTrackerMaster.containsShuffle(shuffleId)) {
         logDebug("Cleaning shuffle " + shuffleId)
-        mapOutputTrackerMaster.unregisterShuffle(shuffleId)
+        // Shuffle must be removed before it's unregistered from the output tracker
+        // to find blocks served by the shuffle service on deallocated executors
         shuffleDriverComponents.removeShuffle(shuffleId, blocking)
+        mapOutputTrackerMaster.unregisterShuffle(shuffleId)
Review comment:
So I agree that the other order would be better from a logical standpoint: it makes more sense to unregister before you remove, in case something tried to use those shuffle locations mid-removal, when some blocks were still there and some were not. But in this context a shuffle is normally only cleaned once nothing references it anymore, so nothing could try to use it and the order shouldn't matter. There is also RDD.cleanShuffleDependencies to consider, though. The real problem, I think, is that this change needs the map output tracker to figure out what removeShuffle should remove. I haven't looked in detail, but it might be better if unregisterShuffle could return something that could then be passed to removeShuffle.
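
To sketch the shape I mean (purely hypothetical names and simplified signatures, not the real Spark classes; `MapOutputTrackerMasterSketch`, `ShuffleDriverComponentsSketch`, and the cut-down `BlockManagerId` below are all made up for illustration):

```scala
import scala.collection.mutable

// Simplified stand-in for Spark's BlockManagerId, just for this sketch.
final case class BlockManagerId(executorId: String, host: String, port: Int)

class MapOutputTrackerMasterSketch {
  private val shuffleLocations = mutable.Map[Int, Seq[BlockManagerId]]()

  def registerShuffle(shuffleId: Int, locations: Seq[BlockManagerId]): Unit =
    shuffleLocations(shuffleId) = locations

  /** Drop the shuffle's state and hand back the locations it was tracking. */
  def unregisterShuffle(shuffleId: Int): Seq[BlockManagerId] =
    shuffleLocations.remove(shuffleId).getOrElse(Seq.empty)
}

class ShuffleDriverComponentsSketch {
  /** Delete shuffle files using locations supplied by the caller, so this
   *  method never has to read from the (already cleared) tracker itself. */
  def removeShuffle(shuffleId: Int, locations: Seq[BlockManagerId],
      blocking: Boolean): Unit =
    locations.foreach { loc =>
      println(s"Removing shuffle $shuffleId blocks from $loc (blocking=$blocking)")
    }
}

object CleanerOrderingSketch {
  def main(args: Array[String]): Unit = {
    val tracker = new MapOutputTrackerMasterSketch
    val components = new ShuffleDriverComponentsSketch
    tracker.registerShuffle(0, Seq(BlockManagerId("exec-1", "host-a", 7337)))

    // Unregister first, then remove using the returned locations: the
    // ordering question goes away because removeShuffle no longer depends
    // on the tracker still holding the shuffle's state.
    val locations = tracker.unregisterShuffle(0)
    components.removeShuffle(0, locations, blocking = true)
  }
}
```

With something like that, the logically preferable order (unregister, then remove) would work without removeShuffle needing to consult the tracker.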
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]