tgravescs commented on a change in pull request #35085:
URL: https://github.com/apache/spark/pull/35085#discussion_r830149464



##########
File path: core/src/main/scala/org/apache/spark/ContextCleaner.scala
##########
@@ -235,8 +235,10 @@ private[spark] class ContextCleaner(
     try {
       if (mapOutputTrackerMaster.containsShuffle(shuffleId)) {
         logDebug("Cleaning shuffle " + shuffleId)
-        mapOutputTrackerMaster.unregisterShuffle(shuffleId)
+        // Shuffle must be removed before it's unregistered from the output tracker
+        // to find blocks served by the shuffle service on deallocated executors
         shuffleDriverComponents.removeShuffle(shuffleId, blocking)
+        mapOutputTrackerMaster.unregisterShuffle(shuffleId)

Review comment:
       So I agree this would be better the other way around from a logical viewpoint: it makes more sense to unregister before you remove, in case something tried to use those shuffle locations mid-removal, when some blocks were still there and some were not. But in this context the shuffle normally only gets cleaned once nothing references it anymore, so nothing should be trying to use it and the order shouldn't matter. There is also RDD.cleanShuffleDependencies to consider, though. The real problem here, I think, is that removeShuffle needs the map output tracker to figure out which blocks to remove. I haven't looked in detail, but it might be better if unregisterShuffle could return something that could then be passed to removeShuffle, for instance.
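       A rough sketch of the shape I'm imagining (purely illustrative: today unregisterShuffle returns Unit and removeShuffle takes only the shuffle id and the blocking flag, so both signatures below are hypothetical):

       import org.apache.spark.storage.BlockManagerId

       // Hypothetical shape only -- not the current signatures. The idea: have
       // the tracker hand back the locations it knew about before dropping them...
       val locations: Seq[BlockManagerId] =
         mapOutputTrackerMaster.unregisterShuffle(shuffleId)
       // ...so removeShuffle no longer needs to consult the tracker afterwards.
       shuffleDriverComponents.removeShuffle(shuffleId, blocking, locations)

       That way the ordering question goes away entirely, since removeShuffle would no longer depend on the tracker's state.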




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


