tgravescs commented on a change in pull request #30103:
URL: https://github.com/apache/spark/pull/30103#discussion_r523023927



##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##########
@@ -586,6 +586,7 @@ private[spark] class ExecutorMonitor(
       val hadActiveShuffle = hasActiveShuffle
       hasActiveShuffle = ids.exists(shuffleIds.contains)
       if (hadActiveShuffle && isIdle) {
+        shuffleIds.clear()

Review comment:
       I don't follow this change. Here isIdle means we don't have any running 
tasks and hasActiveShuffle is false. The ids passed in are supposed to be all 
the active ones, as determined by shuffleToActiveJobs; that does not mean 
shuffleIds is empty. shuffleIds stick around until we get the 
ShuffleCleanedEvent, which comes from the context cleaner when the RDD isn't 
referenced anymore. If it's still referenced, then even though the job may be 
done, someone could still use it, so we don't remove it until after the 
timeout. Here, if we are idle and had an active shuffle, we update the 
timeout, which should set it to either the cache or storage timeout; then when 
the ExecutorMonitor comes around to check, it should see that it's set and 
time it out regardless of whether it has shuffle ids.
   
   If there is a bug in there somewhere, then we need to figure out where it 
is. I do not think this is the correct fix.
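   
   To make the objection concrete, here is a minimal, self-contained sketch of 
the behavior described above. This is not Spark's actual ExecutorMonitor code; 
the class and method names are illustrative, and the point is only that going 
idle arms a timeout while shuffleIds survive until a separate cleanup event:
   
   ```scala
   import scala.collection.mutable
   
   // Hedged sketch (hypothetical names, not Spark's actual API): shuffleIds
   // persist until a shuffle-cleaned event; becoming idle only arms a timeout.
   class ExecutorStateSketch {
     val shuffleIds = mutable.Set.empty[Int]  // cleared only by shuffleCleaned
     var runningTasks = 0
     var hasActiveShuffle = false
     var timeoutAt: Long = Long.MaxValue
   
     // Idle in the sense used above: no running tasks and no active shuffle.
     def isIdle: Boolean = runningTasks == 0 && !hasActiveShuffle
   
     // Called when jobs finish; activeIds plays the role of the ids derived
     // from shuffleToActiveJobs in the real code.
     def updateActiveShuffles(activeIds: Set[Int], now: Long,
         idleTimeoutMs: Long): Unit = {
       val hadActiveShuffle = hasActiveShuffle
       hasActiveShuffle = activeIds.exists(shuffleIds.contains)
       if (hadActiveShuffle && isIdle) {
         // Arm the timeout; do NOT clear shuffleIds -- a still-referenced
         // shuffle could be reused before the timeout fires.
         timeoutAt = now + idleTimeoutMs
       }
     }
   
     // Called when the context cleaner reports the shuffle is truly gone.
     def shuffleCleaned(id: Int): Unit = shuffleIds -= id
   }
   ```
   
   In this model the monitor's periodic check can time the executor out once 
timeoutAt is set, even while shuffleIds is non-empty, which is why clearing 
shuffleIds at that point should not be necessary.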




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


