tgravescs commented on a change in pull request #30103:
URL: https://github.com/apache/spark/pull/30103#discussion_r523023927
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##########
@@ -586,6 +586,7 @@ private[spark] class ExecutorMonitor(
val hadActiveShuffle = hasActiveShuffle
hasActiveShuffle = ids.exists(shuffleIds.contains)
if (hadActiveShuffle && isIdle) {
+ shuffleIds.clear()
Review comment:
I don't follow this change. Here isIdle means we don't have any running
tasks and hasActiveShuffle is false. The ids passed in are supposed to be all
the active ones, as determined by shuffleToActiveJobs; that does not mean
shuffleIds is empty. Entries in shuffleIds stick around until we get the
ShuffleCleanedEvent, which the context cleaner fires once the RDD is no longer
referenced. If it is still referenced, even though the job may be done, someone
could still use that shuffle data, so we don't remove it until after the
timeout. Here, if we are idle and had an active shuffle, we update the timeout,
which should set it to either the cache or the storage timeout; when the
ExecutorMonitor next comes around to check, it should see that the timeout is
set and time the executor out regardless of whether it still has shuffle ids.
If there is a bug in there somewhere then we need to figure out where it
is, but I do not think this is the correct fix.
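To make the lifecycle I'm describing concrete, here is a minimal sketch of the intended behavior. All names here (TrackerSketch, timeoutAt, storageTimeout, the simplified isIdle) are illustrative stand-ins, not the actual ExecutorMonitor fields or signatures; the point is only that shuffleIds is emptied by the cleaner path, while the idle transition just arms a timeout.

```scala
import scala.collection.mutable

// Hypothetical minimal model of the tracking described above; names are
// illustrative, not the real ExecutorMonitor implementation.
class TrackerSketch {
  val shuffleIds = mutable.Set[Int]()      // retained until the cleaner event
  var hasActiveShuffle = false
  var timeoutAt: Long = Long.MaxValue      // "no timeout armed" sentinel

  // Simplified: assume no running tasks, so idleness is just "no active shuffle".
  def isIdle: Boolean = !hasActiveShuffle

  // Mirrors the snippet under review: `ids` reflects only shuffles tied to
  // active jobs (shuffleToActiveJobs), not everything in shuffleIds.
  def updateActiveShuffles(ids: Iterable[Int], now: Long, storageTimeout: Long): Unit = {
    val hadActiveShuffle = hasActiveShuffle
    hasActiveShuffle = ids.exists(shuffleIds.contains)
    if (hadActiveShuffle && isIdle) {
      // Arm the timeout; the executor can be timed out later even though
      // shuffleIds may still be non-empty.
      timeoutAt = now + storageTimeout
    }
  }

  // Mirrors the ShuffleCleanedEvent path: only the context cleaner removes ids.
  def shuffleCleaned(id: Int): Unit = shuffleIds -= id
}
```

In this sketch, going idle never clears shuffleIds; it only arms timeoutAt, and the ids themselves survive until shuffleCleaned is called, which is why clearing them in the idle branch would change the semantics.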