agrawaldevesh commented on a change in pull request #29422:
URL: https://github.com/apache/spark/pull/29422#discussion_r470889426
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -136,7 +137,9 @@ private[spark] class TaskSchedulerImpl(
// IDs of the tasks running on each executor
private val executorIdToRunningTaskIds = new HashMap[String, HashSet[Long]]
- private val executorsPendingDecommission = new HashMap[String, ExecutorDecommissionInfo]
+ val executorsPendingDecommission = new HashMap[String, ExecutorDecommissionInfo]
+ // map of second to list of executors to clear form the above map
+ val decommissioningExecutorsToGc = new util.TreeMap[Long, mutable.ArrayBuffer[String]]()
Review comment:
Sure. Any structure that lets me GC by time will do. I just wanted
something lightweight and custom to this use case.
I expect the TreeMap to contain no more than 60 seconds' worth of entries,
since they are keyed by the second and are also cleaned up on every
check. The check happens on every executor loss and fetch failure. But yeah, it
is possible that if there are no failures, the entries could just sit there
:-P.
I will change it to a Cache. Good idea.
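For reference, a minimal sketch of the time-keyed GC idea described above. This is not the code in the PR: `DecommissionTracker`, `markDecommissioned`, `gc`, `isPending`, and the `ttlSeconds` parameter are hypothetical names, and the caller is assumed to pass in the current time in seconds.

```scala
import java.util.{TreeMap => JTreeMap}
import scala.collection.mutable

// Hypothetical helper, not the PR's code: remembers decommissioning
// executors and forgets them once a time-to-live has elapsed.
final class DecommissionTracker(ttlSeconds: Long) {
  // executor ID -> second at which it was marked as decommissioning
  private val pending = new mutable.HashMap[String, Long]
  // expiry second -> executors to drop from `pending` at that second
  private val toGc = new JTreeMap[Long, mutable.ArrayBuffer[String]]()

  def markDecommissioned(executorId: String, nowSeconds: Long): Unit = {
    if (!pending.contains(executorId)) {
      pending(executorId) = nowSeconds
      val expiry = nowSeconds + ttlSeconds
      var bucket = toGc.get(expiry)
      if (bucket == null) {
        bucket = new mutable.ArrayBuffer[String]
        toGc.put(expiry, bucket)
      }
      bucket += executorId
    }
  }

  // Drops every bucket whose expiry second is <= nowSeconds.
  // headMap returns a live view, so clearing it also shrinks toGc.
  def gc(nowSeconds: Long): Unit = {
    val expired = toGc.headMap(nowSeconds, true)
    val it = expired.values().iterator()
    while (it.hasNext) {
      it.next().foreach(id => pending.remove(id))
    }
    expired.clear()
  }

  // Cleanup only runs when someone checks, which is the weakness noted
  // above: with no checks, stale entries simply stay in both maps.
  def isPending(executorId: String, nowSeconds: Long): Boolean = {
    gc(nowSeconds)
    pending.contains(executorId)
  }
}
```

With `ttlSeconds = 60`, an executor marked at second 100 is still reported as pending at second 159 and is dropped by the check at second 160. Nothing is evicted between checks, which is why a cache with built-in time-based expiry is the more robust choice.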