cloud-fan commented on a change in pull request #29422:
URL: https://github.com/apache/spark/pull/29422#discussion_r471318413
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -136,7 +139,21 @@ private[spark] class TaskSchedulerImpl(
// IDs of the tasks running on each executor
private val executorIdToRunningTaskIds = new HashMap[String, HashSet[Long]]
-  private val executorsPendingDecommission = new HashMap[String, ExecutorDecommissionInfo]
+  // We add executors here when we first get decommission notification for them. Executors can
+  // continue to run even after being asked to decommission, but they will eventually exit.
+  val executorsPendingDecommission = new HashMap[String, ExecutorDecommissionInfo]
+
+  // When they exit and we know of that via heartbeat failure, we will add them to this cache.
+  // This cache is consulted to know if a fetch failure is because a source executor was
+  // decommissioned.
+  lazy val decommissionedExecutorsRemoved = CacheBuilder.newBuilder()
+    .expireAfterWrite(
+      conf.getLong("spark.decommissioningRememberAfterRemoval.seconds", 60L), TimeUnit.SECONDS)
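
For context, the cache above uses Guava's write-expiry pattern: an entry put into the cache stops being returned by lookups once the TTL elapses, which is how the scheduler can "remember" a removed executor for a bounded time. A minimal, self-contained sketch of that pattern (the object and key names here are illustrative, not from the PR):

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.CacheBuilder

object ExpiringCacheDemo extends App {
  // Entries become invisible to reads once the write-TTL elapses.
  val cache = CacheBuilder.newBuilder()
    .expireAfterWrite(60L, TimeUnit.SECONDS) // same 60s default as the diff
    .build[String, java.lang.Boolean]()

  cache.put("executor-1", java.lang.Boolean.TRUE)
  assert(cache.getIfPresent("executor-1") != null) // visible within the TTL
  // After 60 seconds, getIfPresent("executor-1") would return null.
}
```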
Review comment:
   BTW, this is core, so we can define the config in `org.apache.spark.internal.config`.
   To stay consistent with the other decommission-related configs, how about `spark.driver.decommission.infoCacheTTLInSeconds`?
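
   Config entries in `org.apache.spark.internal.config` are declared with `ConfigBuilder`; a sketch of what the suggested entry could look like follows. The `val` name, doc text, and version below are assumptions based on this suggestion, not necessarily what was merged:

```scala
// Hypothetical addition to org/apache/spark/internal/config/package.scala,
// using the key suggested above.
import java.util.concurrent.TimeUnit

private[spark] val DECOMMISSION_INFO_CACHE_TTL =
  ConfigBuilder("spark.driver.decommission.infoCacheTTLInSeconds")
    .doc("How long the driver remembers that a removed executor was " +
      "decommissioned, so that later fetch failures from that executor " +
      "can still be attributed to decommissioning.")
    .version("3.1.0") // assumed target version
    .timeConf(TimeUnit.SECONDS)
    .createWithDefaultString("60s")
```

   `TaskSchedulerImpl` could then replace the raw `conf.getLong(...)` call with `conf.get(DECOMMISSION_INFO_CACHE_TTL)`, which keeps the key, default, and documentation in one place.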