xuanyuanking commented on a change in pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/24350#discussion_r276724612
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -205,6 +207,13 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
+            // lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348)
+            sc.schedulerBackend match {
+              case backend: CoarseGrainedSchedulerBackend =>
+                backend.driverEndpoint.send(RemoveExecutor(executorId, ExecutorKilled))
 
 Review comment:
   Yes, thanks for the reminder. `sc.killExecutor` still has the same problem
   (it rarely happens but is still possible, e.g., when the executor crashes
   at the same time as we call `sc.killExecutor`).
   But maybe we should fix this in a separate PR and JIRA, or I could change
   the SPARK-27348 description and fix them together. WDYT?

