Github user Ngone51 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21486#discussion_r194606075
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -197,14 +197,14 @@ private[spark] class HeartbeatReceiver(sc:
SparkContext, clock: Clock)
if (now - lastSeenMs > executorTimeoutMs) {
logWarning(s"Removing executor $executorId with no recent
heartbeats: " +
s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
- scheduler.executorLost(executorId, SlaveLost("Executor heartbeat "
+
- s"timed out after ${now - lastSeenMs} ms"))
// Asynchronously kill the executor to avoid blocking the
current thread
killExecutorThread.submit(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
// Note: we want to get an executor back after expiring this
one,
// so do not simply call `sc.killExecutor` here (SPARK-8119)
sc.killAndReplaceExecutor(executorId)
--- End diff --
To be more specific, `killAndReplaceExecutor#killExecutors` will block
until we get response from cluster manager or overtime after 120s (by default
RPC timeout config).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]