Github user baluchicken commented on a diff in the pull request:
https://github.com/apache/spark/pull/21067#discussion_r194794691
--- Diff:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
---
@@ -67,12 +68,19 @@ private[spark] class BasicExecutorFeatureStep(
}
private val executorLimitCores =
kubernetesConf.get(KUBERNETES_EXECUTOR_LIMIT_CORES)
- override def configurePod(pod: SparkPod): SparkPod = {
- val name =
s"$executorPodNamePrefix-exec-${kubernetesConf.roleSpecificConf.executorId}"
+ // If the driver pod is killed, the new driver pod will try to
+ // create new executors with the same name, but it will fail
+ // and hang indefinitely because a terminating executor blocks
+ // the creation of the new ones; to avoid that, apply a salt
+ private val executorNameSalt =
Random.alphanumeric.take(4).mkString("").toLowerCase
--- End diff --
If we use applicationID as the salt, the executor pod name will exceed the
64-character length limit when the application name is long.
`override def configureExecutorPod(pod: SparkExecutorPod): SparkExecutorPod
= {
val name = s"$executorPodNamePrefix-$applicationID" +
s"-exec-${kubernetesConf.roleSpecificConf.executorId}"`
For example, if the application name is networkwordcount, this results in an
executor pod name like:
`networkwordcount-1519234651100-spark-application-1528815922371-exec-10`
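To make the length difference concrete, here is a rough, self-contained sketch that compares the two naming schemes. It is not the code from the PR: the object `ExecutorPodNames`, its helper names, and the position of the salt in the name are hypothetical, and `MaxPodNameLength` is simply the limit quoted above, not a value read from Kubernetes.

```scala
import scala.util.Random

// Hypothetical helper object for illustration only; not part of Spark.
object ExecutorPodNames {
  // Assumption: the limit cited in the comment above.
  val MaxPodNameLength = 64

  // Variant discussed above: the application ID is used as the salt.
  def nameWithAppId(prefix: String, applicationId: String, executorId: String): String =
    s"$prefix-$applicationId-exec-$executorId"

  // Variant with a short salt; where the salt goes in the name is an assumption.
  def nameWithSalt(prefix: String, salt: String, executorId: String): String =
    s"$prefix-$salt-exec-$executorId"

  // Same expression shape as in the diff: a short lowercase alphanumeric string.
  def randomSalt(length: Int = 4): String =
    Random.alphanumeric.take(length).mkString.toLowerCase

  def main(args: Array[String]): Unit = {
    val prefix = "networkwordcount-1519234651100"
    val appId = "spark-application-1528815922371"

    val long = nameWithAppId(prefix, appId, "10")
    val short = nameWithSalt(prefix, randomSalt(), "10")

    // Print each name with its length and whether it fits under the assumed limit.
    Seq(long, short).foreach { n =>
      println(s"$n (${n.length} chars, fits: ${n.length <= MaxPodNameLength})")
    }
  }
}
```

With the example values above, the application-ID variant comes out at 70 characters, while a 4-character salt adds only 5 characters (salt plus separator) to the prefix.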
---