GitHub user baluchicken commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21067#discussion_r194411939
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ---
    @@ -59,16 +59,18 @@ private[spark] class KubernetesClusterSchedulerBackend(
     
       private val kubernetesNamespace = conf.get(KUBERNETES_NAMESPACE)
     
    -  private val kubernetesDriverPodName = conf
    -    .get(KUBERNETES_DRIVER_POD_NAME)
    -    .getOrElse(throw new SparkException("Must specify the driver pod name"))
    +  private val kubernetesDriverJobName = conf
    +    .get(KUBERNETES_DRIVER_JOB_NAME)
    +    .getOrElse(throw new SparkException("Must specify the driver job name"))
       private implicit val requestExecutorContext = ExecutionContext.fromExecutorService(
         requestExecutorsService)
     
    -  private val driverPod = kubernetesClient.pods()
    -    .inNamespace(kubernetesNamespace)
    -    .withName(kubernetesDriverPodName)
    -    .get()
    +  private val driverPod: Pod = {
    +    val pods = kubernetesClient.pods()
    +      .inNamespace(kubernetesNamespace).withLabel("job-name", kubernetesDriverJobName).list()
    --- End diff --
    
    I don't think this can happen; I can think of two scenarios:
    - Job fails: nobody restarts the Job, so the user needs to run spark-submit again (all Job-related Pods will be deleted because of the OwnerReference; see the second sketch at the end of this comment).
    - Pod fails: the Job will create a new driver Pod to replace the failed one. There will only be one driver Pod, because the failed one is removed by the Kubernetes garbage collector (a rough sketch of how the label lookup can then resolve to that single Pod follows this list).
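
    As an illustration only (this is not the code in the PR, and findDriverPod is a made-up helper name), the label-based lookup could resolve to the one live driver Pod roughly like this, skipping Pods that are Failed or already being deleted:

        import scala.collection.JavaConverters._

        import io.fabric8.kubernetes.api.model.Pod
        import io.fabric8.kubernetes.client.KubernetesClient

        // Hypothetical helper, not the PR's code: resolve the single live driver
        // Pod created by the driver Job. The "job-name" label is set by the Job
        // controller on every Pod it creates.
        def findDriverPod(
            client: KubernetesClient,
            namespace: String,
            jobName: String): Option[Pod] = {
          client.pods()
            .inNamespace(namespace)
            .withLabel("job-name", jobName)
            .list()
            .getItems.asScala
            .find { pod =>
              pod.getMetadata.getDeletionTimestamp == null &&
                pod.getStatus.getPhase != "Failed"
            }
        }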
    
    Can you please elaborate on what you mean by "restart" here?
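
    To illustrate the OwnerReference point above (again only a sketch; ownerReferenceFor is a made-up helper and the exact Job model class depends on the fabric8 client version): a Pod whose metadata carries an ownerReference to the Job is cascade-deleted by the Kubernetes garbage collector once the Job itself is deleted.

        import io.fabric8.kubernetes.api.model.OwnerReferenceBuilder
        import io.fabric8.kubernetes.api.model.batch.Job

        // Hypothetical helper, not the PR's code: build the ownerReference that
        // ties a driver Pod (or any other resource) to its driver Job, so the
        // garbage collector removes it once the Job is gone.
        def ownerReferenceFor(job: Job) =
          new OwnerReferenceBuilder()
            .withApiVersion(job.getApiVersion) // typically "batch/v1"
            .withKind(job.getKind)             // "Job"
            .withName(job.getMetadata.getName)
            .withUid(job.getMetadata.getUid)
            .withController(true)
            .build()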

