[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

liyinan926 Tue, 08 May 2018 13:49:13 -0700

Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21241#discussion_r186862624
  
    --- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
    @@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
         override def eventReceived(action: Action, pod: Pod): Unit = {
           val podName = pod.getMetadata.getName
           val podIP = pod.getStatus.getPodIP
    -
    +      val podPhase = pod.getStatus.getPhase
           action match {
    -        case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
    +        case Action.MODIFIED if (podPhase == "Running"
                 && pod.getMetadata.getDeletionTimestamp == null) =>
               val clusterNodeName = pod.getSpec.getNodeName
               logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
               executorPodsByIPs.put(podIP, pod)
     
    -        case Action.DELETED | Action.ERROR =>
    +        case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
    --- End diff --
    
    I took a look at the client code, and it appears to me that `getPhase` 
simply returns the value of json property `phase` of `PodStatus`. Have you seen 
`Init:Error` as the return value in practice?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

Reply via email to