GitHub user liyinan926 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21241#discussion_r186862624
--- Diff:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
---
@@ -320,50 +322,83 @@ private[spark] class KubernetesClusterSchedulerBackend(
override def eventReceived(action: Action, pod: Pod): Unit = {
val podName = pod.getMetadata.getName
val podIP = pod.getStatus.getPodIP
-
+ val podPhase = pod.getStatus.getPhase
action match {
- case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+ case Action.MODIFIED if (podPhase == "Running"
&& pod.getMetadata.getDeletionTimestamp == null) =>
val clusterNodeName = pod.getSpec.getNodeName
logInfo(s"Executor pod $podName ready, launched at $clusterNodeName as IP $podIP.")
executorPodsByIPs.put(podIP, pod)
- case Action.DELETED | Action.ERROR =>
+ case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == "Init:CrashLoopBackoff")
--- End diff --
I took a look at the client code, and it appears that `getPhase` simply
returns the value of the JSON property `phase` of `PodStatus`. Have you
actually seen `Init:Error` returned as a phase value in practice?
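
To illustrate the concern: as far as I know, the Kubernetes API documents only five values for `status.phase` (`Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`), while strings such as `Init:Error` are display statuses that `kubectl` derives from the init container statuses. Below is a minimal sketch of checking init container state instead of the phase string. The case classes and the `PodPhaseSketch` object are hypothetical, simplified stand-ins, not the fabric8 client types, and the failure criteria are an assumption about what this PR wants to detect.

```scala
// Hypothetical, simplified stand-ins for the relevant PodStatus fields.
// These are NOT the fabric8 client classes; they only model the idea.
final case class InitContainerState(
    waitingReason: Option[String] = None,    // e.g. Some("CrashLoopBackOff")
    terminatedExitCode: Option[Int] = None)  // set once the container has exited

final case class SimplePodStatus(
    phase: String,
    initContainers: Seq[InitContainerState] = Seq.empty)

object PodPhaseSketch {
  // The five values documented for status.phase in the Kubernetes API.
  val ValidPhases: Set[String] =
    Set("Pending", "Running", "Succeeded", "Failed", "Unknown")

  // Assumed failure criteria: an init container is waiting in
  // CrashLoopBackOff, or it terminated with a non-zero exit code.
  def initContainerFailed(status: SimplePodStatus): Boolean =
    status.initContainers.exists { c =>
      c.waitingReason.contains("CrashLoopBackOff") ||
        c.terminatedExitCode.exists(_ != 0)
    }
}
```

Under this model, a pod whose init container keeps crashing would still report phase `Pending`, so a guard like `podPhase == "Init:Error"` would never match, whereas `initContainerFailed` would.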
---