skonto commented on a change in pull request #25614: [SPARK-28887][K8S] Executor pod status fix
URL: https://github.com/apache/spark/pull/25614#discussion_r326112183
##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
##########
@@ -59,7 +61,24 @@ object ExecutorPodsSnapshot extends Logging {
       case "pending" =>
         PodPending(pod)
       case "running" =>
-        PodRunning(pod)
+        // TODO(SPARK-29023): The Kubernetes 1.17 sidecar container feature will
+        // make this code redundant: https://github.com/kubernetes/enhancements/issues/753
+        // Check that the executor container status is not terminated:
+        // the pod status can still be "running" if a sidecar container is running.
Review comment:
What if the sidecar provides a service we need, e.g. Kerberos ticket renewal? Shouldn't we remove the executor pod as soon as possible rather than wait for the Spark container error to happen? I think we should at least emit a log message, e.g. "Spark executor is running but pod is not healthy". Or we could add a configurable property to allow removal if any container is failing. In addition, the pod `restartPolicy` is set to `Never` since Spark manages errors itself, and since we bypass the K8s Deployment pattern we should provide a few more management capabilities. Thoughts?
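
For concreteness, a minimal sketch of the check and log message I have in mind, written against the fabric8 `Pod` model that `ExecutorPodsSnapshot` already uses. The object name, the `removeOnAnyContainerFailure` flag, and the helper methods are illustrative assumptions, not existing Spark APIs:

```scala
import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model.Pod

import org.apache.spark.internal.Logging

// Illustrative sketch only: names and behavior here are assumptions,
// not part of the actual Spark K8s scheduler backend.
object ExecutorPodHealthCheck extends Logging {

  // Hypothetical switch: remove the executor as soon as any container fails,
  // instead of waiting for the Spark container itself to error out.
  val removeOnAnyContainerFailure: Boolean = false

  // True if any container in the pod has terminated with a non-zero exit
  // code (e.g. a failed sidecar) even though the pod phase is still "running".
  def hasFailedContainer(pod: Pod): Boolean = {
    pod.getStatus.getContainerStatuses.asScala.exists { status =>
      val state = status.getState
      val terminated = if (state != null) state.getTerminated else null
      terminated != null && terminated.getExitCode.intValue != 0
    }
  }

  // To be called from the "running" branch of the phase match above.
  def checkRunningPod(pod: Pod): Unit = {
    if (hasFailedContainer(pod)) {
      val podName = pod.getMetadata.getName
      if (removeOnAnyContainerFailure) {
        // A real implementation would surface a failed snapshot state here
        // so the pod gets cleaned up, rather than just logging.
        logWarning(s"Removing executor pod $podName: a container has failed")
      } else {
        logWarning(s"Spark executor in pod $podName is running " +
          "but the pod is not healthy: a sidecar container has failed")
      }
    }
  }
}
```

The flag could later become a real config property, so the default keeps today's behavior (warn only) while stricter deployments opt into immediate removal.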