dongjoon-hyun commented on a change in pull request #32752:
URL: https://github.com/apache/spark/pull/32752#discussion_r644431423
##########
File path:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##########
@@ -99,6 +102,16 @@ private[spark] class ExecutorPodsAllocator(
@volatile private var deletedExecutorIds = Set.empty[Long]
def start(applicationId: String, schedulerBackend: KubernetesClusterSchedulerBackend): Unit = {
+ // wait until the driver pod is ready to ensure executors can connect to driver svc
Review comment:
Can we be more specific? The problem is the absence of the K8s headless
service resource for this driver pod. For example, because K8s works
asynchronously, the problem can still happen when the driver pod is ready with
all its sidecars but the K8s service is not yet ready to work with this driver pod.
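
To illustrate the distinction, here is a minimal, self-contained sketch. It is not the code in this PR and does not use the Kubernetes client API: `DriverState`, `executorsCanConnect`, and `awaitConnectable` are hypothetical names, and the `probe` function stands in for whatever lookup the allocator would really perform. It only shows that pod readiness and service existence are separate conditions that both have to hold.

```scala
import scala.annotation.tailrec

// Hypothetical snapshot of what the allocator could observe.
// Not a Spark or Kubernetes client type.
final case class DriverState(podReady: Boolean, serviceExists: Boolean)

object DriverReadiness {
  // Executors reach the driver through the headless service, so a ready
  // pod without the service resource is still not connectable.
  def executorsCanConnect(s: DriverState): Boolean =
    s.podReady && s.serviceExists

  // Calls `probe` up to `maxAttempts` times, sleeping `intervalMs`
  // milliseconds after each failed attempt. Returns true as soon as both
  // conditions hold, false if the attempts run out.
  @tailrec
  def awaitConnectable(
      probe: () => DriverState,
      maxAttempts: Int,
      intervalMs: Long): Boolean = {
    if (maxAttempts <= 0) false
    else if (executorsCanConnect(probe())) true
    else {
      Thread.sleep(intervalMs)
      awaitConnectable(probe, maxAttempts - 1, intervalMs)
    }
  }
}
```

With a probe that reports the pod as ready before the service exists, `awaitConnectable` keeps polling until the service shows up, which is the case the code comment should call out.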