Shiqi Sun created SPARK-42404:
---------------------------------

             Summary: Spark driver pod should not create executor pods when 
there is no driver service
                 Key: SPARK-42404
                 URL: https://issues.apache.org/jira/browse/SPARK-42404
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 3.3.1
            Reporter: Shiqi Sun


Currently, the driver pod assumes that the driver headless service exists when 
it creates executor pods. When this assumption does not hold, the driver still 
spins up executor pods, those pods fail, and the driver then tries to create 
more pods, and so on. As a result, the Spark job makes no progress while it 
consumes a lot of computational resources, and it does not reach a terminal 
state without manual intervention (e.g. deleting the job or recreating the 
driver service).

 

This Jira issue addresses the problem by having the driver check that the 
driver service exists before it creates executor pods.
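
A minimal sketch of the proposed check is shown here in Python for 
illustration only; it is not Spark's actual implementation. All names 
(DriverServiceMissingError, create_executors_if_service_exists, 
lookup_service, create_executor_pod) are hypothetical, and the Kubernetes 
client is replaced by plain callables so the example runs on its own.

```python
from typing import Callable, List, Optional


class DriverServiceMissingError(RuntimeError):
    """Hypothetical error raised when the driver service cannot be found."""


def create_executors_if_service_exists(
    service_name: str,
    lookup_service: Callable[[str], Optional[dict]],
    create_executor_pod: Callable[[int], str],
    num_executors: int,
) -> List[str]:
    """Create executor pods only if the driver service is present.

    lookup_service stands in for a Kubernetes API lookup and is assumed to
    return None when the service does not exist.
    """
    # Check the service first, so that no executor pod is requested when
    # the service is missing.
    if lookup_service(service_name) is None:
        raise DriverServiceMissingError(
            f"driver service {service_name!r} not found; "
            "not creating executor pods"
        )
    # The service exists, so request the executor pods.
    return [create_executor_pod(i) for i in range(num_executors)]


if __name__ == "__main__":
    # In-memory stand-in for the cluster's services (illustrative data).
    services = {"spark-driver-svc": {"clusterIP": "None"}}
    pods = create_executors_if_service_exists(
        "spark-driver-svc", services.get, lambda i: f"exec-{i}", 2
    )
    print(pods)
```

Raising an error instead of retrying is one possible design choice: it lets 
the job reach a terminal state rather than repeatedly creating executor pods 
that fail.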



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

