Shiqi Sun created SPARK-42404:
---------------------------------

             Summary: Spark driver pod should not create executor pods when there is no driver service
                 Key: SPARK-42404
                 URL: https://issues.apache.org/jira/browse/SPARK-42404
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 3.3.1
            Reporter: Shiqi Sun

Currently, the driver pod assumes that the driver headless service exists when it creates executor pods. When this assumption does not hold, the driver still spins up executor pods, the executor pods fail, and the driver then tries to create more pods, and so on. As a result, the Spark job makes no progress while consuming a lot of computational resources, and it never reaches a terminal state without manual intervention (e.g. deleting the job or recreating the driver service).

This Jira issue addresses the problem by having the driver check for the driver service before creating executor pods.
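
The proposed check can be illustrated with a minimal sketch. This is not Spark's actual code: it is a standalone Python model of the idea, and every name in it (`request_executors`, `service_exists`, `create_pod`, `DriverServiceMissingError`) is hypothetical. The two callables stand in for whatever Kubernetes client calls the driver would really make.

```python
from typing import Callable, List


class DriverServiceMissingError(RuntimeError):
    """Hypothetical error raised when the driver service cannot be found."""


def request_executors(
    count: int,
    service_exists: Callable[[], bool],
    create_pod: Callable[[int], str],
) -> List[str]:
    """Create `count` executor pods, but only if the driver service exists.

    `service_exists` and `create_pod` are injected stand-ins (assumptions)
    for the real lookups and pod-creation calls.
    """
    if not service_exists():
        # Fail fast: executors could not reach the driver without the
        # service, so creating them would only start the retry loop.
        raise DriverServiceMissingError(
            "driver service not found; refusing to create executor pods"
        )
    return [create_pod(i) for i in range(count)]


if __name__ == "__main__":
    # Service present: pods are created as usual.
    print(request_executors(2, lambda: True, lambda i: f"exec-{i}"))

    # Service missing: no pod is created and the caller gets a clear error.
    try:
        request_executors(2, lambda: False, lambda i: f"exec-{i}")
    except DriverServiceMissingError as err:
        print(f"aborted: {err}")
```

Raising an error instead of silently skipping pod creation is one possible design choice here; it would let the job reach a terminal state rather than wait indefinitely. Whether the driver should fail or retry the check is left open by the issue description.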