Holden Karau created SPARK-40379:
------------------------------------

             Summary: Propagate decommission executor loss reason during onDisconnect in K8s
                 Key: SPARK-40379
                 URL: https://issues.apache.org/jira/browse/SPARK-40379
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes, Spark Core
    Affects Versions: 3.4.0
            Reporter: Holden Karau
            Assignee: Holden Karau

Currently, if an executor has been sent a decommission message and then disconnects from the scheduler, we only disable the executor and depend on the K8s status events to drive the rest of the state transitions. However, the K8s status events can become overwhelmed on large clusters, so we should check whether an executor is in a decommissioning state when it disconnects and use that as the loss reason instead of waiting on the K8s status events. This gives us more accurate logging information.
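
To illustrate the proposed behaviour, here is a minimal sketch in Python of a toy scheduler backend. It is not Spark's code: the class `ToySchedulerBackend`, its fields, and the reason string "decommissioned" are all hypothetical names chosen for illustration. It only shows the idea of checking the decommissioning state at disconnect time rather than waiting for an external status event.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToySchedulerBackend:
    """Toy stand-in for a scheduler backend (hypothetical, not Spark's classes)."""

    # Executors that have been sent a decommission message.
    executors_pending_decommission: Set[str] = field(default_factory=set)
    # Executors disabled while waiting for an external status event.
    disabled_executors: Set[str] = field(default_factory=set)
    # Executors removed immediately, mapped to the recorded loss reason.
    removed: Dict[str, str] = field(default_factory=dict)

    def decommission(self, executor_id: str) -> None:
        """Record that a decommission message was sent to the executor."""
        self.executors_pending_decommission.add(executor_id)

    def on_disconnected(self, executor_id: str) -> Optional[str]:
        """Handle a disconnect and return the loss reason if one is known."""
        if executor_id in self.executors_pending_decommission:
            # Known to be decommissioning: record that reason right away
            # instead of waiting for a (possibly delayed) status event.
            self.executors_pending_decommission.discard(executor_id)
            reason = "decommissioned"  # hypothetical reason label
            self.removed[executor_id] = reason
            return reason
        # Unknown cause: only disable the executor and let the external
        # status events drive the remaining state transitions.
        self.disabled_executors.add(executor_id)
        return None
```

In this sketch, an executor that was decommissioned before disconnecting is removed with an explicit reason, while any other disconnect falls back to the existing disable-and-wait path.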