BoYang created SPARK-47678:
------------------------------
Summary: Got fetch failed exception when new executor reused same ip address from a previously killed executor
Key: SPARK-47678
URL: https://issues.apache.org/jira/browse/SPARK-47678
Project: Spark
Issue Type: Bug
Components: Shuffle
Affects Versions: 3.5.1, 3.5.0, 3.4.1, 3.4.0, 3.4.2
Environment: This only happens on Kubernetes, where the same IP address can be reused for a new executor pod.
Reporter: BoYang
This is an edge case in which Spark on Kubernetes gets a fetch failed exception when a new executor reuses the IP address of a previously killed executor. The new executor compares the IP address recorded for a shuffle block with its own host address. If the two addresses match, the new executor assumes the block is on its own local disk and tries to read it locally. The read fails because the block actually belongs to the previously killed executor, which happened to have the same IP address.
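To make the failure mode concrete, here is a minimal Python sketch. It models only the comparison described above and does not reproduce Spark's shuffle-reader code. The class `BlockLocation` and both functions are hypothetical names chosen for illustration. `reads_locally_by_ip` mirrors the IP-only check from the report. `reads_locally_by_executor` is one stricter check that also requires the block's executor ID to match. It is an assumption for contrast, not the fix adopted for SPARK-47678.

```python
from dataclasses import dataclass


# Hypothetical model of a shuffle block's location; not Spark's actual classes.
@dataclass(frozen=True)
class BlockLocation:
    executor_id: str  # executor that wrote the shuffle block
    host: str         # IP address of that executor's pod


def reads_locally_by_ip(block: BlockLocation, my_host: str) -> bool:
    """IP-only check as described in the issue: same IP => assume local disk."""
    return block.host == my_host


def reads_locally_by_executor(block: BlockLocation, my_executor_id: str, my_host: str) -> bool:
    """Hypothetical stricter check: also require that this executor wrote the block."""
    return block.host == my_host and block.executor_id == my_executor_id


# Executor "1" was killed, and Kubernetes reassigned its pod IP to new executor "7".
old_block = BlockLocation(executor_id="1", host="10.0.0.5")
new_executor_id, new_host = "7", "10.0.0.5"

print(reads_locally_by_ip(old_block, new_host))                         # True: wrongly treated as local
print(reads_locally_by_executor(old_block, new_executor_id, new_host))  # False: fetched remotely instead
```

The IP-only check cannot tell the two executors apart because the pod IP is the only identity it consults. Any check that also uses a per-executor identity avoids this particular collision.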