BoYang created SPARK-47678:
------------------------------

             Summary: Got fetch failed exception when new executor reused same 
ip address from a previously killed executor
                 Key: SPARK-47678
                 URL: https://issues.apache.org/jira/browse/SPARK-47678
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.5.1, 3.5.0, 3.4.1, 3.4.0, 3.4.2
         Environment: This only happens on Kubernetes, where the same IP 
address can be reused for a new executor pod.
            Reporter: BoYang


This is an edge case in which Spark on Kubernetes gets a fetch failed 
exception when a new executor reuses the IP address of a previously killed 
executor.

The new executor checks the IP address of each shuffle block and compares it 
with its own host address. If the two IP addresses are the same, the new 
executor assumes the block is on its own local disk and tries to read it 
locally. This fails because the block actually belongs to the previously 
killed executor, which happened to have the same IP address.
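
The mismatch can be illustrated with a minimal sketch. This is a simplified 
model, not Spark's actual code: BlockLocation, is_local_by_host and 
is_local_by_executor are hypothetical names, and the stricter check only 
shows why a host-only comparison is ambiguous. It is not necessarily the fix 
that should be adopted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    """Hypothetical stand-in for where a shuffle block was written."""
    executor_id: str
    host: str


def is_local_by_host(block: BlockLocation, me: BlockLocation) -> bool:
    # Host-only comparison, as described above: any block whose IP
    # address matches ours is treated as readable from local disk.
    return block.host == me.host


def is_local_by_executor(block: BlockLocation, me: BlockLocation) -> bool:
    # Stricter comparison (an assumption for illustration): the block
    # must also have been written by this very executor.
    return block.host == me.host and block.executor_id == me.executor_id


# The killed executor and its replacement pod share the same IP address.
killed = BlockLocation(executor_id="exec-1", host="10.0.0.5")
replacement = BlockLocation(executor_id="exec-2", host="10.0.0.5")

# Host-only check wrongly classifies the killed executor's block as local.
print(is_local_by_host(killed, replacement))      # True
# Including the executor id tells the two executors apart.
print(is_local_by_executor(killed, replacement))  # False
```

With the host-only check, the replacement executor would attempt a local 
read of a block that was never written to its disk, which matches the fetch 
failure described above.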



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

