hiboyang opened a new pull request, #45803: URL: https://github.com/apache/spark/pull/45803
### What changes were proposed in this pull request?

Check the `spark.shuffle.readHostLocalDisk` config to determine whether to read a shuffle block from the same local host machine.

### Why are the changes needed?

Spark has a shuffle optimization that checks whether a shuffle block resides on the same host machine and, if so, reads it directly from local disk. This host check is done by comparing the block's IP address with the current host's IP address. That causes a problem when running Spark on Kubernetes, because Kubernetes may reuse a pod IP after an old executor exits and a new executor starts. Consider the following sequence:

1. Executor 1 starts with IP address 10.0.0.1.
2. A shuffle block (e.g. block1) is written on Executor 1.
3. Executor 1 terminates.
4. Executor 2 starts with the same IP address 10.0.0.1 (this is rare, but did happen in our tests, because Kubernetes may reuse IPs when launching pods).
5. Executor 2 tries to read block1. It finds that block1's address matches its own host address, so it assumes block1 exists on its local disk.
6. Executor 2 reads from local disk and gets an error, since block1 is not there (block1 was on Executor 1, which is gone).

There is already a Spark config for this (`spark.shuffle.readHostLocalDisk`). We can reuse this config and check it in `BlockStoreShuffleReader`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
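To illustrate the failure mode and the proposed guard, here is a minimal, self-contained Scala sketch. It is not the actual Spark internals: `HostLocalReadCheck` and `shouldReadFromLocalDisk` are hypothetical names, and the sketch only models the decision logic (an IP match alone is not proof the block is on local disk, because Kubernetes can reuse pod IPs; the config gates the optimization).

```scala
// Hypothetical sketch of the host-local read decision (illustrative names,
// not Spark's real API). The real change checks the existing config
// spark.shuffle.readHostLocalDisk in BlockStoreShuffleReader.
object HostLocalReadCheck {

  /** Decide whether a shuffle block may be read from the local disk.
    *
    * @param blockHost   IP address recorded for the block's location
    * @param currentHost IP address of the executor doing the read
    * @param readHostLocalDiskEnabled value of spark.shuffle.readHostLocalDisk
    */
  def shouldReadFromLocalDisk(
      blockHost: String,
      currentHost: String,
      readHostLocalDiskEnabled: Boolean): Boolean = {
    // Only treat a block as host-local when the feature is enabled.
    // An IP match by itself is unsafe: a new executor may have inherited
    // the IP of a terminated executor that actually wrote the block.
    readHostLocalDiskEnabled && blockHost == currentHost
  }

  def main(args: Array[String]): Unit = {
    // Executor 2 reuses IP 10.0.0.1. With the config disabled, the block
    // is fetched remotely instead of being (wrongly) read from local disk.
    println(shouldReadFromLocalDisk("10.0.0.1", "10.0.0.1",
      readHostLocalDiskEnabled = false)) // false: fetch remotely
    println(shouldReadFromLocalDisk("10.0.0.1", "10.0.0.1",
      readHostLocalDiskEnabled = true))  // true: local-disk read allowed
  }
}
```

With the config disabled, the reader falls back to the normal remote fetch path, which is correct even when the writer's pod IP has been recycled.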
