[GitHub] [spark] agrawaldevesh edited a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

GitBox Fri, 12 Jun 2020 19:09:06 -0700


agrawaldevesh edited a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-643552975



   @holdenk, what are your thoughts on reducing the number of fetch failed 
exceptions (and thus potential job failures) caused by decommissioning? 
   - This PR tries to offload the shuffle blocks to another peer, but that is 
only best effort. 
   - When the executor is eventually lost (detected much later by a heartbeat 
timeout), its shuffle files will be cleared. Until that happens, any blocks 
that were not migrated (for whatever reason) will result in fetch failures. 
   
   Block migration may not be possible in all cases: we may not have enough 
time to do the migration, the peer's disk might be full, or the peer might 
itself be decommissioned soon after. This sort of "bulk decommissioning" can 
happen, for example, during a cluster scale down where many nodes are brought 
down in quick succession. 
   
   While block migration is great in that it avoids wasting work, we cannot 
guarantee it, so I think we need a second line of defense: proactively 
calling `MapOutputTracker.removeOutputsOnExecutor` (and clearing 
`DAGScheduler.cacheLocs`), perhaps after some timeout. (I think FetchFailures 
may still happen if a reducer has already started.)
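   
   To make the timeout idea concrete, here is a minimal sketch in Python. It 
is a toy model of driver-side bookkeeping, not Spark code: the class 
`ToyMapOutputTracker` and all of its methods are hypothetical names invented 
for illustration, and only the idea of dropping an executor's outputs after a 
deadline comes from the paragraph above.

```python
import time
from collections import defaultdict


class ToyMapOutputTracker:
    """Toy stand-in for driver-side map output bookkeeping (NOT Spark's API)."""

    def __init__(self):
        # executor id -> set of (shuffle_id, map_id) outputs it still hosts
        self.outputs = defaultdict(set)
        # executor id -> time after which its unmigrated outputs are dropped
        self.decommission_deadline = {}

    def register_output(self, executor_id, shuffle_id, map_id):
        self.outputs[executor_id].add((shuffle_id, map_id))

    def mark_decommissioning(self, executor_id, timeout_s, now=None):
        # Start the clock: migration gets timeout_s seconds to finish.
        now = time.monotonic() if now is None else now
        self.decommission_deadline[executor_id] = now + timeout_s

    def migrate(self, src, dst, shuffle_id, map_id):
        # Best-effort migration: succeeds only if src still hosts the block.
        block = (shuffle_id, map_id)
        if block in self.outputs[src]:
            self.outputs[src].discard(block)
            self.outputs[dst].add(block)
            return True
        return False

    def remove_outputs_on_executor(self, executor_id):
        # Forget everything the executor hosted; return what was forgotten.
        return self.outputs.pop(executor_id, set())

    def expire(self, now=None):
        # Second line of defense: drop whatever was not migrated in time.
        now = time.monotonic() if now is None else now
        lost = {}
        for executor_id, deadline in list(self.decommission_deadline.items()):
            if now >= deadline:
                lost[executor_id] = self.remove_outputs_on_executor(executor_id)
                del self.decommission_deadline[executor_id]
        return lost
```

   In this model, the blocks returned by `expire` are the ones a scheduler 
would have to recompute. Reducers that started before the deadline could 
still see fetch failures, as noted above.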
   
   Another way to solve this problem is for the executor to cleanly exit and 
tell the driver that it is going away, so that the driver can forget about its 
(unmigrated) shuffle blocks. But even this is somewhat best effort since the 
cloud provider may not give it time to do a clean shutdown.
   
   It would be nice to prevent or reduce job failures caused by fetch 
failures, particularly because we already know that the executor will be 
yanked soon, so we should be able to do better. 
   
   All three of these approaches to preventing fetch failures are orthogonal, 
and we should probably allow users to enable each of them independently:
   * Offloading shuffle blocks
   * Clearing shuffle statuses
   * Eager and clean termination of the executor
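   
   As a sketch of what independent toggles could look like, the Python 
snippet below models the three approaches as separate boolean options. The 
names `DecommissionOptions` and `planned_actions` are hypothetical and do not 
correspond to any existing Spark configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecommissionOptions:
    """Hypothetical per-feature switches; all default to off."""
    migrate_shuffle_blocks: bool = False
    clear_shuffle_statuses: bool = False
    clean_executor_exit: bool = False


def planned_actions(opts):
    """Return the steps that would run for a decommissioning executor."""
    actions = []
    if opts.migrate_shuffle_blocks:
        actions.append("offload shuffle blocks to a peer")
    if opts.clear_shuffle_statuses:
        actions.append("clear shuffle statuses on the driver")
    if opts.clean_executor_exit:
        actions.append("exit cleanly and notify the driver")
    return actions
```

   Because each switch is checked on its own, any combination can be enabled 
without affecting the other two.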
   
   Thanks.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



