Victsm edited a comment on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-718135591


   @tgravescs 
   What we have is the same as what's described in the paper and the SPIP doc.
   To handle dynamic resource allocation (DRA), we are essentially doing 2 
things:
   1. Choose shuffle service locations beyond the currently active Spark 
executors.
   2. Launch Spark executors with DRA based on the locations of the chosen 
shuffle services.
   
   This PR enables the first.
   By keeping track of all historical locations of executors launched for a 
given Spark application, we get 2 benefits.
   1) When DRA kicks in later on and significantly reduces the number of 
active executors, we can still look into the historical locations of past 
executors to get enough shuffle service locations to perform block 
push/merge.
   2) On a YARN cluster with authentication enabled, picking historical 
locations of past executors ensures that the executor can talk to a shuffle 
service that performs SASL authentication, and that the local dirs storing 
the merged shuffle files get cleaned up once the application finishes.
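
A minimal sketch of the bookkeeping described above, to make the idea concrete. It is not the code in this PR: `MergerLocationTracker`, `ShuffleServiceLocation`, and the "active hosts first, then historical hosts" ordering are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical type, not Spark's API: a shuffle service endpoint on a host.
final case class ShuffleServiceLocation(host: String, port: Int)

// Hypothetical tracker: remembers every host that has ever run an executor
// for this application, even after DRA removes the executor.
final class MergerLocationTracker(shuffleServicePort: Int) {
  // executorId -> host, in insertion order, for currently active executors.
  private val activeExecutors = mutable.LinkedHashMap.empty[String, String]
  // Every host seen so far, in insertion order. Never shrinks.
  private val historicalHosts = mutable.LinkedHashSet.empty[String]

  def executorAdded(executorId: String, host: String): Unit = {
    activeExecutors(executorId) = host
    historicalHosts += host
  }

  // The host stays in historicalHosts on purpose.
  def executorRemoved(executorId: String): Unit = {
    activeExecutors.remove(executorId)
    ()
  }

  // Assumed policy: prefer hosts with active executors, then fall back to
  // hosts of past executors until numNeeded locations are found.
  def mergerLocations(numNeeded: Int): Seq[ShuffleServiceLocation] = {
    val activeHosts = activeExecutors.values.toSeq.distinct
    val fallback = historicalHosts.toSeq.filterNot(h => activeHosts.contains(h))
    (activeHosts ++ fallback)
      .take(numNeeded)
      .map(h => ShuffleServiceLocation(h, shuffleServicePort))
  }
}
```

With this sketch, if executors on three hosts were launched and DRA later removed two of them, `mergerLocations(3)` would still return all three hosts, since the removed executors' hosts remain in the historical set.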
   
   A follow-up patch for the driver-side change 
(MapOutputTracker#getPreferredLocationsForShuffle) enables the second.
   The preferred location for a shuffle then takes into account the shuffle 
service locations for that shuffle.
   This would set the preferred locations for the corresponding `ShuffledRDD` 
as well, which would then have 2 impacts.
   1) When TaskSetManager schedules tasks to executors, this would impact the 
task placement strategy.
   2) When ExecutorAllocationManager requests more executors for DRA, this 
preferred location would be passed to YARN to request containers with the 
preferred locality.
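
As a rough illustration of how a preferred location could account for shuffle service locations, here is a sketch under stated assumptions. It is not the actual `MapOutputTracker#getPreferredLocationsForShuffle` implementation: `PreferredLocations.forReducer`, its parameters, and the size-fraction rule are hypothetical.

```scala
// Hypothetical helper, not Spark's API.
object PreferredLocations {
  // unmergedBytesByHost: bytes of unmerged map output for one reducer, per host.
  // mergedBlock: optional (host, bytes) of the merged shuffle block held by
  //              the shuffle service chosen for this reducer's partition.
  // fractionThreshold: a host is preferred if it holds at least this fraction
  //                    of the reducer's total input (illustrative rule).
  def forReducer(
      unmergedBytesByHost: Map[String, Long],
      mergedBlock: Option[(String, Long)],
      fractionThreshold: Double): Seq[String] = {
    // Fold the merged block's size into the per-host totals.
    val combined = mergedBlock match {
      case Some((host, bytes)) =>
        unmergedBytesByHost.updated(
          host, unmergedBytesByHost.getOrElse(host, 0L) + bytes)
      case None => unmergedBytesByHost
    }
    val total = combined.values.sum
    if (total == 0L) {
      Seq.empty
    } else {
      combined.collect {
        case (host, bytes) if bytes.toDouble / total >= fractionThreshold => host
      }.toSeq.sorted
    }
  }
}
```

In this sketch, a host that stores a large merged block dominates the result, so both task placement and any container request derived from the preferred locations would lean toward that host.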

