vanzin opened a new pull request #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817
 
 
   This change adds a new option that enables dynamic allocation without
   the need for a shuffle service. This mode works by tracking which stages
   generate shuffle files, and keeping executors that generate data for those
   shuffles alive while the jobs that use them are active.
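   The bookkeeping described above can be pictured with the small model below. It is a hypothetical sketch, not the code in this PR. Every name in it (`ShuffleTracker`, `on_map_output`, `on_job_start`, `on_job_end`, `can_remove`) is invented for illustration. It records which executors hold map output for which shuffle IDs and which shuffles each active job reads. An executor can only be released when none of its shuffles is needed by a running job.

```python
from dataclasses import dataclass, field


@dataclass
class ShuffleTracker:
    """Hypothetical model of shuffle-aware executor tracking (not Spark's code)."""

    # executor id -> shuffle ids whose map output is stored on that executor
    executor_shuffles: dict = field(default_factory=dict)
    # job id -> shuffle ids that the job reads
    active_jobs: dict = field(default_factory=dict)

    def on_map_output(self, executor_id: str, shuffle_id: int) -> None:
        # A stage wrote shuffle files for shuffle_id on this executor.
        self.executor_shuffles.setdefault(executor_id, set()).add(shuffle_id)

    def on_job_start(self, job_id: int, shuffle_ids: set) -> None:
        self.active_jobs[job_id] = set(shuffle_ids)

    def on_job_end(self, job_id: int) -> None:
        self.active_jobs.pop(job_id, None)

    def shuffles_in_use(self) -> set:
        # Union of all shuffles read by currently active jobs.
        used = set()
        for ids in self.active_jobs.values():
            used |= ids
        return used

    def can_remove(self, executor_id: str) -> bool:
        # Keep the executor while any active job needs shuffle data it holds.
        held = self.executor_shuffles.get(executor_id, set())
        return not (held & self.shuffles_in_use())
```

   For example, after `on_map_output("exec-1", 0)` and `on_job_start(1, {0})`, `can_remove("exec-1")` is `False`; once `on_job_end(1)` runs it becomes `True`. An executor that never wrote shuffle output is always removable in this model.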
   
   A separate timeout is also added for shuffle data, so that executors that
   hold shuffle data can use a different idle timeout before being removed.
   This allows the shuffle data to be kept around in case it is needed by
   some new job, or allows users to be more aggressive in timing out
   executors that don't have shuffle data in active use.
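   One way to picture the two timeouts is the function below, which computes when an idle executor becomes eligible for removal. It is a hypothetical sketch: the parameter names are illustrative and are not Spark configuration keys. Two further points are assumptions of the model rather than statements about the PR. First, the longer of the two timeouts applies to executors holding shuffle data. Second, `shuffle_timeout=None` stands for an infinite timeout, where the executor is kept until its shuffles are cleaned up.

```python
from typing import Optional


def removal_deadline(
    idle_since: float,
    holds_shuffle_data: bool,
    shuffle_in_active_use: bool,
    idle_timeout: float,
    shuffle_timeout: Optional[float],
) -> Optional[float]:
    """Return when an idle executor may be removed, or None to keep it.

    Hypothetical model; names are illustrative, not Spark config keys.
    """
    if shuffle_in_active_use:
        # A running job may still fetch this executor's shuffle output.
        return None
    if not holds_shuffle_data:
        # Plain idle executor: the regular idle timeout applies.
        return idle_since + idle_timeout
    if shuffle_timeout is None:
        # Infinite shuffle timeout: keep until the shuffle data is cleaned up.
        return None
    # Assumption: the longer of the two timeouts applies to shuffle holders.
    return idle_since + max(idle_timeout, shuffle_timeout)
```

   With `idle_timeout=60.0` and `shuffle_timeout=600.0`, an executor idle since `t=100.0` can go at `160.0` if it holds no shuffle data. If it holds inactive shuffle data, it can go at `700.0`.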
   
   The code also hooks into the context cleaner, so that shuffles that are
   garbage collected are detected and the executors holding their data are
   no longer kept alive unnecessarily.
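   The cleanup path can be illustrated with Python's `weakref.finalize`, which runs a callback once an object has been garbage collected. This is a hypothetical analogy for a GC-driven cleaner, not Spark's `ContextCleaner` API. `ShuffleHandle` and `CleanerModel` are invented names. Once the driver-side handle for a shuffle is collected, that shuffle ID is dropped from every executor's bookkeeping. Executors left with no shuffle data are then no longer held.

```python
import weakref


class ShuffleHandle:
    """Hypothetical stand-in for a driver-side object that owns a shuffle."""

    def __init__(self, shuffle_id: int):
        self.shuffle_id = shuffle_id


class CleanerModel:
    """Toy model: forget a shuffle once its handle has been garbage collected."""

    def __init__(self):
        # executor id -> shuffle ids with output stored on that executor
        self.executor_shuffles = {}

    def register(self, handle: ShuffleHandle, executor_ids) -> None:
        for ex in executor_ids:
            self.executor_shuffles.setdefault(ex, set()).add(handle.shuffle_id)
        # The callback captures only the id, never the handle itself,
        # so the handle stays collectable.
        weakref.finalize(handle, self._on_shuffle_cleaned, handle.shuffle_id)

    def _on_shuffle_cleaned(self, shuffle_id: int) -> None:
        for ids in self.executor_shuffles.values():
            ids.discard(shuffle_id)

    def executors_holding_data(self) -> set:
        return {ex for ex, ids in self.executor_shuffles.items() if ids}
```

   After `register(h, ["exec-1", "exec-2"])`, both executors are reported as holding data. Deleting the last reference to `h` and letting the collector run clears both entries.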
   
   Testing done with added unit tests, and also with TPC-DS workloads on
   YARN without a shuffle service.
