lwwmanning opened a new pull request #24083: [SPARK-24432] Support dynamic 
allocation without external shuffle service
URL: https://github.com/apache/spark/pull/24083
 
 
   ## What changes were proposed in this pull request?
   
   This PR adds a limited version of dynamic allocation that does not require 
the external shuffle service, and thus works on kubernetes (but does not 
support preemption). 
   
   The basic approach is to track which executors are holding shuffle files, 
and of those, which have shuffle files depended on by active stages. If an 
executor contains shuffle files depended on by an active stage, then we treat 
it as "active" (i.e., prevent the ExecutorAllocationManager from marking it as 
"idle"). If an executor contains only shuffle files that are not dependencies 
of active stages, then we treat those shuffle files similarly to cached data 
(i.e., configurable idle timeout that defaults to "infinity"). 
   
   We also introduce the concept of "shuffle biased task scheduling", a 
heuristic attempt to schedule tasks for maximal efficacy of dynamic allocation. 
We do this by attempting to minimize the number of executors that contain 
(active) shuffle files, by packing as many tasks as possible onto "active" 
executors first, followed by scheduling them on executors with only inactive 
shuffle files, and finally all remaining executors.
   
   This is a port of a series of PRs on Palantir's spark fork:
   
   [Support dynamic allocation without external shuffle 
service](https://github.com/palantir/spark/pull/427)
   [Fix dynamic allocation with external shuffle 
service](https://github.com/palantir/spark/pull/445)
   [Track active shuffle by stage](https://github.com/palantir/spark/pull/446)
   [Shuffle biased task scheduling](https://github.com/palantir/spark/pull/447)
   
   cc: @rynorris @mcheah @robert3005
   
   ## How was this patch tested?
   
   We added additional tests explicitly as part of the PR, and did additional 
manual testing on small YARN and k8s clusters (partially documented on [Track 
active shuffle by stage](https://github.com/palantir/spark/pull/446)). Then we 
successfully rolled this out for a small subset of workloads in production at 
Palantir, running entirely on kubernetes.
   
   Please review http://spark.apache.org/contributing.html before opening a 
pull request.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to