Ngone51 opened a new pull request #24010: [SPARK-26439][CORE][WIP] Introduce 
WorkerOffer reservation mechanism for Barrier TaskSet
URL: https://github.com/apache/spark/pull/24010
 
 
   ## What changes were proposed in this pull request?
   
   Currently, a Barrier TaskSet has a hard requirement that all of its tasks
   be launched in a single resourceOffers round with enough slots (or
   sufficient resources). Even with enough slots, this cannot be guaranteed,
   because of task locality delay scheduling. So it is very likely that a
   Barrier TaskSet finally obtains a chunk of sufficient resources, only to
   give it up because one of its pending tasks cannot be scheduled.
   Furthermore, this causes severe resource competition between TaskSets and
   jobs, and introduces unclear semantics for DynamicAllocation.
   
   This PR tries to introduce a WorkerOffer reservation mechanism for Barrier
   TaskSet, which allows a Barrier TaskSet to reserve WorkerOffers in each
   resourceOffers round and to launch all of its tasks at the same time once
   it has accumulated sufficient resources. In this way, we relax the resource
   requirement for the Barrier TaskSet.
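
The accumulate-then-launch idea can be sketched as a toy model. This is not the code in this PR: `BarrierReservation`, its fields, and the simplified `WorkerOffer` below are hypothetical stand-ins, and the sketch assumes whole offers are reserved and that locality is ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkerOffer:
    """Hypothetical, simplified stand-in for Spark's WorkerOffer."""
    executor_id: str
    host: str
    cores: int


@dataclass
class BarrierReservation:
    """Toy model: reserve offers across rounds, launch all tasks together."""
    num_tasks: int
    cpus_per_task: int = 1
    reserved: list = field(default_factory=list)

    def reserved_slots(self) -> int:
        # Number of tasks that the currently reserved offers could host.
        return sum(o.cores // self.cpus_per_task for o in self.reserved)

    def resource_offers(self, offers):
        """Run one scheduling round.

        Returns one executor id per task once enough slots are reserved,
        or an empty list while the TaskSet is still waiting.
        """
        for offer in offers:
            if self.reserved_slots() >= self.num_tasks:
                break  # enough already; leave the rest to other TaskSets
            if offer.cores >= self.cpus_per_task:
                self.reserved.append(offer)
        if self.reserved_slots() < self.num_tasks:
            return []  # keep the reservation and wait for the next round
        slots = [
            o.executor_id
            for o in self.reserved
            for _ in range(o.cores // self.cpus_per_task)
        ]
        self.reserved = []
        return slots[: self.num_tasks]
```

With `num_tasks=3`, a first round offering a single 1-core executor launches nothing but keeps that executor reserved; a later round that brings two more slots launches all three tasks together.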
   
   Besides, two features come along with the WorkerOffer reservation mechanism:
   
   To avoid the deadlock that may be introduced by several Barrier TaskSets
   holding reserved WorkerOffers for a long time, we'll ask Barrier TaskSets
   to forcibly release part of their reserved WorkerOffers on demand. So it is
   highly likely that each Barrier TaskSet will be launched in the end.
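
A minimal sketch of forced release, under stated assumptions: `force_release` is a hypothetical helper, reservations are modeled as plain lists of executor ids, and the victim order (fewest reserved slots first) is an assumption of this sketch, since the description above does not say how victims are chosen.

```python
def force_release(reservations, requester, needed):
    """Move up to `needed` reserved slots from other TaskSets to `requester`.

    `reservations` maps a TaskSet name to the list of executor ids it holds.
    Returns the executor ids that were released and handed over.
    """
    released = []
    # Assumption: ask the TaskSets holding the fewest slots first.
    victims = sorted(
        (name for name in reservations if name != requester),
        key=lambda name: (len(reservations[name]), name),
    )
    for name in victims:
        while reservations[name] and len(released) < needed:
            released.append(reservations[name].pop())
        if len(released) >= needed:
            break
    reservations[requester].extend(released)
    return released
```

For example, if TaskSets A and B each need three slots while A holds two and B holds one, neither can ever launch on its own; releasing B's single slot to A lets A launch, after which its slots become available to B.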
   
   While it is waiting for sufficient resources, a Barrier TaskSet can replace
   a previously reserved WorkerOffer at a higher (less local) locality level
   with a new WorkerOffer at a lower (more local) locality level, to achieve
   better locality in the end.
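
The replacement rule can be sketched as follows. The enum mirrors the order of Spark's `TaskLocality` levels (a lower value means more local); `maybe_swap` and the dict-based reservation are hypothetical and only illustrate the idea.

```python
from enum import IntEnum


class TaskLocality(IntEnum):
    """Same ordering as Spark's TaskLocality: lower means more local."""
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    NO_PREF = 2
    RACK_LOCAL = 3
    ANY = 4


def maybe_swap(reserved, new_executor, new_level):
    """Swap the least local reserved offer for a strictly more local one.

    `reserved` maps executor id -> locality level of the reserved offer.
    Returns the executor id that was released, or None if nothing changed.
    """
    if not reserved:
        return None
    worst = max(reserved, key=lambda executor: reserved[executor])
    if new_level < reserved[worst]:
        del reserved[worst]
        reserved[new_executor] = new_level
        return worst
    return None
```

The number of reserved offers stays the same after a swap, so the TaskSet never moves further away from the slot count it needs.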
   
   To integrate with DynamicAllocation:
   
   One possibly effective way I can imagine is to add new events, e.g.
   ExecutorReservedEvent and ExecutorReleasedEvent, which would be handled
   like a busy executor with running tasks and an idle executor without
   running tasks, respectively. Thus, ExecutorAllocationManager would not let
   an executor go while it knows that there are still reserved resources on
   that executor.
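
As a rough illustration of the event idea: the two event names come from the proposal above, but their fields and the `IdleTracker` class are assumptions of this sketch and do not reflect the real ExecutorAllocationManager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorReservedEvent:
    executor_id: str  # assumed field; the proposal only names the event


@dataclass(frozen=True)
class ExecutorReleasedEvent:
    executor_id: str  # assumed field; the proposal only names the event


class IdleTracker:
    """Toy idle tracking: a reserved executor counts as busy."""

    def __init__(self):
        self.running = {}      # executor id -> number of running tasks
        self.reserved = set()  # executor ids holding a reservation

    def on_event(self, event):
        if isinstance(event, ExecutorReservedEvent):
            self.reserved.add(event.executor_id)
        elif isinstance(event, ExecutorReleasedEvent):
            self.reserved.discard(event.executor_id)

    def can_remove(self, executor_id):
        # Removable only with no running tasks and no reservation.
        no_tasks = self.running.get(executor_id, 0) == 0
        return no_tasks and executor_id not in self.reserved
```

An executor with no running tasks would normally be a candidate for removal; here it stays protected between the reserved and released events.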
   
   ## How was this patch tested?
   
   Existing tests plus some newly added ones; more tests still need to be added.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

