Jason Lowe commented on YARN-3508:

Yes, it's not a cure-all to move the preemption processing to the scheduler 
event queue when the scheduler is the bottleneck, but we do have separate event 
queues for a reason.  If it didn't matter who was the bottleneck then we'd just 
have one event queue for everything, correct?  The scheduler event queue is 
primarily blocked by the big scheduler lock, and IMHO we should dispatch events 
that need that lock to that queue.  Doing otherwise starts to couple the two 
event dispatchers together and we might as well just have the one event queue 
to rule them all.
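
To make the point concrete, here is a minimal sketch of the idea, not the 
actual RM code.  Every name in it (DispatcherSketch, the two executors, the 
latches) is hypothetical, and a plain ReentrantLock stands in for the big 
scheduler lock.  The main dispatcher only forwards the preemption event to 
the scheduler queue, so a contended lock stalls the scheduler thread and 
leaves the main queue free to keep draining.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in YARN.
class DispatcherSketch {

    // Returns true if an unrelated event on the main queue completes while
    // the (stand-in) scheduler lock is still held by someone else.
    static boolean otherEventRunsWhileLockHeld() throws InterruptedException {
        ExecutorService mainDispatcher = Executors.newSingleThreadExecutor();
        ExecutorService schedulerDispatcher = Executors.newSingleThreadExecutor();
        ReentrantLock schedulerLock = new ReentrantLock(); // stand-in for the big lock
        CountDownLatch otherDone = new CountDownLatch(1);
        CountDownLatch preemptDone = new CountDownLatch(1);

        schedulerLock.lock(); // simulate contention: the caller holds the lock
        try {
            // Main dispatcher does not take the lock itself; it only hands the
            // preemption event over to the scheduler event queue.
            mainDispatcher.execute(() -> schedulerDispatcher.execute(() -> {
                schedulerLock.lock(); // blocks on the scheduler thread only
                try {
                    preemptDone.countDown();
                } finally {
                    schedulerLock.unlock();
                }
            }));

            // An unrelated event queued behind the preemption event.
            mainDispatcher.execute(otherDone::countDown);

            // The unrelated event finishes even though preemption is still blocked.
            return otherDone.await(5, TimeUnit.SECONDS)
                && preemptDone.getCount() == 1;
        } finally {
            schedulerLock.unlock(); // let the preemption handler proceed
            preemptDone.await(5, TimeUnit.SECONDS);
            mainDispatcher.shutdown();
            schedulerDispatcher.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(otherEventRunsWhileLockHeld());
    }
}
```

If the preemption handler instead took the lock directly on the main 
dispatcher thread, the unrelated event would sit behind it until the lock 
was released, which is the coupling described above.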

> Preemption processing occurring on the main RM dispatcher
> ---------------------------------------------------------
>                 Key: YARN-3508
>                 URL: https://issues.apache.org/jira/browse/YARN-3508
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-3508.002.patch, YARN-3508.01.patch
>
> We recently saw the RM for a large cluster lag far behind on the 
> AsyncDispatcher event queue.  The AsyncDispatcher thread was consistently 
> blocked on the highly-contended CapacityScheduler lock trying to dispatch 
> preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
> processing should occur on the scheduler event dispatcher thread or a 
> separate thread to avoid delaying the processing of other events in the 
> primary dispatcher queue.
