[ https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608746#comment-14608746 ]
Varun Saxena commented on YARN-3508:
------------------------------------

[~leftnoteasy], yes, this is what the patch does, and it addresses the issue at hand. I thought you wanted me to put the events directly on the scheduler event queue and hence bypass the central RM dispatcher queue.

Anyway, just to explain, below is what I have done.
I have removed {{ContainerPreemptEventType}} and moved all of its events to {{SchedulerEventType}}. I have also removed {{RMContainerPreemptEventDispatcher}}, because it processes the events synchronously on the RM dispatcher event thread.
With the patch, the events from {{ProportionalCapacityPreemptionPolicy}} are sent to the central RM dispatcher and, because the event type is {{SchedulerEventType}}, the preemption events are forwarded to the scheduler event queue. The scheduler dispatcher then picks these events up from the scheduler event queue and calls {{CapacityScheduler#handle}}, which in turn calls the relevant method for each preemption event.

I have also added a test case for this behavior in a new test class, {{TestRMDispatcher}}.

> Preemption processing occuring on the main RM dispatcher
> --------------------------------------------------------
>
>                 Key: YARN-3508
>                 URL: https://issues.apache.org/jira/browse/YARN-3508
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-3508.002.patch, YARN-3508.01.patch, 
> YARN-3508.03.patch, YARN-3508.04.patch
>
>
> We recently saw the RM for a large cluster lag far behind on the 
> AsyncDispatcher event queue.  The AsyncDispatcher thread was consistently 
> blocked on the highly-contended CapacityScheduler lock trying to dispatch 
> preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
> processing should occur on the scheduler event dispatcher thread or a 
> separate thread to avoid delaying the processing of other events in the 
> primary dispatcher queue.

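
For readers following the thread, the two-queue routing described in the comment above can be sketched roughly as below. This is a minimal illustration, not the patch itself: it uses only the JDK, and the class names ({{PreemptionDispatchSketch}}, {{QueueDispatcher}}, {{Scheduler}}), the thread names and the three enum constants are assumptions made up for the example. The point it tries to show is that the central dispatcher thread only forwards a scheduler-typed event to a second queue, so the scheduler's handler runs on the scheduler dispatcher thread rather than on the central one.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class PreemptionDispatchSketch {

    // Hypothetical stand-in for the preemption members of SchedulerEventType;
    // the constant names are illustrative only.
    enum SchedulerEventType { DROP_RESERVATION, PREEMPT_CONTAINER, KILL_CONTAINER }

    // Hypothetical stand-in for the scheduler: records which thread handled each event.
    static class Scheduler {
        final List<String> handledOn = new CopyOnWriteArrayList<>();
        final CountDownLatch done;

        Scheduler(int expectedEvents) {
            done = new CountDownLatch(expectedEvents);
        }

        void handle(SchedulerEventType event) {
            handledOn.add(event + "@" + Thread.currentThread().getName());
            done.countDown();
        }
    }

    // A FIFO queue drained by a single named daemon thread.
    static class QueueDispatcher<E> {
        private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();

        QueueDispatcher(String threadName, Consumer<E> handler) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        handler.accept(queue.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, threadName);
            worker.setDaemon(true);
            worker.start();
        }

        void enqueue(E event) {
            queue.add(event);
        }
    }

    static List<String> run() {
        Scheduler scheduler = new Scheduler(SchedulerEventType.values().length);

        // Second-level queue: its thread is the only one that calls the scheduler.
        QueueDispatcher<SchedulerEventType> schedulerDispatcher =
            new QueueDispatcher<>("scheduler-dispatcher", scheduler::handle);

        // Central queue: its handler only forwards, so it never waits on scheduler work.
        QueueDispatcher<SchedulerEventType> rmDispatcher =
            new QueueDispatcher<>("rm-dispatcher", schedulerDispatcher::enqueue);

        // The preemption policy posts its events to the central dispatcher.
        for (SchedulerEventType event : SchedulerEventType.values()) {
            rmDispatcher.enqueue(event);
        }

        try {
            if (!scheduler.done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("events were not handled in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return scheduler.handledOn;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Each recorded entry carries the name of the thread that ran the handler, which is the kind of property a test such as {{TestRMDispatcher}} would presumably want to check; how the real test does this is not stated in the comment.
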
-- This message was sent by Atlassian JIRA (v6.3.4#6332)