[ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573244#comment-13573244
 ] 

Siddharth Seth commented on YARN-365:
-------------------------------------

This isn't very different from configuring all nodes to have a higher heartbeat 
interval. With a high heartbeat interval, the NM would send a batch of updates 
over to the RM, and this heartbeat would trigger a scheduling pass.

This change decouples RM scheduling passes from NM heartbeats. The NM can 
continue to provide node updates at a smaller interval, and the RM handles 
these, along with a scheduling pass, as and when it chooses to. In this 
particular case, the scheduler queue ends up with a single scheduling event per 
node, but the scheduler will attempt a scheduling run only on the next 
heartbeat from that node. At a later point, scheduling could be changed to be 
triggered by the arrival of a new application, or to just run in a tight loop.

If the scheduler cannot keep up, it ends up scheduling as fast as it can - 
without node heartbeats affecting the queue size. Also, completed container 
information from heartbeats is processed earlier (instead of waiting for the 
event in the queue to be processed) - making each scheduler pass more efficient.
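
To illustrate the per-node coalescing described above, here is a minimal 
sketch. It is not the code from the attached patches: the class 
HeartbeatCoalescer, its methods, and the use of plain strings for node and 
container IDs are all hypothetical, and it simply batches completed-container 
reports until the scheduler picks the node up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical sketch (not the YARN-365 patch): heartbeats update per-node
 * state, and the scheduler queue holds at most one pending event per node.
 */
class HeartbeatCoalescer {
    // Completed containers reported per node since the scheduler last polled it.
    private final Map<String, List<String>> pendingCompletions = new HashMap<>();
    // Node IDs with a scheduling event waiting; each node appears at most once.
    private final Queue<String> schedulerQueue = new ArrayDeque<>();

    /** Called on every NM heartbeat; never adds a second event for a node. */
    public synchronized void onHeartbeat(String nodeId, List<String> completed) {
        List<String> pending = pendingCompletions.get(nodeId);
        if (pending == null) {
            // No event queued for this node yet: enqueue exactly one.
            pending = new ArrayList<>();
            pendingCompletions.put(nodeId, pending);
            schedulerQueue.add(nodeId);
        }
        // Later heartbeats only merge into the already-queued update.
        pending.addAll(completed);
    }

    /** Called by the scheduler whenever it chooses; returns null when idle. */
    public synchronized NodeUpdate poll() {
        String nodeId = schedulerQueue.poll();
        if (nodeId == null) {
            return null;
        }
        return new NodeUpdate(nodeId, pendingCompletions.remove(nodeId));
    }

    /** Number of scheduling events currently queued. */
    public synchronized int queuedEvents() {
        return schedulerQueue.size();
    }

    /** Batched update for one node, handed to a scheduling pass. */
    static final class NodeUpdate {
        public final String nodeId;
        public final List<String> completedContainers;

        NodeUpdate(String nodeId, List<String> completedContainers) {
            this.nodeId = nodeId;
            this.completedContainers = completedContainers;
        }
    }

    public static void main(String[] args) {
        HeartbeatCoalescer c = new HeartbeatCoalescer();
        c.onHeartbeat("node1", Arrays.asList("container_1"));
        c.onHeartbeat("node1", Arrays.asList("container_2"));
        c.onHeartbeat("node2", Collections.<String>emptyList());
        System.out.println(c.queuedEvents()); // 2: one event per node
        NodeUpdate u = c.poll();
        System.out.println(u.nodeId + " " + u.completedContainers);
    }
}
```

With this shape, three heartbeats from two nodes leave two queued events, so 
the queue size is bounded by the number of nodes no matter how slowly the 
scheduler drains it.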

bq. I can see cases where the all at once is actually worse as it will spend 
more time on a single heartbeat and potentially not get to other things in the 
queue like apps added as fast. 
The event should not be delayed more than the time required to complete one 
scheduling pass across all nodes. I don't think this will be much better in the 
case of a growing scheduler queue.

bq. The only way I can see this being beneficial is if we can aggregate the 
heartbeats and have the scheduler process less.
Do you mean somehow aggregating heartbeats across nodes? This approach does 
aggregate heartbeats for a single node.
                
> Each NM heartbeat should not generate an event for the Scheduler
> ----------------------------------------------------------------
>
>                 Key: YARN-365
>                 URL: https://issues.apache.org/jira/browse/YARN-365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>    Affects Versions: 0.23.5
>            Reporter: Siddharth Seth
>            Assignee: Xuan Gong
>         Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
> YARN-365.2.patch, YARN-365.3.patch
>
>
> Follow up from YARN-275
> https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
