[
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573244#comment-13573244
]
Siddharth Seth commented on YARN-365:
-------------------------------------
This isn't very different from configuring all nodes to have a higher heartbeat
interval. With a high heartbeat interval, the NM would send a batch of updates
over to the RM, and this heartbeat would trigger a scheduling pass.
This change decouples RM scheduling passes from NM heartbeats. The NM can
continue to provide node updates at a smaller interval, and the RM handles
these, along with a scheduling pass, as and when it chooses to. In this
particular case, the scheduler queue ends up with a single scheduling event per
node - but will attempt a scheduling run only on the next heartbeat from that
node. At a later point, the scheduling could be changed to be triggered by the
arrival of a new application - or to just run in a tight loop.
If the scheduler cannot keep up, it ends up scheduling as fast as it can -
without node heartbeats affecting the queue size. Also, completed container
information from heartbeats is processed earlier (instead of waiting for the
event in the queue to be processed) - making each scheduler pass more efficient.
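
The per-node aggregation described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class NodeUpdateAggregator, its methods onHeartbeat and pullUpdates, and the use of plain strings for container statuses are all hypothetical names chosen for the example. It assumes each node keeps its own queue of unprocessed updates plus a flag recording whether a scheduling event for that node is already in the scheduler queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-node aggregator; an illustration, not the YARN-365 patch. */
class NodeUpdateAggregator {
    // Container updates reported by heartbeats that the scheduler has not yet seen.
    private final ConcurrentLinkedQueue<String> pendingUpdates =
        new ConcurrentLinkedQueue<>();
    // True while a scheduling event for this node is already in the scheduler queue.
    private final AtomicBoolean eventQueued = new AtomicBoolean(false);

    /**
     * Called on every NM heartbeat. Returns true only when the caller
     * should enqueue a new scheduler event for this node.
     */
    boolean onHeartbeat(List<String> containerUpdates) {
        pendingUpdates.addAll(containerUpdates);
        return eventQueued.compareAndSet(false, true);
    }

    /** Called by the scheduler when it handles this node's event. */
    List<String> pullUpdates() {
        // Clear the flag before draining, so a heartbeat that arrives during
        // the drain queues a fresh event instead of having its update missed.
        eventQueued.set(false);
        List<String> batch = new ArrayList<>();
        String update;
        while ((update = pendingUpdates.poll()) != null) {
            batch.add(update);
        }
        return batch;
    }
}
```

With this shape, any number of heartbeats between two scheduler passes adds at most one event per node to the scheduler queue, and the scheduler sees all accumulated updates in one batch. The cost is an occasional event whose batch turns out to be empty.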
bq. I can see cases where the all at once is actually worse as it will spend
more time on a single heartbeat and potentially not get to other things in the
queue like apps added as fast.
The event should not be delayed more than the time required to complete one
scheduling pass across all nodes. I don't think this will be much better in the
case of a growing scheduler queue.
bq. The only way I can see this being beneficial is if we can aggregate the
heartbeats and have the scheduler process less.
Do you mean somehow aggregating heartbeats across nodes? This approach does
aggregate heartbeats for a single node.
> Each NM heartbeat should not generate an event for the Scheduler
> -----------------------------------------------------------------
>
> Key: YARN-365
> URL: https://issues.apache.org/jira/browse/YARN-365
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager, scheduler
> Affects Versions: 0.23.5
> Reporter: Siddharth Seth
> Assignee: Xuan Gong
> Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch,
> YARN-365.2.patch, YARN-365.3.patch
>
>
> Follow up from YARN-275
> https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt