[
https://issues.apache.org/jira/browse/YARN-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543310#comment-13543310
]
Xuan Gong commented on YARN-270:
--------------------------------
Break sub-task 275 ("Make NodeManagers to NOT blindly heartbeat irrespective
of whether previous heartbeat is processed or not") into smaller tasks:
1. Make RM provide heartbeat interval to NM
2. RM changes to handle NM heartbeat during overload.
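
To illustrate the two tasks above, here is a minimal sketch in Python (not the actual YARN code). It assumes the RM returns the next heartbeat interval in each heartbeat response and stretches that interval when its scheduler event queue backs up. All names (ResourceManagerSketch, HeartbeatResponse, overload_threshold, and the default values) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatResponse:
    # Hypothetical field: how long the NM should wait before its next heartbeat.
    next_interval_ms: int


class ResourceManagerSketch:
    """Toy model of an RM that tells NMs how often to heartbeat (assumed design)."""

    def __init__(self, base_interval_ms=1000, max_interval_ms=10000,
                 overload_threshold=100):
        self.base_interval_ms = base_interval_ms
        self.max_interval_ms = max_interval_ms
        self.overload_threshold = overload_threshold
        self.pending_events = 0  # scheduler events not yet processed

    def next_interval_ms(self):
        # Below the threshold the RM is keeping up: use the base interval.
        if self.pending_events <= self.overload_threshold:
            return self.base_interval_ms
        # Under overload, scale the interval with the backlog, up to a cap.
        factor = self.pending_events / self.overload_threshold
        return min(self.max_interval_ms, int(self.base_interval_ms * factor))

    def node_heartbeat(self, node_id):
        # Each heartbeat adds one event for the scheduler to process.
        self.pending_events += 1
        return HeartbeatResponse(self.next_interval_ms())

    def drain(self, count):
        # Models the scheduler event handler thread catching up.
        self.pending_events = max(0, self.pending_events - count)


rm = ResourceManagerSketch()
normal = rm.node_heartbeat("nm-1").next_interval_ms
rm.pending_events = 249  # simulate a backlog of unprocessed events
overloaded = rm.node_heartbeat("nm-1").next_interval_ms
```

In this sketch the NM would sleep for next_interval_ms after each response, so it never sends a new heartbeat before the previous one has been answered, and the RM can slow the whole cluster down simply by returning a larger value.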
> RM scheduler event handler thread gets behind
> ---------------------------------------------
>
> Key: YARN-270
> URL: https://issues.apache.org/jira/browse/YARN-270
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 0.23.5
> Reporter: Thomas Graves
> Assignee: Thomas Graves
>
> We had a couple of incidents on a 2800 node cluster where the RM scheduler
> event handler thread got behind processing events and basically became
> unusable. It was still processing apps, but taking a long time (1 hr 45
> minutes) to accept new apps. This actually happened twice within 5 days.
> We are using the capacity scheduler and at the time had between 400 and 500
> applications running. There were another 250 apps that were in the SUBMITTED
> state in the RM, but the scheduler hadn't yet processed those to put them in
> the pending state. We had about 15 queues, none of them hierarchical. We also
> had plenty of space left on the cluster.