[
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168333#comment-14168333
]
Karthik Kambatla commented on YARN-2641:
----------------------------------------
Patch looks good to me, except for the following:
- Are the changes to ResourceTrackerService#registerNodeManager required?
NodesListManager#isValidNode is synchronized on hostsReader, and that should
be sufficient. No?
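
To make the question concrete, here is a minimal sketch of a validity check that takes the lock itself, so callers such as a registration path need no extra synchronization. It is a toy model, not the actual Hadoop code: HostsLockSketch, HostLists and refresh are hypothetical names, and the rule that an empty include list admits every host is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; these are not the real YARN classes. */
public class HostsLockSketch {
    /** Hypothetical stand-in for hostsReader; it doubles as the lock object. */
    static final class HostLists {
        Set<String> includes = new HashSet<>();
        Set<String> excludes = new HashSet<>();
    }

    private final HostLists hostsReader = new HostLists();

    /** Replaces both lists atomically with respect to isValidNode. */
    void refresh(Set<String> includes, Set<String> excludes) {
        synchronized (hostsReader) {
            hostsReader.includes = new HashSet<>(includes);
            hostsReader.excludes = new HashSet<>(excludes);
        }
    }

    /** Takes the same lock, so a caller never sees a half-applied refresh. */
    boolean isValidNode(String host) {
        synchronized (hostsReader) {
            // Assumption for this sketch: an empty include list admits every host.
            boolean included = hostsReader.includes.isEmpty()
                    || hostsReader.includes.contains(host);
            return included && !hostsReader.excludes.contains(host);
        }
    }
}
```

Whether this is sufficient in the patch depends on whether registerNodeManager touches any other state that a refresh also changes.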
> improve node decommission latency in RM.
> ----------------------------------------
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM.
> Currently, a node is decommissioned only after the RM receives a nodeHeartbeat
> from the NodeManager. The node heartbeat interval is configurable; the
> default value is 1 second.
> It would be better to do the decommission during the RM refresh
> (NodesListManager) instead of during nodeHeartbeat (ResourceTrackerService).
> There is also a much more serious issue:
> if the NM to be decommissioned is killed after the RM is refreshed
> (refreshNodes) but before the NM sends a heartbeat to the RM, the RMNode will
> never be decommissioned in the RM. The RMNode will only expire in the RM after
> "yarn.nm.liveness-monitor.expiry-interval-ms" (default value 10 minutes).
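
The difference between the two approaches described above can be illustrated with a small, self-contained sketch. This is a toy model and not the patch itself: DecommissionSketch, refreshLazily, refreshEagerly and the State enum are hypothetical names invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; these are not the real YARN classes. */
public class DecommissionSketch {
    enum State { RUNNING, DECOMMISSIONED }

    private final Map<String, State> nodes = new HashMap<>();
    private final Set<String> excluded = new HashSet<>();

    void register(String host) {
        nodes.put(host, State.RUNNING);
    }

    /** Heartbeat-driven model: a refresh only records the new exclude list. */
    void refreshLazily(Set<String> newExcludes) {
        excluded.clear();
        excluded.addAll(newExcludes);
    }

    /** The exclude list is consulted only when the node reports in. */
    void heartbeat(String host) {
        if (nodes.containsKey(host) && excluded.contains(host)) {
            nodes.put(host, State.DECOMMISSIONED);
        }
    }

    /** Refresh-driven model: excluded nodes are decommissioned immediately. */
    void refreshEagerly(Set<String> newExcludes) {
        refreshLazily(newExcludes);
        for (Map.Entry<String, State> e : nodes.entrySet()) {
            if (excluded.contains(e.getKey())) {
                e.setValue(State.DECOMMISSIONED);
            }
        }
    }

    State stateOf(String host) {
        return nodes.get(host);
    }

    public static void main(String[] args) {
        DecommissionSketch lazy = new DecommissionSketch();
        lazy.register("nm1");
        lazy.refreshLazily(Collections.singleton("nm1"));
        // If nm1 is killed here, heartbeat("nm1") is never called again.
        System.out.println("heartbeat-driven: " + lazy.stateOf("nm1"));

        DecommissionSketch eager = new DecommissionSketch();
        eager.register("nm1");
        eager.refreshEagerly(Collections.singleton("nm1"));
        System.out.println("refresh-driven: " + eager.stateOf("nm1"));
    }
}
```

In the heartbeat-driven model, a node that dies between the refresh and its next heartbeat stays in the running state until some separate expiry mechanism removes it. In the refresh-driven model, the state changes as part of the refresh.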
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)