[
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhihai xu updated YARN-2641:
----------------------------
Attachment: YARN-2641.003.patch
> improve node decommission latency in RM.
> ----------------------------------------
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch,
> YARN-2641.002.patch, YARN-2641.003.patch
>
>
> improve node decommission latency in RM.
> Currently, a node is decommissioned only after the RM receives a
> nodeHeartbeat from the Node Manager. The node heartbeat interval is
> configurable; the default value is 1 second.
> It would be better to do the decommission during RM refresh
> (NodesListManager) instead of during nodeHeartbeat (ResourceTrackerService).
> There is also a much more serious issue:
> After the RM is refreshed (refreshNodes), if the NM to be decommissioned is
> killed before it sends a heartbeat to the RM, the RMNode will never be
> decommissioned in the RM. The RMNode will only expire in the RM after
> "yarn.nm.liveness-monitor.expiry-interval-ms" (default value 10 minutes).
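
The difference between the two approaches in the description can be sketched
with a toy model. This is not the real YARN code and not the attached patch:
the class DecommissionSketch and its methods (register, refreshNodesLazy,
refreshNodesEager, nodeHeartbeat, stateOf) are hypothetical names chosen for
illustration, and node state is reduced to RUNNING or DECOMMISSIONED.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of RM node bookkeeping; hypothetical, not the real YARN classes. */
final class DecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONED }

    private final Map<String, NodeState> nodes = new HashMap<>();
    private final Set<String> excluded = new HashSet<>();

    /** A Node Manager registers with the RM and starts out RUNNING. */
    void register(String host) {
        nodes.put(host, NodeState.RUNNING);
    }

    /** Behaviour described as current: refresh only records the exclude list. */
    void refreshNodesLazy(Set<String> newExcludes) {
        excluded.clear();
        excluded.addAll(newExcludes);
    }

    /** Behaviour proposed in the issue: decommission known nodes at refresh time. */
    void refreshNodesEager(Set<String> newExcludes) {
        refreshNodesLazy(newExcludes);
        for (Map.Entry<String, NodeState> entry : nodes.entrySet()) {
            if (excluded.contains(entry.getKey())) {
                entry.setValue(NodeState.DECOMMISSIONED);
            }
        }
    }

    /** Heartbeat path: an excluded node is decommissioned when it next reports in. */
    void nodeHeartbeat(String host) {
        if (nodes.containsKey(host) && excluded.contains(host)) {
            nodes.put(host, NodeState.DECOMMISSIONED);
        }
    }

    /** Returns the recorded state, or null if the host never registered. */
    NodeState stateOf(String host) {
        return nodes.get(host);
    }
}
```

In this model, a node excluded through refreshNodesLazy stays RUNNING until
nodeHeartbeat is called for it, so a Node Manager killed before its next
heartbeat is never marked DECOMMISSIONED. With refreshNodesEager the state
changes at refresh time, whether or not another heartbeat ever arrives.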
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)