[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170235#comment-14170235 ]
Hudson commented on YARN-2641:
------------------------------
FAILURE: Integrated in Hadoop-trunk-Commit #6254 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6254/])
YARN-2641. Decommission nodes on -refreshNodes instead of next NM-RM heartbeat.
(Zhihai Xu via kasha) (kasha: rev da709a2eac7110026169ed3fc4d0eaf85488d3ef)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
> Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
> -------------------------------------------------------------------
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 2.5.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Fix For: 2.7.0
>
> Attachments: YARN-2641.000.patch, YARN-2641.001.patch,
> YARN-2641.002.patch, YARN-2641.003.patch
>
>
> Improve node decommission latency in the RM.
> Currently, a node is decommissioned only after the RM receives a nodeHeartbeat
> from the NodeManager. The node heartbeat interval is configurable; the
> default value is 1 second.
> It would be better to do the decommission during the RM refresh
> (NodesListManager) instead of in nodeHeartbeat (ResourceTrackerService).
> The delay becomes a much more serious issue in the following case:
> after the RM is refreshed (refreshNodes), if the NM to be decommissioned is
> killed before it sends a heartbeat to the RM, the RMNode will never be
> decommissioned in the RM. The RMNode will only expire in the RM after
> "yarn.nm.liveness-monitor.expiry-interval-ms" (default value 10 minutes).
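
The scenario described above can be sketched with a toy model. This is
only an illustration under stated assumptions, not the code in the patch:
ToyResourceManager, its methods, the NodeState values and the host name
"host1" are all hypothetical and are not Hadoop classes. Only the 10 minute
expiry default comes from the description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DecommissionSketch {
    // Toy node states; hypothetical, not the real RMNode state machine.
    enum NodeState { RUNNING, DECOMMISSIONED, LOST }

    /** Toy stand-in for the RM (assumption: not a Hadoop class). */
    static class ToyResourceManager {
        private final boolean decommissionOnRefresh;
        private final long expiryMs;
        private final Map<String, NodeState> nodes = new HashMap<>();
        private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
        private final Set<String> excluded = new HashSet<>();

        ToyResourceManager(boolean decommissionOnRefresh, long expiryMs) {
            this.decommissionOnRefresh = decommissionOnRefresh;
            this.expiryMs = expiryMs;
        }

        void register(String host, long nowMs) {
            nodes.put(host, NodeState.RUNNING);
            lastHeartbeatMs.put(host, nowMs);
        }

        /** Models -refreshNodes: reload the exclude list. */
        void refreshNodes(Set<String> newExcludes) {
            excluded.clear();
            excluded.addAll(newExcludes);
            if (decommissionOnRefresh) {
                // Proposed behavior: act on excluded nodes right away.
                for (String host : excluded) {
                    if (nodes.get(host) == NodeState.RUNNING) {
                        nodes.put(host, NodeState.DECOMMISSIONED);
                    }
                }
            }
        }

        /** Models a heartbeat; the old behavior decommissions only here. */
        void heartbeat(String host, long nowMs) {
            if (nodes.get(host) != NodeState.RUNNING) {
                return;
            }
            lastHeartbeatMs.put(host, nowMs);
            if (excluded.contains(host)) {
                nodes.put(host, NodeState.DECOMMISSIONED);
            }
        }

        /** Models the liveness monitor: expire nodes that went silent. */
        void checkLiveness(long nowMs) {
            for (Map.Entry<String, NodeState> e : nodes.entrySet()) {
                long silentMs = nowMs - lastHeartbeatMs.get(e.getKey());
                if (e.getValue() == NodeState.RUNNING && silentMs >= expiryMs) {
                    e.setValue(NodeState.LOST);
                }
            }
        }

        NodeState state(String host) {
            return nodes.get(host);
        }
    }

    /** The NM is killed right after refreshNodes and never heartbeats again. */
    static NodeState simulateKilledNode(boolean decommissionOnRefresh, long checkAtMs) {
        ToyResourceManager rm = new ToyResourceManager(decommissionOnRefresh, 600_000L);
        rm.register("host1", 0L);
        rm.refreshNodes(Collections.singleton("host1"));
        rm.checkLiveness(checkAtMs);
        return rm.state("host1");
    }

    public static void main(String[] args) {
        System.out.println("old, after 1 s:    " + simulateKilledNode(false, 1_000L));
        System.out.println("old, after 10 min: " + simulateKilledNode(false, 600_000L));
        System.out.println("new, after 1 s:    " + simulateKilledNode(true, 1_000L));
    }
}
```

In this model the heartbeat-driven variant leaves the killed node in the
running state until the expiry interval elapses, while the refresh-driven
variant marks it decommissioned as soon as refreshNodes returns.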
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)