[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167920#comment-14167920
 ] 

Karthik Kambatla commented on YARN-2641:
----------------------------------------

I don't see the 10 mins wait happening either in practice or in my casual 
observation of the code. I can see how we can improve the decommissioning 
latency by the node-heartbeat-interval, but not more. Am I missing something 
here?

> improve node decommission latency in RM.
> ----------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch
>
>
> improve node decommission latency in RM. 
> Currently the node decommission only happened after RM received nodeHeartbeat 
> from the Node Manager. The node heartbeat interval is configurable. The 
> default value is 1 second.
> It will be better to do the decommission during RM Refresh(NodesListManager) 
> instead of nodeHeartbeat(ResourceTrackerService).
> This will be a much more serious issue:
> After RM is refreshed (refreshNodes), If the NM to be decommissioned is 
> killed before NM sent heartbeat to RM. The RMNode will never be 
> decommissioned in RM. The RMNode will only expire in RM after  
> "yarn.nm.liveness-monitor.expiry-interval-ms"(default value 10 minutes) time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to