[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2641:
-----------------------------------
    Summary: Decommission nodes on -refreshNodes instead of next NM-RM 
heartbeat  (was: improve node decommission latency in RM.)

> Decommission nodes on -refreshNodes instead of next NM-RM heartbeat
> -------------------------------------------------------------------
>
>                 Key: YARN-2641
>                 URL: https://issues.apache.org/jira/browse/YARN-2641
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2641.000.patch, YARN-2641.001.patch, 
> YARN-2641.002.patch, YARN-2641.003.patch
>
>
> improve node decommission latency in RM. 
> Currently the node decommission only happened after RM received nodeHeartbeat 
> from the Node Manager. The node heartbeat interval is configurable. The 
> default value is 1 second.
> It will be better to do the decommission during RM Refresh(NodesListManager) 
> instead of nodeHeartbeat(ResourceTrackerService).
> This will be a much more serious issue:
> After RM is refreshed (refreshNodes), If the NM to be decommissioned is 
> killed before NM sent heartbeat to RM. The RMNode will never be 
> decommissioned in RM. The RMNode will only expire in RM after  
> "yarn.nm.liveness-monitor.expiry-interval-ms"(default value 10 minutes) time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to