[ 
https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956690#comment-13956690
 ] 

Jason Lowe commented on YARN-1888:
----------------------------------

I agree with [~kasha] on this.  A nodemanager coming up on a different port 
isn't necessarily the same nodemanager from a previous instance.  For example, 
the minicluster runs multiple nodes on the same host with different ports, so 
if one of these nodes disappears, then with this patch it will no longer be 
reported as lost, since others are still running on the same host.

I think the real fix is to run the nodemanager with a non-ephemeral nodemanager 
port specified in yarn-site.xml.  This helps solve a number of issues:

# lost nodes count will be accurate
# a NM that reboots and rejoins the cluster before the RM expires the old 
instance will be correctly recognized as the same NM, and we avoid the RM 
thinking there are really two NMs on the host for up to the NM expiry interval
# attempts to start a subsequent NM on the same host where an NM is already 
running will fail rather than accidentally overcommit the node
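
As a rough sketch of what that looks like in yarn-site.xml: the relevant 
property is yarn.nodemanager.address, which as far as I know defaults to port 0 
(ephemeral).  The port 45454 below is only an assumed example value, so pick 
whatever fixed port is free on your nodes.

```xml
<!-- yarn-site.xml fragment: pin the NM to a fixed port instead of port 0. -->
<!-- 45454 is an arbitrary example port, not a required value. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```

With a fixed port, the restarted NM keeps the same host:port identity.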

> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have 
> different port
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-1888
>                 URL: https://issues.apache.org/jira/browse/YARN-1888
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: zhaoyunjiong
>            Priority: Minor
>         Attachments: YARN-1888.patch
>
>
> When NodeManager's port set to 0, reboot NodeManager will cause "Losts Nodes" 
> inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
