[ 
https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong reopened YARN-1888:
--------------------------------


The problem is that our cluster sets the NodeManager port to 0, so the OS picks 
a free port at startup, and after a NodeManager restart the "Lost Nodes" count 
becomes inaccurate:
Host A has a NodeManager with ID $HOSTA:$PORTA;
after the restart, the NodeManager has ID $HOSTA:$PORTB.
Because the ID changed, the ResourceManager does not treat it as a reconnected 
NodeManager.
A few minutes later, NodeManager $HOSTA:$PORTA expires and is marked as LOST.
This confuses people. At first I did not think it was a bug either, but after 
several people asked me why so many nodes were LOST, I came up with this 
simple patch: if another NodeManager is already registered on the same host (in 
a real production cluster, I don't think people will start more than one 
NodeManager on one machine), then the expired NodeManager is not marked as LOST.
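
To illustrate the idea only, here is a minimal sketch of the check described 
above. It is not the attached patch and does not use the real ResourceManager 
classes: NodeTrackerSketch, register, expire and lostCount are hypothetical 
names, and node IDs are modelled as plain "host:port" strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the ResourceManager's node bookkeeping.
// None of these names are taken from the Hadoop code base.
class NodeTrackerSketch {
    // Active NodeManagers: "host:port" ID -> host name.
    private final Map<String, String> activeNodes = new HashMap<>();
    // IDs of NodeManagers that were reported as LOST.
    private final Set<String> lostNodes = new HashSet<>();

    // A NodeManager registers under the port it happened to bind to.
    void register(String host, int port) {
        activeNodes.put(host + ":" + port, host);
    }

    // Called when a NodeManager ID stops heartbeating and expires.
    void expire(String host, int port) {
        String nodeId = host + ":" + port;
        if (activeNodes.remove(nodeId) == null) {
            return; // unknown ID, nothing to do
        }
        // If another NodeManager is still active on the same host, assume the
        // expired ID belongs to a restarted NodeManager and do not count it.
        if (!activeNodes.containsValue(host)) {
            lostNodes.add(nodeId);
        }
    }

    int lostCount() {
        return lostNodes.size();
    }
}
```

With this check, registering hostA:40001, then hostA:40002 (the restarted 
NodeManager), then expiring hostA:40001 leaves the lost count at 0, while 
expiring the only NodeManager on a host still counts it as LOST.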





> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have 
> different port
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-1888
>                 URL: https://issues.apache.org/jira/browse/YARN-1888
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: zhaoyunjiong
>            Priority: Minor
>         Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, restarting the NodeManager makes the 
> "Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
