[ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638963#comment-14638963
 ] 

Allen Wittenauer commented on YARN-3965:
----------------------------------------

bq. If we have a startup timestamp for the NM, the operator could easily fetch 
it via the NM webservice, find out which NMs didn't restart, and take manual 
action on them.

That's an operational anti-pattern: polling thousands of machines like this 
won't scale.  The RM should be able to report when various NMs joined the 
system.
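
To illustrate the difference, here is a minimal sketch of the check an operator 
could run against a single RM-side report instead of contacting every NM. It 
assumes the RM can hand back a mapping of node ID to the time that node last 
joined; no such report is confirmed to exist here, and the function name, node 
IDs, and timestamps below are hypothetical.

```python
from datetime import datetime, timezone


def nodes_not_restarted(join_times, restart_issued_at):
    """Return the IDs of nodes whose last join predates the restart command.

    join_times: dict of node ID -> datetime the node last joined the RM
                (hypothetical report; assumed to come from one RM query).
    restart_issued_at: datetime at which the operator kicked off the restart.
    """
    return sorted(
        node_id
        for node_id, joined_at in join_times.items()
        if joined_at < restart_issued_at
    )


# Hypothetical data standing in for one RM-side report.
restart_issued_at = datetime(2015, 7, 23, 12, 0, tzinfo=timezone.utc)
join_times = {
    "nm-001:8041": datetime(2015, 7, 23, 12, 3, tzinfo=timezone.utc),
    "nm-002:8041": datetime(2015, 7, 20, 8, 15, tzinfo=timezone.utc),
    "nm-003:8041": datetime(2015, 7, 23, 12, 4, tzinfo=timezone.utc),
}

stale = nodes_not_restarted(join_times, restart_issued_at)
print(stale)
```

With one report covering every node, the comparison is a single pass over data 
the RM already holds, rather than thousands of separate requests.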

> Add startup timestamp for nodemanager
> -------------------------------------
>
>                 Key: YARN-3965
>                 URL: https://issues.apache.org/jira/browse/YARN-3965
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Minor
>
> We already have a startup timestamp for the RM, but not for the NM.
> Sometimes a cluster operator modifies the configuration of all nodes and kicks 
> off a command to restart all NMs.  It is then hard to check whether all NMs 
> actually restarted.  In practice there are always some NMs that did not restart 
> as expected, which later leads to errors caused by inconsistent configuration.
> If we had a startup timestamp for the NM, the operator could easily fetch it 
> via the NM webservice, find out which NMs did not restart, and take manual 
> action on them.



