[ https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996513#comment-14996513 ]
Jun Gong commented on YARN-2047:
--------------------------------
Sorry for the late reply.
The issue aims to make sure that a lost NM's containers are marked expired by
the RM even across an RM restart. What I suggested tries to solve that problem
in a different way. Any thoughts?
{quote}
If this is a required action, then it would also imply that saving such nodes
would be a critical state change operation. So, e.g., a decommission command
from the admin should not complete until the store has been updated. Is that
the case?
{quote}
Yes, it is. However, the store operation is usually very fast, so the extra
delay might be acceptable.
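To make the idea concrete, here is a rough sketch of the behaviour being discussed. It is a toy model in Python, not YARN code: ToyStateStore, ToyRM and all of their methods are hypothetical names, and time is passed in explicitly as a logical clock. It only shows two things: the decommission call returns after the store has been updated, and a restarted RM gives every node recorded as running a fresh expiry deadline, so a node that never heartbeats again is still reported lost.

```python
class ToyStateStore:
    """Hypothetical stand-in for the RM state store (survives a simulated restart)."""

    def __init__(self):
        self.nodes = {}  # node id -> "RUNNING" or "DECOMMISSIONED"

    def save_node(self, node_id, state):
        # Synchronous write: returns only once the record is stored.
        self.nodes[node_id] = state


class ToyRM:
    """Hypothetical RM that tracks one expiry deadline per known node."""

    def __init__(self, store, expiry_secs, now):
        self.store = store
        self.expiry_secs = expiry_secs
        # On (re)start, recover nodes recorded as RUNNING and give each a
        # fresh deadline, so a node that never returns still expires.
        self.deadlines = {
            node_id: now + expiry_secs
            for node_id, state in store.nodes.items()
            if state == "RUNNING"
        }

    def register(self, node_id, now):
        self.store.save_node(node_id, "RUNNING")
        self.deadlines[node_id] = now + self.expiry_secs

    def heartbeat(self, node_id, now):
        # Unknown nodes are ignored in this sketch.
        if node_id in self.deadlines:
            self.deadlines[node_id] = now + self.expiry_secs

    def decommission(self, node_id):
        # Critical state change: update the store first, then acknowledge.
        self.store.save_node(node_id, "DECOMMISSIONED")
        self.deadlines.pop(node_id, None)
        return True

    def expire_lost_nodes(self, now):
        # Nodes whose deadline has passed; their containers would be
        # marked finished at this point.
        lost = sorted(n for n, d in self.deadlines.items() if now >= d)
        for node_id in lost:
            del self.deadlines[node_id]
        return lost


store = ToyStateStore()
rm = ToyRM(store, expiry_secs=10, now=0)
for node in ("nm1", "nm2", "nm3"):
    rm.register(node, now=0)
rm.decommission("nm3")

rm = ToyRM(store, expiry_secs=10, now=100)  # simulated RM restart
rm.heartbeat("nm1", now=105)                # nm1 comes back, nm2 does not
lost = rm.expire_lost_nodes(now=110)
```

In this run nm2 is the only node reported lost after the restart, and nm3 stays decommissioned because its state was stored before the command returned.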
> RM should honor NM heartbeat expiry after RM restart
> ----------------------------------------------------
>
> Key: YARN-2047
> URL: https://issues.apache.org/jira/browse/YARN-2047
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Bikas Saha
>
> After the RM restarts, it forgets about existing NMs (and their potentially
> decommissioned status too). After restart, the RM cannot maintain the
> contract to the AMs that a lost NM's containers will be marked finished
> within the expiry time.