[
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462375#comment-13462375
]
Thomas Graves commented on YARN-128:
------------------------------------
{quote}
{quote}
What about AMs that completed during restart? Re-running them should be a
no-op.
{quote}
AMs should not finish themselves while the RM is down or recovering. They
should just spin.
{quote}
Doesn't the RM still need to handle this? The client could stop the AM at any
point by talking to it directly. Or, since anyone can write an AM, it could
simply finish on its own. Or there could be a timing issue on app finish. How
does the RM tell the difference? We can have the MR client/AM handle this
nicely, but even then there could be a bug, or an expiry after waiting so
long. So perhaps if the AM is down it doesn't get restarted? That's probably
not ideal if the app happens to go down at the same time as the RM, though
(say a rack gets rebooted), but otherwise you have to handle all the restart
issues, like Bobby mentioned above.
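
To make the ambiguity concrete, here is a rough sketch of the decision the
recovering RM would face. This is not actual RM code: the class, the enums,
and the restartSilentAms flag are all hypothetical names made up for
illustration. It only assumes the RM can see one of three things about an AM
after recovery: it resynced, a final state was recorded before the RM went
down, or nothing at all.

```java
/** Hypothetical sketch only; none of these names exist in YARN. */
class RecoveryDecisionSketch {

    /** What a recovering RM might be able to observe about an AM (assumed). */
    enum AmObservation {
        RESYNCED,          // the AM reconnected after the RM came back
        RECORDED_FINISHED, // a final state was persisted before the RM went down
        SILENT             // no contact and no recorded final state
    }

    /** What the RM could do with the application (assumed). */
    enum Action { KEEP_RUNNING, MARK_COMPLETED, RESTART_ATTEMPT, DO_NOT_RESTART }

    /**
     * restartSilentAms is a made-up policy switch. For a SILENT AM the RM
     * cannot tell "finished on its own", "stopped by a client" and "died
     * with the rack" apart, so whatever it does here is a policy choice.
     */
    static Action decide(AmObservation obs, boolean restartSilentAms) {
        switch (obs) {
            case RESYNCED:
                return Action.KEEP_RUNNING;
            case RECORDED_FINISHED:
                return Action.MARK_COMPLETED;
            default:
                // SILENT: the ambiguous case discussed above.
                return restartSilentAms ? Action.RESTART_ATTEMPT
                                        : Action.DO_NOT_RESTART;
        }
    }

    public static void main(String[] args) {
        for (AmObservation obs : AmObservation.values()) {
            System.out.println(obs + ": " + decide(obs, false)
                    + " (no restart policy), " + decide(obs, true)
                    + " (restart policy)");
        }
    }
}
```

With the flag off, an app that died together with the RM is never retried;
with it on, an AM that already finished gets re-run, which is only safe if
re-running really is a no-op.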
> Resurrect RM Restart
> ---------------------
>
> Key: YARN-128
> URL: https://issues.apache.org/jira/browse/YARN-128
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
> Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM
> refactor.