[ 
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462375#comment-13462375
 ] 

Thomas Graves commented on YARN-128:
------------------------------------

{quote}
{quote}
    What about AMs that completed during restart? Re-running them should be a 
no-op.
{quote}
AMs should not finish themselves while the RM is down or recovering. They 
should just spin.
{quote}
Doesn't the RM still need to handle this?  The client could stop the AM at any 
point by talking directly to it.  Or, since anyone can write an AM, it could 
simply finish on its own.  Or perhaps there is a timing issue on app finish.  
How does the RM tell the difference?  We can have the MR client/AM handle this 
nicely, but even then there could be a bug or an expiry after so long.  So 
perhaps if the AM is down it doesn't get restarted?  That's probably not ideal 
if the app happens to go down at the same time as the RM, though (say a rack 
gets rebooted), but otherwise you have to handle all the restart issues, like 
Bobby mentioned above.
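
To make the trade-off concrete, here is a rough sketch of the two options. 
It is illustrative only: every class, enum and method name below is 
hypothetical and none of it is actual RM code or a YARN API. It assumes the 
only thing a recovering RM can observe is whether the AM is still there.

```java
// Hypothetical sketch: none of these names are real YARN classes or APIs.
class RmRecoverySketch {

    /** What a recovering RM can observe about an AM it knew of before going down. */
    enum AmState { ALIVE, GONE }

    /** Policy choice for an AM that is no longer there after the RM comes back. */
    enum GonePolicy { DO_NOT_RESTART, RESTART }

    /** Action the RM takes for one application during recovery. */
    enum Action { REATTACH, LEAVE_DOWN, RESTART_ATTEMPT }

    static Action decide(AmState state, GonePolicy policy) {
        if (state == AmState.ALIVE) {
            // The AM kept spinning while the RM was away, so pick it up again.
            return Action.REATTACH;
        }
        // GONE is ambiguous: the AM may have finished on its own, been stopped
        // by a client, or died (for example in a rack reboot). The RM cannot
        // tell these cases apart from this observation alone.
        if (policy == GonePolicy.DO_NOT_RESTART) {
            // Fine for apps that finished; loses apps that crashed with the RM.
            return Action.LEAVE_DOWN;
        }
        // Recovers crashed apps, but also re-runs finished ones, so a re-run
        // of a completed app has to be a no-op.
        return Action.RESTART_ATTEMPT;
    }

    public static void main(String[] args) {
        for (GonePolicy p : GonePolicy.values()) {
            System.out.println(p + " -> " + decide(AmState.GONE, p));
        }
    }
}
```

Either branch for a missing AM gets one of the cases wrong unless the RM has 
some extra way to tell a finished app from a crashed one.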


                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM 
> refactor.

