[ 
https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759081#comment-13759081
 ] 

Jason Lowe commented on YARN-540:
---------------------------------

bq. Once work-preserving restart is implemented, this jira should not be a 
problem as there's no notion of relaunching a new AM in work-preserving 
restart, the old AM will just spin and resync with RM after RM restarts.

I'm still a bit confused as to why work-preserving restart matters here.  Most 
AMs are simply going to clean up and exit after unregistering with the RM, 
since unregistering is normally the terminal call in the AM-RM protocol.  
Work-preserving restart only seems to help if AMs are now required to poll as 
described in (2), and that relies on a behavior change in the AM.  Is that 
behavior change being implemented in the YARN API layer?
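
To make the concern concrete, here is a toy model of the race from the issue 
description quoted below. It is only a sketch under stated assumptions: 
ToyStateStore, ToyRM, and simulate are hypothetical names invented for 
illustration and are not YARN classes or APIs. The only behavior modeled is 
that the unregister call is acknowledged before the queued REMOVE_APP event 
is processed.

```python
from collections import deque


class ToyStateStore:
    """Hypothetical stand-in for the RM's persisted application state."""

    def __init__(self):
        self.apps = set()


class ToyRM:
    """Hypothetical RM model: recovers apps from the store on startup."""

    def __init__(self, store):
        self.store = store
        self.pending = deque()  # events waiting for the dispatcher
        # Recovery: every app still in the store looks like it needs an AM.
        self.relaunched = sorted(store.apps)

    def submit(self, app_id):
        self.store.apps.add(app_id)

    def finish_application_master(self, app_id):
        # The AM is acknowledged right away; removal from the store is deferred.
        self.pending.append(("REMOVE_APP", app_id))
        return True

    def drain(self):
        # Dispatcher processing: apply queued removals to the store.
        while self.pending:
            _, app_id = self.pending.popleft()
            self.store.apps.discard(app_id)


def simulate(drain_before_stop):
    store = ToyStateStore()
    rm = ToyRM(store)
    rm.submit("app_1")
    rm.finish_application_master("app_1")  # the AM now cleans up and exits
    if drain_before_stop:
        rm.drain()
    # The RM stops here; anything still queued is lost with the process.
    restarted = ToyRM(store)
    return restarted.relaunched


print(simulate(drain_before_stop=True))
print(simulate(drain_before_stop=False))
```

In this model the app is relaunched only when the RM stops before the queue 
drains, and since the original AM has already exited after unregistering, 
nothing is left on the AM side to resync with the restarted RM.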
                
> Race condition causing RM to potentially relaunch already unregistered AMs on 
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, 
> YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, the RM 
> may shut down and restart, with the dispatcher stopped before it can process 
> the REMOVE_APP event. When the RM comes back up, it reloads the existing 
> state files even though the job has already succeeded.

