[jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart

Jason Lowe (JIRA) Thu, 05 Sep 2013 13:20:48 -0700

    [ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759395#comment-13759395 ]


Jason Lowe commented on YARN-540:
---------------------------------

Ah, if the NM can notify the RM after the RM restarts that the AM container 
exited, then that would pretty much fix it.  We'd only have an issue if the NM 
went down at the same time the RM did.  I'm still a bit unclear on the 
specifics of how the RM recovers container states in work-preserving 
restart, but assuming that, upon RM recovery/re-registration, the NMs report 
not only active containers but also those that have exited since the last 
successful heartbeat, we should be OK.
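
To make the assumption above concrete, here is a minimal sketch of an NM that
remembers exited containers until a heartbeat is acknowledged and includes them
in its re-registration report. Every name here (NodeReportSketch, Node,
reRegistrationReport, heartbeatAcked) is hypothetical and chosen for
illustration; this is not the actual YARN NM/RM code or protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration only; these are not real YARN classes.
class NodeReportSketch {
    enum State { RUNNING, COMPLETE }

    /** Stand-in for an NM that tracks containers the RM may not know about yet. */
    static class Node {
        private final Map<String, State> containers = new HashMap<>();

        void containerStarted(String id) { containers.put(id, State.RUNNING); }

        /** Keep the exited container around until the RM has acknowledged it. */
        void containerExited(String id) { containers.put(id, State.COMPLETE); }

        /** Report sent on re-registration: active containers plus unreported exits. */
        Map<String, State> reRegistrationReport() { return new HashMap<>(containers); }

        /** Assumption: after an acknowledged heartbeat, exited containers can be dropped. */
        void heartbeatAcked() { containers.values().removeIf(s -> s == State.COMPLETE); }
    }

    /** RM-side check: does the report show that this container has already exited? */
    static boolean hasExited(String containerId, Map<String, State> report) {
        return report.get(containerId) == State.COMPLETE;
    }
}
```

In this model, an AM container that exits while the RM is down still shows up
as COMPLETE in the first report after the RM comes back, unless the NM itself
was restarted and lost that record in the meantime.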
                
> Race condition causing RM to potentially relaunch already unregistered AMs on 
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, 
> YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, if the RM 
> shuts down and restarts, the dispatcher may be stopped before it can process 
> the REMOVE_APP event. The next time the RM comes back, it reloads the existing 
> state files even though the job has succeeded.
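
The race in the description can be sketched with a toy model: the removal of
the application from the state store is queued as an asynchronous event, so a
shutdown between finishApplicationMaster and the dispatcher processing that
event leaves the application in the store. The class and method names below
(RemoveAppRaceSketch, StateStore, drainDispatcher, recover) are hypothetical
stand-ins, not the real ResourceManager or RMStateStore code.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model for illustration only; these are not real YARN classes.
class RemoveAppRaceSketch {
    /** Stand-in for the persisted state files; it survives an RM restart. */
    static final class StateStore {
        final Set<String> apps = new HashSet<>();
    }

    static final class ResourceManager {
        private final StateStore store;
        // Pending REMOVE_APP events; held in memory only, so lost on shutdown.
        private final Queue<String> pendingRemovals = new ArrayDeque<>();

        ResourceManager(StateStore store) { this.store = store; }

        void submit(String appId) { store.apps.add(appId); }

        /** Returns to the AM right away; the store cleanup is only queued. */
        void finishApplicationMaster(String appId) { pendingRemovals.add(appId); }

        /** The dispatcher processing its queued REMOVE_APP events. */
        void drainDispatcher() {
            while (!pendingRemovals.isEmpty()) {
                store.apps.remove(pendingRemovals.poll());
            }
        }

        /** On restart, every application still in the store is reloaded. */
        Set<String> recover() { return new HashSet<>(store.apps); }
    }
}
```

If the first ResourceManager instance is discarded before drainDispatcher()
runs, a second instance built on the same StateStore still returns the finished
application from recover(), which corresponds to the situation the description
reports.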

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
