[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759081#comment-13759081 ]
Jason Lowe commented on YARN-540:
---------------------------------

bq. Once work-preserving restart is implemented, this jira should not be a problem as there's no notion of relaunching a new AM in work-preserving restart; the old AM will just spin and resync with RM after RM restarts.

I'm still a bit confused as to why work-preserving restart matters here. Most AMs are simply going to clean up and leave after unregistering with the RM, since that's normally a terminal call for the AM-RM protocol. Only if AMs are now required to poll as described in (2) does work-preserving restart seem to help here, but that relies on a behavior change in the AM. Is that behavior change being implemented in the YARN API layer?

> Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, the RM may be shut down and restarted; the dispatcher is stopped before it can process the REMOVE_APP event. The next time the RM comes back, it reloads the existing state files even though the job has succeeded.
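A toy model may help make the race and the question above concrete. This is a minimal sketch under loud assumptions: every name in it (StateStore, ResourceManager, naive_am, polling_am, run) is hypothetical and is not the YARN API. The RM's dispatcher is modeled as an in-memory queue that is simply dropped on shutdown, and "poll as described in (2)" is read as the AM repeating finishApplicationMaster until the RM reports that the removal has been persisted, which is only one possible reading of (2).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical persistent RM state store: survives RM restarts."""
    apps: dict = field(default_factory=dict)


class ResourceManager:
    """Hypothetical toy RM, not the YARN API.

    finish_application_master() only *enqueues* REMOVE_APP; the store is
    updated later, when the dispatcher drains its queue.
    """

    def __init__(self, store):
        self.store = store
        self.pending = deque()     # dispatcher event queue, lost on stop()
        self.unregistered = set()  # apps whose removal reached the store

    def submit(self, app_id):
        self.store.apps[app_id] = "RUNNING"

    def finish_application_master(self, app_id):
        # Enqueue the removal once; report whether it is already persisted.
        if app_id not in self.unregistered and ("REMOVE_APP", app_id) not in self.pending:
            self.pending.append(("REMOVE_APP", app_id))
        return app_id in self.unregistered

    def dispatch_one(self):
        # Process one queued event, persisting the app's removal.
        if self.pending:
            _, app_id = self.pending.popleft()
            self.store.apps.pop(app_id, None)
            self.unregistered.add(app_id)

    def stop(self):
        # Shutdown: events still queued never reach the store.
        self.pending.clear()

    def recover(self):
        # A restarted RM relaunches an AM for every app left in the store.
        return sorted(self.store.apps)


def naive_am(rm, app_id):
    # Treats finishApplicationMaster as terminal and exits immediately.
    rm.finish_application_master(app_id)


def polling_am(rm, app_id, max_polls=10):
    # Keeps calling until the RM says the removal has been persisted.
    for _ in range(max_polls):
        if rm.finish_application_master(app_id):
            return True
        # The real dispatcher runs concurrently; here it is advanced
        # explicitly between polls so the sketch stays deterministic.
        rm.dispatch_one()
    return False


def run(am):
    store = StateStore()
    rm = ResourceManager(store)
    rm.submit("app_1")
    am(rm, "app_1")
    rm.stop()  # RM shuts down before any remaining events are processed
    return ResourceManager(store).recover()


print(run(naive_am))    # the already-unregistered app would be relaunched
print(run(polling_am))  # nothing left to relaunch
```

In this model the mitigation lives entirely in the AM's loop: an AM that exits after its first finishApplicationMaster call still leaves the app in the store, which is the AM behavior change the comment asks about. Whether a client library would run such a loop on the AM's behalf is not modeled here.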