[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759400#comment-13759400 ]
Vinod Kumar Vavilapalli commented on YARN-540:
----------------------------------------------
Yes, we missed that. We can either
- do a blocking persistence to the state-store *during* the
  finishApplicationMaster call, or
- do the right thing: persist asynchronously and consider
  finishApplicationMaster complete only when the RM returns a success state,
  i.e. make the behaviour and API change now.
Offline, I was trying to avoid this change, but it doesn't look like we can
skip it.
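A rough sketch of the two options, assuming a simplified in-memory model: `StateStore`, `ResourceManager`, `finishApplicationMasterBlocking` and `unregister` are hypothetical names, not the actual YARN classes or API. The blocking variant waits for the store write inside the call. The asynchronous variant returns `true` only after the "finished" state has been written, so the AM keeps calling until the RM confirms.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.LockSupport;

public class FinishAmSketch {
    // Hypothetical state store that records finished apps on a background writer thread.
    static class StateStore {
        final Set<String> finishedApps = ConcurrentHashMap.newKeySet();
        private final ExecutorService writer = Executors.newSingleThreadExecutor();

        CompletableFuture<Void> markFinishedAsync(String appId) {
            return CompletableFuture.runAsync(() -> finishedApps.add(appId), writer);
        }

        void shutdown() { writer.shutdown(); }
    }

    // Hypothetical RM-side handler for the unregister call.
    static class ResourceManager {
        private final StateStore store;
        // One pending store write per app, so repeated calls do not re-submit it.
        private final ConcurrentHashMap<String, CompletableFuture<Void>> pending =
                new ConcurrentHashMap<>();

        ResourceManager(StateStore store) { this.store = store; }

        // Option 1: block inside the call until the store write has finished.
        boolean finishApplicationMasterBlocking(String appId) {
            store.markFinishedAsync(appId).join();
            return true;
        }

        // Option 2: start the write asynchronously and report success only
        // once it has completed without error.
        boolean finishApplicationMaster(String appId) {
            CompletableFuture<Void> write =
                    pending.computeIfAbsent(appId, store::markFinishedAsync);
            return write.isDone() && !write.isCompletedExceptionally();
        }
    }

    // Hypothetical AM-side loop: retry until the RM confirms the unregister.
    static int unregister(ResourceManager rm, String appId) {
        int calls = 0;
        while (true) {
            calls++;
            if (rm.finishApplicationMaster(appId)) {
                return calls;
            }
            LockSupport.parkNanos(10_000_000L); // back off ~10 ms between calls
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        ResourceManager rm = new ResourceManager(store);
        int calls = unregister(rm, "app_1");
        System.out.println("confirmed after " + calls + " call(s), stored="
                + store.finishedApps.contains("app_1"));
        store.shutdown();
    }
}
```

With option 2, an RM that goes down before confirming simply means the AM never saw success, so relaunching it after restart is consistent with what the AM observed.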
> Race condition causing RM to potentially relaunch already unregistered AMs on
> RM restart
> ----------------------------------------------------------------------------------------
>
> Key: YARN-540
> URL: https://issues.apache.org/jira/browse/YARN-540
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch,
> YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, the RM
> may shut down and restart with its dispatcher stopped before it has processed
> the REMOVE_APP event. The next time the RM comes back, it reloads the
> existing state files even though the job has already succeeded.
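The race described above can be reproduced deterministically in a toy model, assuming an in-memory set stands in for the state store and a queue stands in for the RM's asynchronous dispatcher. `Dispatcher`, `finishApplicationMaster` and `appsToRecoverAfterRestart` are hypothetical names, not the real RM classes. The point is that the AM is told it unregistered as soon as REMOVE_APP is queued, and stopping the dispatcher discards that event.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class RestartRaceSketch {
    // Hypothetical asynchronous dispatcher: events wait in a queue until handled.
    static class Dispatcher {
        private final Queue<Runnable> queue = new ArrayDeque<>();
        private boolean stopped = false;

        void dispatch(Runnable event) { if (!stopped) queue.add(event); }

        void handleOne() {
            Runnable event = queue.poll();
            if (event != null) event.run();
        }

        // Stopping drops every event that has not been handled yet.
        void stop() { stopped = true; queue.clear(); }
    }

    // The call returns to the AM as soon as REMOVE_APP is queued,
    // not when the state store has actually been updated.
    static void finishApplicationMaster(Dispatcher d, Set<String> store, String appId) {
        d.dispatch(() -> store.remove(appId)); // REMOVE_APP
    }

    // Returns the apps an RM would reload from the store after restarting.
    static Set<String> appsToRecoverAfterRestart(String appId, boolean eventHandledFirst) {
        Set<String> store = new HashSet<>();
        store.add(appId);
        Dispatcher d = new Dispatcher();
        finishApplicationMaster(d, store, appId); // AM sees a successful unregister
        if (eventHandledFirst) d.handleOne();
        d.stop();                                  // RM shuts down
        return store;                              // restart reloads this state
    }

    public static void main(String[] args) {
        System.out.println("event handled: " + appsToRecoverAfterRestart("app_1", true));
        System.out.println("dispatcher stopped first: " + appsToRecoverAfterRestart("app_1", false));
    }
}
```

Running `main` prints `event handled: []` followed by `dispatcher stopped first: [app_1]`: in the second case the already-unregistered app is still in the store and would be relaunched.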