[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748024#comment-13748024 ]
Vinod Kumar Vavilapalli commented on YARN-540:
----------------------------------------------

I think we should fix it the right way. And after things like changes in MAPREDUCE-5476, AM's spending time in FINISHING state is going to be more common. So, I am +1 to Bikas's proposal (2).

But in the interest of not making incompatible changes, let's do the following:
 - Let's change FinishApplicationMasterResponse to also contain a response-completed field. If it is true, it means that the RM has finished the finalization of the AM; otherwise, the AM is supposed to retry until it becomes true.
 - Let the RM do the state-store changes asynchronously.

It'll still be a behavior change, but clients which don't follow the multi-step unregister will risk only getting restarted.

> RM state store not cleaned if job succeeds but RM shutdown and restart-dispatcher stopped before it can process REMOVE_APP event
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>            Priority: Blocker
>         Attachments: YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, the RM can shut down and the restart-dispatcher can be stopped before it processes the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job has succeeded.
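
To make the proposed multi-step unregister concrete, here is a minimal sketch of the AM-side retry loop. It is not the actual YARN API: `FinishResponse`, `isCompleted()`, `ResourceManagerClient`, and `FakeRm` are hypothetical stand-ins for FinishApplicationMasterResponse with the proposed response-completed field, and the attempt limit and sleep interval are assumptions made only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UnregisterSketch {

    /** Hypothetical stand-in for FinishApplicationMasterResponse with the proposed field. */
    static final class FinishResponse {
        private final boolean completed;
        FinishResponse(boolean completed) { this.completed = completed; }
        boolean isCompleted() { return completed; }
    }

    /** Hypothetical stand-in for the AM-to-RM unregister call. */
    interface ResourceManagerClient {
        FinishResponse finishApplicationMaster();
    }

    /**
     * Calls finishApplicationMaster until the RM reports that finalization is
     * complete. Returns the number of calls made; throws if the RM never
     * confirms within maxAttempts (an assumed limit, not part of the proposal).
     */
    static int unregister(ResourceManagerClient rm, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rm.finishApplicationMaster().isCompleted()) {
                return attempt;
            }
            try {
                Thread.sleep(sleepMillis); // back off before asking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while unregistering", e);
            }
        }
        throw new IllegalStateException(
            "RM did not confirm unregistration after " + maxAttempts + " attempts");
    }

    /** Fake RM whose asynchronous state-store cleanup finishes after a fixed number of calls. */
    static final class FakeRm implements ResourceManagerClient {
        private final int callsUntilDone;
        private final AtomicInteger calls = new AtomicInteger();
        FakeRm(int callsUntilDone) { this.callsUntilDone = callsUntilDone; }

        @Override
        public FinishResponse finishApplicationMaster() {
            return new FinishResponse(calls.incrementAndGet() >= callsUntilDone);
        }
    }

    public static void main(String[] args) {
        int attempts = unregister(new FakeRm(3), 10, 10L);
        System.out.println("unregistered after " + attempts + " attempts");
    }
}
```

A client that calls finishApplicationMaster only once and exits corresponds to the old single-step behavior; under the proposal, the worst case for such a client is that the application gets restarted.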