[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948810#comment-13948810 ]
Bikas Saha commented on YARN-1815:
----------------------------------

I am not up to date with the latest state of the code. The original restart code saved all AM info into the state store, for both managed and unmanaged AMs. Upon recovery, the unmanaged AM was explicitly discarded, since all AMs were asked to restart after RM recovery. The plan was that once AMs are asked to resync instead of restart, the unmanaged AM would no longer be discarded on recovery. It would go through the same flow as the managed AMs: its AM would ping the RM and be asked to resync, just like any managed AM. So the flow for unmanaged AMs would be identical to the flow for managed AMs.

This comment is mainly about running unmanaged AMs. Unmanaged AMs that finished before the RM restart should already be handled, since their completion information will tell the recovered RM that they are done.

bq. by recording unmanaged AMs also when they finish and use that information for recovery

Is completion information for unmanaged AMs not saved currently? We should be saving that information so that the flow is identical to that of managed AMs.

> RM doesn't recover unmanaged AMs into its memory after restart
> --------------------------------------------------------------
>
>                 Key: YARN-1815
>                 URL: https://issues.apache.org/jira/browse/YARN-1815
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM doesn't recover unmanaged AMs into its memory after restart

--
This message was sent by Atlassian JIRA
(v6.2#6252)