[
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948810#comment-13948810
]
Bikas Saha commented on YARN-1815:
----------------------------------
I am not up to date with the latest state of the code. The original restart
code saved all AM info into the state store, for both managed and unmanaged
AMs. Upon recovery, the unmanaged AM was explicitly discarded since all AMs
were asked to restart after RM recovery. The plan was that once AMs are asked
to resync instead of restart, the unmanaged AM would no longer be discarded on
recovery. It would go through the same flow as managed AMs: the AM would ping
the RM and be asked to resync, just like managed AMs. So the flow of unmanaged
AMs would be identical to the flow of managed AMs. This comment is mainly about
unmanaged AMs that are still running. Unmanaged AMs that have already finished
before RM restart should be handled already, since their completion information
will tell the recovered RM that they are done.
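To make the two flows concrete, here is a rough sketch of the recovery
decision described above. It is not the actual RM code: StoredApp,
RecoveryMode and recoverRunning are hypothetical names made up for
illustration, and the real state store and recovery classes may look quite
different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnmanagedAmRecoverySketch {

    /** Hypothetical stand-in for an application record kept in the state store. */
    static final class StoredApp {
        final String appId;
        final boolean unmanagedAm;
        final boolean finished;

        StoredApp(String appId, boolean unmanagedAm, boolean finished) {
            this.appId = appId;
            this.unmanagedAm = unmanagedAm;
            this.finished = finished;
        }
    }

    /** Hypothetical switch between the original flow and the planned one. */
    enum RecoveryMode { RESTART_AMS, RESYNC_AMS }

    /** Returns the ids of apps the recovered RM would treat as still running. */
    static List<String> recoverRunning(List<StoredApp> stored, RecoveryMode mode) {
        List<String> running = new ArrayList<>();
        for (StoredApp app : stored) {
            if (app.finished) {
                // Saved completion information tells the recovered RM the app is done.
                continue;
            }
            if (app.unmanagedAm && mode == RecoveryMode.RESTART_AMS) {
                // Original flow: all AMs are asked to restart, so the
                // unmanaged AM is explicitly discarded on recovery.
                continue;
            }
            // Resync flow: managed and unmanaged AMs are recovered alike and
            // are asked to resync when they next ping the RM.
            running.add(app.appId);
        }
        return running;
    }

    public static void main(String[] args) {
        List<StoredApp> stored = Arrays.asList(
            new StoredApp("app_1", false, false),  // managed, running
            new StoredApp("app_2", true, false),   // unmanaged, running
            new StoredApp("app_3", true, true));   // unmanaged, finished
        System.out.println(recoverRunning(stored, RecoveryMode.RESTART_AMS)); // [app_1]
        System.out.println(recoverRunning(stored, RecoveryMode.RESYNC_AMS));  // [app_1, app_2]
    }
}
```

The finished-app branch only works if completion information is saved for
unmanaged AMs as well, which is the assumption the sketch makes.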
bq. by recording unmanaged AMs also when they finish and use that information
for recovery
Is completion information for unmanaged AMs not saved currently? We should be
saving that information so that the flow is identical to that of managed AMs.
> RM doesn't recover unmanaged AMs into its memory after restart
> --------------------------------------------------------------
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch,
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM doesn't recover unmanaged AMs into its memory after restart