[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864571#comment-13864571 ]
Jian He commented on YARN-1490:
-------------------------------
bq. the list of containers that failed during the outage. List<Container> completedContainers.
RMAppImpl.AttemptFailedTransition retrieves those.
bq. the list of the container allocations List<Container> liveContainers.
Those are handled in SchedulerApplicationAttempt.recover().
Beyond this patch, there is a further patch with AM protocol changes. I have it
locally and will upload it once this one gets in.
> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
> Key: YARN-1490
> URL: https://issues.apache.org/jira/browse/YARN-1490
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Jian He
> Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps may choose to
> reconnect with old running containers; others may not want to. This should be
> an option.
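A minimal sketch of the option described in the issue, in plain Java. Every name here (Container, AppAttempt, FailoverResult, onAttemptFailed, the keepContainers flag) is hypothetical; this is not the YARN API and not the code in the attached patches. It only models the idea that, when the option is on, the failed attempt's live containers are handed to the next attempt (the role the comment gives to SchedulerApplicationAttempt.recover() for liveContainers) instead of being killed.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkPreservingRestartSketch {

    // Hypothetical stand-in for a YARN container; only an id is modeled.
    record Container(String id) {}

    // Hypothetical model of one ApplicationMaster attempt and its live containers.
    static final class AppAttempt {
        final int attemptNumber;
        final List<Container> liveContainers = new ArrayList<>();

        AppAttempt(int attemptNumber) {
            this.attemptNumber = attemptNumber;
        }
    }

    // Hypothetical outcome of handling an AM attempt failure.
    record FailoverResult(AppAttempt nextAttempt, List<Container> killed) {}

    /**
     * Starts the next attempt after `failed` exits. With keepContainers == true
     * the failed attempt's live containers move to the new attempt
     * (work-preserving restart); otherwise they are all killed, which models
     * the existing behavior the issue wants to make optional.
     */
    static FailoverResult onAttemptFailed(AppAttempt failed, boolean keepContainers) {
        AppAttempt next = new AppAttempt(failed.attemptNumber + 1);
        List<Container> killed = new ArrayList<>();
        if (keepContainers) {
            next.liveContainers.addAll(failed.liveContainers);
        } else {
            killed.addAll(failed.liveContainers);
        }
        // The failed attempt no longer owns any containers either way.
        failed.liveContainers.clear();
        return new FailoverResult(next, killed);
    }

    public static void main(String[] args) {
        AppAttempt first = new AppAttempt(1);
        first.liveContainers.add(new Container("container_1"));
        first.liveContainers.add(new Container("container_2"));

        FailoverResult r = onAttemptFailed(first, true);
        // Prints: attempt 2 inherits 2 containers, killed 0
        System.out.println("attempt " + r.nextAttempt().attemptNumber
                + " inherits " + r.nextAttempt().liveContainers.size()
                + " containers, killed " + r.killed().size());
    }
}
```

Keeping the choice as a per-application flag matches the issue's point that some apps want to reconnect to old containers and others do not.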
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)