[ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864571#comment-13864571
 ] 

Jian He commented on YARN-1490:
-------------------------------

bq. the list of containers that failed during the outage. List<Container> completedContainers.
The RMAppImpl.AttemptFailedTransition transition retrieves those.
bq. the list of the container allocations List<Container> liveContainers.
See SchedulerApplicationAttempt.recover().

Beyond this patch, more AM protocol changes are needed. I have a local patch 
for those and will upload it once this one gets in. 

> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1490
>                 URL: https://issues.apache.org/jira/browse/YARN-1490
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps can choose to 
> reconnect with old running containers, some may not want to. This should be 
> an option.



