[
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861093#comment-13861093
]
Jian He commented on YARN-1490:
-------------------------------
- Add a field to AppSubmissionContext indicating whether the containers should
be cleaned up when the AM fails.
- Copy the data structures (liveContainers etc.) inside
SchedulerApplicationAttempt over when the new attempt is recovering the
failed attempt’s scheduler info.
- Similarly, copy the needed data structures (finished containers etc.) inside
RMAppAttempt over when the new attempt is recovering the failed
RMAppAttempt’s info.
- The failed attempt is changed so that it still receives container events and
records the finished containers, and the new attempt is created with references
to the objects of the previous attempt.
- The appAttempt data structures inside the schedulers are removed; only
SchedulerApplication.getCurrentAppAttempt is used to retrieve the current attempt.
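A minimal sketch of the scheduler-side behavior in the first, second and last points, written in Python for brevity. All class, method and field names here (SubmissionContext, keep_containers_across_attempts, SchedulerAttempt, SchedulerApp, add_attempt, get_current_app_attempt) are hypothetical simplifications, not the YARN Java API. Only the ideas of a per-application flag, copying liveContainers over to the new attempt, and reaching attempts solely through the current attempt come from the comment above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubmissionContext:
    app_id: str
    # Hypothetical flag: keep running containers when an AM attempt fails.
    keep_containers_across_attempts: bool = False


@dataclass
class SchedulerAttempt:
    attempt_id: int
    # Container id -> node id; stands in for liveContainers.
    live_containers: Dict[str, str] = field(default_factory=dict)


class SchedulerApp:
    """Hypothetical stand-in for SchedulerApplication: holds only the current attempt."""

    def __init__(self, context: SubmissionContext) -> None:
        self.context = context
        self._current: Optional[SchedulerAttempt] = None

    def get_current_app_attempt(self) -> Optional[SchedulerAttempt]:
        return self._current

    def add_attempt(self, attempt_id: int) -> List[str]:
        """Make a new attempt current, assuming the previous one (if any) failed.

        Returns the ids of containers the RM would kill.
        """
        previous = self._current
        new_attempt = SchedulerAttempt(attempt_id)
        to_kill: List[str] = []
        if previous is not None:
            if self.context.keep_containers_across_attempts:
                # Recover the failed attempt's scheduler info by copying its
                # live containers over to the new attempt.
                new_attempt.live_containers = dict(previous.live_containers)
            else:
                # Default behavior: every container of the failed attempt is killed.
                to_kill = sorted(previous.live_containers)
        self._current = new_attempt
        return to_kill


app = SchedulerApp(SubmissionContext("app_1", keep_containers_across_attempts=True))
app.add_attempt(1)
app.get_current_app_attempt().live_containers["c1"] = "node-a"
print(app.add_attempt(2))  # []
print(app.get_current_app_attempt().live_containers)  # {'c1': 'node-a'}
```

With the flag set, the second attempt inherits c1 and nothing is killed; without it, the same call returns the failed attempt's container ids for the RM to clean up.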
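The RMAppAttempt-side handoff in the third and fourth points can be sketched the same way. The names (RMAttempt, finished_containers, on_container_finished) are hypothetical. This sketch assumes the new attempt holds a reference to the previous attempt’s finished-container list rather than a snapshot copy, so that containers the failed attempt records as finished after the new attempt starts are still visible to the new attempt; the comment does not spell out which structures are shared and which are copied.

```python
from typing import List, Optional


class RMAttempt:
    """Hypothetical stand-in for RMAppAttempt; tracks only finished containers."""

    def __init__(self, attempt_id: int, previous: Optional["RMAttempt"] = None,
                 keep_containers: bool = False) -> None:
        self.attempt_id = attempt_id
        self.failed = False
        # When containers are kept, share the previous attempt's list by
        # reference instead of copying it; otherwise start with an empty list.
        self.finished_containers: List[str] = (
            previous.finished_containers
            if previous is not None and keep_containers
            else []
        )

    def on_container_finished(self, container_id: str) -> None:
        # Deliberately not gated on self.failed: a failed attempt keeps
        # recording finished containers so a recovering attempt can see them.
        self.finished_containers.append(container_id)


first = RMAttempt(1)
first.on_container_finished("c1")
first.failed = True
second = RMAttempt(2, previous=first, keep_containers=True)
# The failed attempt still receives the completion event for c2.
first.on_container_finished("c2")
print(second.finished_containers)  # ['c1', 'c2']
```

Sharing the list is what lets the failed attempt keep handling container events without the new attempt re-reading its state; a copied snapshot would miss containers that finish after the handoff.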
> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
> Key: YARN-1490
> URL: https://issues.apache.org/jira/browse/YARN-1490
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Jian He
> Attachments: YARN-1490.1.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps can choose to
> reconnect with old running containers, some may not want to. This should be
> an option.