[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

Jian He (JIRA) Thu, 02 Jan 2014 17:56:29 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861093#comment-13861093
 ]


Jian He commented on YARN-1490:
-------------------------------

- Add a field to AppSubmissionContext that indicates whether to clean up the 
containers on AM failure.
- Copy the data structures (liveContainers etc.) inside 
SchedulerApplicationAttempt over when the new attempt is recovering the 
failed attempt’s scheduler info.
- Similarly, copy the needed data structures (finished containers etc.) inside 
RMAppAttempt over when the new attempt is recovering the failed 
RMAppAttempt info.
- The failed attempt is changed to still receive container events and record 
the finished containers, and the new attempt is created with references to 
the previous attempt's objects.
- The appAttempt data structures inside the schedulers are removed; only 
SchedulerApplication.getCurrentAppAttempt is used to retrieve the current 
attempt.
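
The points above can be illustrated with a rough, self-contained sketch. It is 
not taken from the attached patch: AttemptSketch, ApplicationSketch, 
transferStateFrom, onAmFailure and keepContainersOnAmFailure are hypothetical 
names chosen for illustration, and container state is reduced to plain 
strings. Only the name getCurrentAppAttempt is borrowed from the comment. The 
sketch assumes the new attempt shares the previous attempt's collections by 
reference, so that events still delivered to the failed attempt stay visible 
to the new one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-attempt state held by the scheduler.
final class AttemptSketch {
    final int attemptId;
    // containerId -> node; shared by reference with the next attempt when
    // containers are kept across an AM failure.
    Map<String, String> liveContainers = new LinkedHashMap<>();
    List<String> finishedContainers = new ArrayList<>();

    AttemptSketch(int attemptId) {
        this.attemptId = attemptId;
    }

    // Point this attempt at the previous attempt's collections (no deep copy).
    void transferStateFrom(AttemptSketch previous) {
        this.liveContainers = previous.liveContainers;
        this.finishedContainers = previous.finishedContainers;
    }

    // Container-finished event; the failed attempt may still receive these.
    void containerFinished(String containerId) {
        if (liveContainers.remove(containerId) != null) {
            finishedContainers.add(containerId);
        }
    }
}

// Hypothetical stand-in for the per-application state; the schedulers would
// reach the active attempt only through getCurrentAppAttempt().
final class ApplicationSketch {
    // Assumed to come from the new submission-context field.
    private final boolean keepContainersOnAmFailure;
    private AttemptSketch currentAttempt;
    private int nextAttemptId = 1;

    ApplicationSketch(boolean keepContainersOnAmFailure) {
        this.keepContainersOnAmFailure = keepContainersOnAmFailure;
        this.currentAttempt = new AttemptSketch(nextAttemptId++);
    }

    AttemptSketch getCurrentAppAttempt() {
        return currentAttempt;
    }

    // Called when the AM exits; returns the failed attempt.
    AttemptSketch onAmFailure() {
        AttemptSketch failed = currentAttempt;
        AttemptSketch next = new AttemptSketch(nextAttemptId++);
        if (keepContainersOnAmFailure) {
            next.transferStateFrom(failed);
        } else {
            // Existing behaviour: every container of the failed attempt ends.
            for (String id : new ArrayList<>(failed.liveContainers.keySet())) {
                failed.containerFinished(id);
            }
        }
        currentAttempt = next;
        return failed;
    }
}
```

With the flag set, a container that finishes after the AM failure is reported 
to the failed attempt but shows up in the new attempt's finished list, because 
both attempts hold the same list object. With the flag unset, the new attempt 
starts with empty collections.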

> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1490
>                 URL: https://issues.apache.org/jira/browse/YARN-1490
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-1490.1.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps can choose to 
> reconnect with old running containers, some may not want to. This should be 
> an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
