[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

Bikas Saha (JIRA) Sat, 04 Jan 2014 09:10:12 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862350#comment-13862350
 ]


Bikas Saha commented on YARN-1490:
----------------------------------

bq. The failed attempt is changed to still receive container events and record 
the finished containers and new attempt is created with the reference of the 
objects of the previous attempt.
This sounds messy. IMO having two app attempt objects active at the same time
is going to be a source of bugs and race conditions. We are better off changing
the dispatcher-related logic to look up the appId of the container, get the
current attempt of that appId, and then route the event to the current attempt.
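
A minimal sketch of that routing, to make the idea concrete. All names here
(CurrentAttemptRouter, Attempt, routeContainerFinished) are hypothetical and
are not taken from the RM code or from any of the attached patches; the only
point is that an event is resolved through the appId to whichever attempt is
current, so only one attempt object per app ever receives events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the ResourceManager's actual classes.
public class CurrentAttemptRouter {

    /** Stand-in for an application attempt that records finished containers. */
    static final class Attempt {
        final int attemptNumber;
        final List<String> finishedContainers = new ArrayList<>();

        Attempt(int attemptNumber) {
            this.attemptNumber = attemptNumber;
        }
    }

    // appId -> the single attempt that is currently active for that app.
    private final Map<String, Attempt> currentAttempts = new HashMap<>();

    /** Replaces the app's current attempt with a fresh one and returns it. */
    Attempt startNewAttempt(String appId) {
        Attempt previous = currentAttempts.get(appId);
        int nextNumber = (previous == null) ? 1 : previous.attemptNumber + 1;
        Attempt attempt = new Attempt(nextNumber);
        currentAttempts.put(appId, attempt);
        return attempt;
    }

    /**
     * Routes a container-finished event by appId, not by the attempt that
     * originally launched the container. Returns false for an unknown app.
     */
    boolean routeContainerFinished(String appId, String containerId) {
        Attempt current = currentAttempts.get(appId);
        if (current == null) {
            return false;
        }
        current.finishedContainers.add(containerId);
        return true;
    }

    Attempt currentAttempt(String appId) {
        return currentAttempts.get(appId);
    }
}
```

In this sketch a container launched under attempt 1 that finishes after
attempt 2 has started is recorded on attempt 2, and the failed attempt never
sees the event.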

> RM should optionally not kill all containers when an ApplicationMaster exits
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1490
>                 URL: https://issues.apache.org/jira/browse/YARN-1490
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Jian He
>         Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch
>
>
> This is needed to enable work-preserving AM restart. Some apps can choose to 
> reconnect with old running containers, some may not want to. This should be 
> an option.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
