[ 
https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1373.
-------------------------------------------

    Resolution: Duplicate
      Assignee: Omkar Vinit Joshi  (was: Anubhav Dhoot)

bq. Currently the RM moves recovered app attempts to a terminal recovered 
state and starts a new attempt.
This is no longer an issue, and has not been since YARN-1210. Even in a 
non-work-preserving RM restart, the RM never explicitly kills the AMs; it is 
the nodes that kill all containers, which was done in YARN-1210. The state 
machines are already set up correctly, so no changes are needed here. Closing 
as a duplicate of YARN-1210.

> Transition RMApp and RMAppAttempt state to RUNNING after restart for 
> recovered running apps
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-1373
>                 URL: https://issues.apache.org/jira/browse/YARN-1373
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>
> Currently the RM moves recovered app attempts to a terminal recovered 
> state and starts a new attempt. Instead, it will have to transition the last 
> attempt to a running state so that it can proceed as normal once the 
> running attempt has resynced with the ApplicationMasterService (YARN-1365 and 
> YARN-1366). If the RM had started the application container before dying, then 
> the AM would be up and trying to contact the RM. Alternatively, the RM may 
> have died before launching the container. In that case, the RM should wait 
> for the AM liveliness period and then issue a kill for the stored master 
> container. It should transition this attempt to some RECOVER_ERROR state and 
> proceed to start a new attempt.
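
For illustration only, the recovery flow proposed in the description above 
could be sketched roughly as follows. This is not YARN code: the class, 
method, and field names are all hypothetical, the timing is simulated with 
plain millisecond values, and only the RECOVER_ERROR state name comes from 
the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed behaviour; none of these names exist in YARN.
class RecoveredAttemptSketch {
    enum AttemptState { RUNNING, RECOVER_ERROR }

    static final class Attempt {
        final int id;
        final long recoveredAtMs;      // when the restarted RM recovered this attempt
        boolean resynced = false;      // set once the AM has contacted the RM again
        AttemptState state = AttemptState.RUNNING;

        Attempt(int id, long recoveredAtMs) {
            this.id = id;
            this.recoveredAtMs = recoveredAtMs;
        }
    }

    private final long amLivelinessPeriodMs;              // assumed configurable period
    final List<String> actions = new ArrayList<>();       // records what the RM would do

    RecoveredAttemptSketch(long amLivelinessPeriodMs) {
        this.amLivelinessPeriodMs = amLivelinessPeriodMs;
    }

    // On restart, put the last attempt back into a running state
    // instead of a terminal recovered state.
    Attempt recover(int attemptId, long nowMs) {
        return new Attempt(attemptId, nowMs);
    }

    // The AM was already up and has resynced with the RM.
    void onAmResync(Attempt attempt) {
        if (attempt.state == AttemptState.RUNNING) {
            attempt.resynced = true;
        }
    }

    // Periodic check: if the AM never resynced within the liveliness period,
    // assume the container was never launched, kill the stored master
    // container, mark the attempt as failed recovery, and start a new attempt.
    void onTick(Attempt attempt, long nowMs) {
        boolean expired = nowMs - attempt.recoveredAtMs >= amLivelinessPeriodMs;
        if (attempt.state == AttemptState.RUNNING && !attempt.resynced && expired) {
            actions.add("kill master container of attempt " + attempt.id);
            attempt.state = AttemptState.RECOVER_ERROR;
            actions.add("start attempt " + (attempt.id + 1));
        }
    }
}
```

In this sketch, an attempt whose AM resyncs before the period elapses simply 
stays in RUNNING, while one that never resyncs ends in RECOVER_ERROR with a 
kill and a new attempt recorded in `actions`.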



--
This message was sent by Atlassian JIRA
(v6.2#6252)
