[ https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli resolved YARN-1373.
-------------------------------------------
    Resolution: Duplicate
      Assignee: Omkar Vinit Joshi  (was: Anubhav Dhoot)

bq. Currently the RM moves recovered app attempts to a terminal recovered state and starts a new attempt.

This is no longer an issue, and has not been since YARN-1210. Even in a non-work-preserving RM restart, the RM never explicitly kills the AMs; it is the nodes that kill all containers. This was done in YARN-1210.

The state machines are already set up correctly, so no changes are needed here. Closing as a duplicate of YARN-1210.

> Transition RMApp and RMAppAttempt state to RUNNING after restart for
> recovered running apps
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-1373
>                 URL: https://issues.apache.org/jira/browse/YARN-1373
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>
> Currently the RM moves recovered app attempts to a terminal recovered
> state and starts a new attempt. Instead, it will have to transition the last
> attempt to a running state such that it can proceed as normal once the
> running attempt has resynced with the ApplicationMasterService (YARN-1365 and
> YARN-1366). If the RM had started the application container before dying, then
> the AM would be up and trying to contact the RM. The RM may also have died
> before launching the container. In this case, the RM should wait for the AM
> liveness period and issue a container kill for the stored master container.
> It should then transition this attempt to some RECOVER_ERROR state and proceed
> to start a new attempt.

--
This message was sent by Atlassian JIRA
(v6.2#6252)