[
https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kumar Vavilapalli resolved YARN-1373.
-------------------------------------------
Resolution: Duplicate
Assignee: Omkar Vinit Joshi (was: Anubhav Dhoot)
bq. Currently the RM moves recovered app attempts to a terminal recovered
state and starts a new attempt.
This is no longer an issue, and has not been since YARN-1210. Even in a
non-work-preserving RM restart, the RM never explicitly kills the AMs; it is
the nodes that kill all containers. This was done in YARN-1210. The state
machines are already set up correctly, so no changes are needed here. Closing
as duplicate of YARN-1210.
> Transition RMApp and RMAppAttempt state to RUNNING after restart for
> recovered running apps
> -------------------------------------------------------------------------------------------
>
> Key: YARN-1373
> URL: https://issues.apache.org/jira/browse/YARN-1373
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Bikas Saha
> Assignee: Omkar Vinit Joshi
>
> Currently the RM moves recovered app attempts to a terminal recovered
> state and starts a new attempt. Instead, it will have to transition the last
> attempt to a running state such that it can proceed as normal once the
> running attempt has resynced with the ApplicationMasterService (YARN-1365 and
> YARN-1366). If the RM had started the application container before dying then
> the AM would be up and trying to contact the RM. Alternatively, the RM may
> have died before launching the container. For this case, the RM should wait
> for the AM liveliness period and issue a kill for the stored master container.
> It should transition this attempt to some RECOVER_ERROR state and proceed to
> start a new attempt.
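
The decision proposed in the description above can be sketched roughly as follows. This is only an illustration: the class, method, and parameter names are hypothetical and are not actual YARN classes, and the 10-minute liveliness value is an assumed figure. Only the RUNNING and RECOVER_ERROR state names come from the description.

```java
/**
 * Hypothetical sketch of the recovery decision described in the issue.
 * None of these names are actual YARN classes or methods.
 */
final class RecoveredAttemptSketch {

    /** RUNNING and RECOVER_ERROR are the states named in the description. */
    enum AttemptState { RUNNING, RECOVER_ERROR }

    /** What the RM would do with the last recovered attempt. */
    static final class Decision {
        final AttemptState state;
        final boolean killMasterContainer;
        final boolean startNewAttempt;

        Decision(AttemptState state, boolean killMasterContainer, boolean startNewAttempt) {
            this.state = state;
            this.killMasterContainer = killMasterContainer;
            this.startNewAttempt = startNewAttempt;
        }
    }

    /**
     * @param amResynced         whether the AM has resynced with the RM since restart
     * @param elapsedMs          time since the RM recovered the attempt
     * @param livelinessPeriodMs how long the RM waits for the AM to show up
     */
    static Decision evaluate(boolean amResynced, long elapsedMs, long livelinessPeriodMs) {
        // The attempt stays RUNNING if the AM is back, or while the RM is still waiting.
        if (amResynced || elapsedMs <= livelinessPeriodMs) {
            return new Decision(AttemptState.RUNNING, false, false);
        }
        // The AM never showed up: kill the stored master container and retry.
        return new Decision(AttemptState.RECOVER_ERROR, true, true);
    }

    public static void main(String[] args) {
        long liveliness = 600_000L; // assumed 10-minute window, for illustration only
        Decision resynced = evaluate(true, 5_000L, liveliness);
        Decision expired = evaluate(false, 600_001L, liveliness);
        System.out.println(resynced.state + " " + expired.state); // prints "RUNNING RECOVER_ERROR"
    }
}
```

The sketch keeps the attempt in RUNNING while the liveliness window is still open, since the description asks for the attempt to proceed as normal once the AM resyncs.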