[
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312813#comment-15312813
]
Jian He commented on YARN-1815:
-------------------------------
Thanks Subru! Looks good overall. A few comments on the patch:
- With this change the only target state is FINAL_SAVING, so
AMUnregisteredTransition no longer needs to be a MultipleArcTransition; it
can use BaseTransition instead. The registration below would then take the
single state rather than an EnumSet:
{code}
.addTransition(RMAppAttemptState.RUNNING,
    EnumSet.of(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINISHED),
{code}
- I think the following is what we can do in AMUnregisteredTransition:
{code}
if (appAttempt.getSubmissionContext().getUnmanagedAM()) {
  // YARN-1815: Saving the attempt final state so that we do not recover
  // the finished Unmanaged AM post RM failover
  // Unmanaged AMs have no container to wait for, so they skip
  // the FINISHING state and go straight to FINISHED.
  appAttempt.rememberTargetTransitionsAndStoreState(event,
      new AMFinishedAfterFinalSavingTransition(event),
      RMAppAttemptState.FINISHED, RMAppAttemptState.FINISHED);
} else {
{code}
- Test case: could you also extend the test to verify that the unmanaged AM
runs successfully after the restart, then restart the RM one more time and
make sure the unmanaged AM is not re-run?
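
To make the points above concrete, here is a minimal, self-contained sketch of
the flow. It is a toy model, not the patch: Store, Attempt, SingleArc and the
helper methods are hypothetical stand-ins for the RM state store,
RMAppAttemptImpl and the transition hooks, and only the unmanaged AM path is
modelled. It shows a hook with a single post state (RUNNING to FINAL_SAVING),
the remembered target state being entered once the save completes, and a
restart check that skips attempts whose final state was recorded.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: these are stand-ins, not the real YARN classes. */
class UnmanagedAmRecoverySketch {

  enum State { RUNNING, FINAL_SAVING, FINISHED }

  /** Hypothetical in-memory substitute for the RM state store. */
  static final class Store {
    private final Map<String, State> finalStates = new HashMap<>();

    void saveFinalState(String attemptId, State state) {
      finalStates.put(attemptId, state);
    }

    boolean hasFinalState(String attemptId) {
      return finalStates.containsKey(attemptId);
    }
  }

  /** Stand-in for a transition hook that has exactly one post state. */
  interface SingleArc {
    void transition(Attempt attempt);
  }

  static final class Attempt {
    final String id;
    State state = State.RUNNING;
    State targetState; // state to enter once the save completes

    Attempt(String id) {
      this.id = id;
    }
  }

  /** On unregister, an unmanaged AM only remembers its target state. */
  static final SingleArc AM_UNREGISTERED = attempt -> {
    attempt.targetState = State.FINISHED;
  };

  /** RUNNING -> FINAL_SAVING is the only arc, so no set of states is needed. */
  static void unregister(Attempt attempt) {
    AM_UNREGISTERED.transition(attempt);
    attempt.state = State.FINAL_SAVING;
  }

  /** Simulates the save completing: record the final state, then enter it. */
  static void onStateSaved(Attempt attempt, Store store) {
    store.saveFinalState(attempt.id, attempt.targetState);
    attempt.state = attempt.targetState;
  }

  /** After a restart, only attempts without a recorded final state re-run. */
  static boolean shouldRerunAfterRestart(String attemptId, Store store) {
    return !store.hasFinalState(attemptId);
  }

  public static void main(String[] args) {
    Store store = new Store();
    Attempt attempt = new Attempt("attempt_1");
    unregister(attempt);
    System.out.println(attempt.state);
    onStateSaved(attempt, store);
    System.out.println(attempt.state);
    System.out.println(shouldRerunAfterRestart(attempt.id, store));
  }
}
```

The second restart in the suggested test corresponds to calling
shouldRerunAfterRestart again after the final state has been saved; in this
sketch it keeps returning false for the finished attempt.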
> Work preserving recovery of Unmanged AMs
> ----------------------------------------
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.3.0
> Reporter: Karthik Kambatla
> Assignee: Subru Krishnan
> Priority: Critical
> Attachments: Unmanaged AM recovery.png, YARN-1815-v3.patch,
> YARN-1815-v4.patch, YARN-1815-v5.patch, yarn-1815-1.patch, yarn-1815-2.patch,
> yarn-1815-2.patch
>
>
> Currently, work-preserving RM restart recovers unmanaged AMs, but it has
> two shortcomings: all running containers are killed, and completed
> unmanaged AMs are also recovered because we do _not_ record the final state
> for unmanaged AMs in the RM StateStore. This JIRA proposes to address both
> shortcomings so that work-preserving unmanaged AM recovery works exactly as
> it does for managed AMs.