[
https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489728#comment-15489728
]
Jian He commented on YARN-5637:
-------------------------------
Thanks Arun, some more comments:
- Here, we could merge reInitEvent.getResourceSet() with the existing
resourceSet.localizedResource upfront, so that both oldResourceSet and
newResourceSet contain a full copy of the resources rather than a delta. With
this, the
{{container.resourceSet = container.reInitContext.mergedResourceSet();}}
logic will not be needed. We can simply set
{{container.resourceSet = reInitContext.newResourceSet}}, similar to what’s
being done for {{container.launchContext = reInitContext.newLaunchContext}}.
{code}
return new ReInitializationContext(reInitEvent.getReInitLaunchContext(),
    reInitEvent.getResourceSet(), container.getLaunchContext(),
    container.resourceSet, reInitEvent.getRetryFailureContext(),
    reInitEvent.isAutoCommit());
{code}
- nit: the {{container.reInitContext != null}} check is not needed.
{code}
if (container.reInitContext != null
&& container.reInitContext.autoCommit) {
{code}
- I found that the resourceSet is also not updated on rollback in
RetryFailureTransition. I also tried some refactoring, maybe something like
the below:
{code}
  ContainerRetryContext retryContext = container.containerRetryContext;
  int remainingAttempts = container.remainingRetryAttempts;
  if (container.reInitContext != null) {
    retryContext = container.reInitContext.retryOnFailueContext;
    remainingAttempts = container.reInitContext.retryAttemptsRemaining;
  }
  if (shouldRetry(container.exitCode, retryContext, remainingAttempts)) {
    // TODO state-store operation
    doRelaunch(container, container.remainingRetryAttempts,
        container.containerRetryContext.getRetryInterval());
  } else if (container.canRollback()) {
    // rollback
    container.reInitContext = new ReInitializationContext(
        container.reInitContext.oldLaunchContext,
        container.reInitContext.oldResourceSet, null, null,
        container.containerRetryContext, true);
    new KilledExternallyForReInitTransition().transition(container, event);
  } else {
    // fail
    new ExitedWithFailureTransition(true).transition(container, event);
    return ContainerState.EXITED_WITH_FAILURE;
  }
}
public static boolean shouldRetry(int errorCode,
    ContainerRetryContext retryContext, int remainingRetryAttempts) {
  if (retryContext == null) {
    return false;
  }
  .....
{code}
- testContainerUpgradeRollbackDueToFailure: the comment does not match the
code (the comment mentions the new start file, but the loop waits on
oldStartFile):
{code}
// Wait for new processStartfile to be created
while (!oldStartFile.exists() && timeoutSecs++ < 20) {
{code}
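To illustrate the first point above (merging upfront so that both the old and
the new resource set hold a full copy rather than a delta), here is a minimal,
standalone sketch. It is not the NodeManager code: ResourceSetMergeSketch and
mergeUpfront are hypothetical names, and a plain map from a localized path to
its symlink names stands in for the real ResourceSet, which I am assuming has
roughly that shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the NodeManager's ResourceSet:
// each entry maps a localized path to the symlink names that point at it.
final class ResourceSetMergeSketch {

  // Returns a full copy: everything already localized plus the delta
  // carried by the re-init event. Neither input map is modified, so the
  // caller can keep the existing map as the rollback target.
  static Map<String, List<String>> mergeUpfront(
      Map<String, List<String>> existingLocalized,
      Map<String, List<String>> reInitDelta) {
    Map<String, List<String>> merged = new LinkedHashMap<>();
    // Deep-copy the existing entries so later changes cannot leak back.
    for (Map.Entry<String, List<String>> e : existingLocalized.entrySet()) {
      merged.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    // Add the delta, skipping symlink names that are already present.
    for (Map.Entry<String, List<String>> e : reInitDelta.entrySet()) {
      List<String> links =
          merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
      for (String link : e.getValue()) {
        if (!links.contains(link)) {
          links.add(link);
        }
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Paths and symlink names below are made up for the example.
    Map<String, List<String>> oldSet = new LinkedHashMap<>();
    oldSet.put("/cache/app-v1.jar", Arrays.asList("app.jar"));
    Map<String, List<String>> delta = new LinkedHashMap<>();
    delta.put("/cache/app-v2.jar", Arrays.asList("app-new.jar"));

    Map<String, List<String>> newSet = mergeUpfront(oldSet, delta);
    System.out.println("old (rollback target): " + oldSet);
    System.out.println("new (after re-init):   " + newSet);
  }
}
```

With both sets complete, commit and rollback each reduce to a single
assignment of one set or the other, and no merge step is needed at commit
time.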
> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
> Key: YARN-5637
> URL: https://issues.apache.org/jira/browse/YARN-5637
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new
> launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent
> rollback or commit of the upgrade.