[ 
https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489728#comment-15489728
 ] 

Jian He commented on YARN-5637:
-------------------------------

Thanks Arun, some more comments:
- Here, we could merge reInitEvent.getResourceSet() with the existing 
resourceSet.localizedResource upfront, so that both oldResourceSet and 
newResourceSet contain a full copy of the resources rather than a delta. With 
this, the logic of {{container.resourceSet = 
container.reInitContext.mergedResourceSet();}} will not be needed. We can 
simply set it with {{container.resourceSet = reInitContext.newResourceSet}}, 
similar to what’s being done for {{container.launchContext = 
reInitContext.newLaunchContext}}
{code}
return new ReInitializationContext(reInitEvent.getReInitLaunchContext(),
    reInitEvent.getResourceSet(), container.getLaunchContext(),
    container.resourceSet, reInitEvent.getRetryFailureContext(), 
    reInitEvent.isAutoCommit());

{code}
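A minimal sketch of the merge-upfront idea, not the actual NodeManager code: {{ResourceSetMergeSketch}}, {{mergeUpfront}} and the String-to-String map are hypothetical stand-ins for ResourceSet and its localized resources, and letting the re-init entries win on a key conflict is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the NodeManager's ResourceSet; not the real class.
final class ResourceSetMergeSketch {

  /**
   * Returns a full copy: everything already localized, overlaid with the
   * delta carried by the re-init request. Neither input map is modified.
   */
  static Map<String, String> mergeUpfront(Map<String, String> localized,
      Map<String, String> delta) {
    Map<String, String> merged = new LinkedHashMap<>(localized);
    // Assumption: entries from the re-init request win on a key conflict.
    merged.putAll(delta);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> oldSet = new LinkedHashMap<>();
    oldSet.put("app.jar", "/cache/v1/app.jar");
    oldSet.put("conf.xml", "/cache/v1/conf.xml");

    Map<String, String> delta = new LinkedHashMap<>();
    delta.put("app.jar", "/cache/v2/app.jar");

    Map<String, String> newSet = mergeUpfront(oldSet, delta);

    // Both sets are complete, so commit and rollback each reduce to a
    // plain assignment of newSet or oldSet.
    System.out.println("old = " + oldSet);
    System.out.println("new = " + newSet);
  }
}
```

With both sets complete at construction time, no merge step is needed when the re-initialized container is later committed or rolled back.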
- nit: the container.reInitContext != null check is not needed.
{code}
if (container.reInitContext != null 
    && container.reInitContext.autoCommit) {
{code}

- I found that the resourceSet is also not updated on rollback in 
RetryFailureTransition. I also tried some refactoring, maybe something like 
below:
{code}
      ContainerRetryContext retryContext = container.containerRetryContext;
      int remainingAttempts = container.remainingRetryAttempts;
      if (container.reInitContext != null) {
        retryContext = container.reInitContext.retryOnFailueContext;
        remainingAttempts = container.reInitContext.retryAttemptsRemaining;
      }

      if (shouldRetry(container.exitCode, retryContext, remainingAttempts)) {
        // TODO state-store operation
        doRelaunch(container, remainingAttempts,
            retryContext.getRetryInterval());
      } else if (container.canRollback()) {
        // rollback
        container.reInitContext = new ReInitializationContext(
            container.reInitContext.oldLaunchContext,
            container.reInitContext.oldResourceSet, null, null,
            container.containerRetryContext, true);
        new KilledExternallyForReInitTransition().transition(container, event);
      } else {
        // fail
        new ExitedWithFailureTransition(true).transition(container, event);
        return ContainerState.EXITED_WITH_FAILURE;
      }
    }

  public static boolean shouldRetry(int errorCode,
      ContainerRetryContext retryContext, int remainingRetryAttempts) {
    if (retryContext == null) {
      return false;
    }
  .....
{code}
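The three-way outcome above (relaunch, rollback, fail) can be sketched in isolation. This is only an illustration under stated assumptions: {{RetryDecisionSketch}}, {{RetryState}} and {{Outcome}} are hypothetical names, and the rule "retry while attempts remain" is a placeholder for the elided shouldRetry body, not its real logic.

```java
// Hypothetical, simplified model of the retry / rollback / fail decision.
final class RetryDecisionSketch {

  enum Outcome { RELAUNCH, ROLLBACK, FAIL }

  // Hypothetical stand-in for a retry context plus its remaining attempts.
  static final class RetryState {
    final int remainingAttempts;

    RetryState(int remainingAttempts) {
      this.remainingAttempts = remainingAttempts;
    }
  }

  /** The re-init retry state, when present, takes precedence. */
  static RetryState effective(RetryState containerState,
      RetryState reInitState) {
    return reInitState != null ? reInitState : containerState;
  }

  static Outcome decide(RetryState containerState, RetryState reInitState,
      boolean canRollback) {
    RetryState state = effective(containerState, reInitState);
    // Assumption: retry while attempts remain; a null state never retries.
    if (state != null && state.remainingAttempts > 0) {
      return Outcome.RELAUNCH;
    }
    if (canRollback) {
      return Outcome.ROLLBACK;
    }
    return Outcome.FAIL;
  }

  public static void main(String[] args) {
    // Re-init retries exhausted and rollback possible.
    System.out.println(decide(new RetryState(2), new RetryState(0), true));
    // No re-init in progress, attempts left.
    System.out.println(decide(new RetryState(1), null, false));
  }
}
```

Selecting the effective retry state once and using it for both the check and the relaunch keeps the two from disagreeing.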

- testContainerUpgradeRollbackDueToFailure: the comment does not match the 
code (the comment mentions the new start file, but the loop waits on 
oldStartFile)
{code}
    // Wait for new processStartfile to be created
    while (!oldStartFile.exists() && timeoutSecs++ < 20) {
{code}

> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
>                 Key: YARN-5637
>                 URL: https://issues.apache.org/jira/browse/YARN-5637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new 
> launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent 
> rollback or commit of the upgrade.


