Arun Suresh commented on YARN-5609:

bq. User anyway has to provide the full set of resources via the reInit API if 
they want to rollback to older than last. what do you think...
You mean rollback to a version before the previous one? Agreed.

So just to clarify, the ReInit behavior as per the latest patch (v007) is:

# Container C1 currently has resources (*a* and *b*) symlinked to (*s1* and 
# The reinitialization request comes in with autoCommit = false and asks for 
resource *c* symlinked to *s1*.
** The ReInitContext created will have _newResourceSet_ = (c) and 
_oldResourceSet_ = a clone of the current resource set (which contains *a* and 
*b*, but they are moved to the pending list).
# Container moves into REINITIALIZING state while *c* is being localized.
# Once *c* is localized, container gets a RESOURCE_LOCALIZED event. Since there 
are no more pending resources, in the _ResourceLocalizedWhileReInitTransition_, 
a CLEANUP_CONTAINER_FOR_REINIT is sent to the launcher.
# The CLEANUP_CONTAINER_FOR_REINIT is handled by the _ContainerLaunch_, which 
kills the current process and sends a CONTAINER_KILLED_ON_REQUEST to the 
Container.
# This moves the Container to LOCALIZED state and invokes the 
_KilledForReInitializationTransition_ which does a 
{{container.sendLaunchEvent()}} which restarts the process with the new launch 
context and resourceSet.
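
To make the ordering above easier to check, here is a minimal sketch of the normal ReInit flow. It is a toy model and not the NodeManager code: the class {{ReInitFlowSketch}}, its methods, and the {{sent}} list are hypothetical names I made up for illustration. Only the state and event names come from the steps above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the re-init flow; NOT the NodeManager classes.
final class ReInitFlowSketch {
    enum State { RUNNING, REINITIALIZING, LOCALIZED }

    State state = State.RUNNING;
    final Set<String> pending = new LinkedHashSet<>();
    final List<String> sent = new ArrayList<>(); // events/calls issued, in order

    // Steps 2-3: the request arrives and the container waits for localization.
    void reInitialize(Set<String> newResources) {
        pending.addAll(newResources);
        state = State.REINITIALIZING;
    }

    // Step 4: RESOURCE_LOCALIZED; once nothing is pending, ask the launcher to clean up.
    void resourceLocalized(String resource) {
        if (pending.remove(resource)
                && state == State.REINITIALIZING
                && pending.isEmpty()) {
            sent.add("CLEANUP_CONTAINER_FOR_REINIT");
        }
    }

    // Steps 5-6: CONTAINER_KILLED_ON_REQUEST arrives; relaunch with the new context.
    void killedOnRequest() {
        if (state != State.REINITIALIZING) {
            throw new IllegalStateException("unexpected kill in state " + state);
        }
        state = State.LOCALIZED;
        sent.add("sendLaunchEvent");
    }

    public static void main(String[] args) {
        ReInitFlowSketch c1 = new ReInitFlowSketch();
        c1.reInitialize(Collections.singleton("c"));
        c1.resourceLocalized("c");
        c1.killedOnRequest();
        System.out.println(c1.state + " " + c1.sent);
    }
}
```

The point of the sketch is that the cleanup request is only issued once the pending set is empty, and the relaunch only happens in response to the kill notification.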

AutoRollback (in case of container failure after re-init):
# In the _RetryFailureTransition_, we determine that the container has 
exhausted its retries and decide to roll back. Unlike a normal ReInit, the 
container is NOT running at this point.
# Since the oldResourceSet is a clone of the previous state, with *a* and *b* 
still pending, C1 sends a LOCALIZE_CONTAINER_RESOURCES event to the localizer 
and moves to the REINITIALIZING state to wait for the response.
# Once *a* and *b* have been verified (currently they are not re-localized, as 
you pointed out, but the Tracker still checks whether each resource exists), 
RESOURCE_LOCALIZED is sent to the Container.
# Unlike the normal ReInit case, since the process is already dead, I cannot 
send a CLEANUP_CONTAINER_FOR_REINIT to the ContainerLaunch (since it does not 
exist), so I send the CONTAINER_KILLED_ON_REQUEST directly to the container, 
which handles it and proceeds as in steps 5 and 6 above.
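
The two decisions in the auto-rollback path can be sketched roughly as below. This is a toy model under stated assumptions: {{AutoRollbackSketch}}, its method names, and the {{Action}} enum are hypothetical, and the {{FAIL}} branch (no rollback state saved) is my assumption about what happens when there is nothing to roll back to. Only the event names are taken from the steps above.

```java
// Hypothetical sketch of the auto-rollback decisions; NOT the NodeManager code.
final class AutoRollbackSketch {
    enum Action { RETRY, ROLLBACK, FAIL }

    // Decision made on container failure after a re-init.
    // Assumption: with no saved rollback state the container simply fails.
    static Action onFailure(int remainingRetries, boolean rollbackStateSaved) {
        if (remainingRetries > 0) {
            return Action.RETRY;
        }
        return rollbackStateSaved ? Action.ROLLBACK : Action.FAIL;
    }

    // Event issued once the resources are localized (or verified).
    // A live process must be cleaned up by the launcher first; a dead one
    // has no launcher to talk to, so the kill notification is sent directly.
    static String eventAfterLocalization(boolean processAlive) {
        return processAlive
                ? "CLEANUP_CONTAINER_FOR_REINIT"
                : "CONTAINER_KILLED_ON_REQUEST";
    }

    public static void main(String[] args) {
        System.out.println(onFailure(0, true));
        System.out.println(eventAfterLocalization(false));
    }
}
```

Either way the container ends up handling CONTAINER_KILLED_ON_REQUEST, which is why both paths can share the same relaunch transition.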

Explicit Rollback (AM invoked) is exactly like the normal ReInit case, but 
autoCommit is true, so no rollback state is saved (since you should not be able 
to roll back a rollback).
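
As a rough illustration of how autoCommit controls whether rollback state is kept, here is a minimal sketch. {{ReInitContextSketch}}, {{create}} and {{canRollback}} are hypothetical names, and resources are modelled as plain strings. Only _newResourceSet_, _oldResourceSet_ and the autoCommit flag come from the description above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the ReInitContext described above; NOT the patch code.
final class ReInitContextSketch {
    final Set<String> newResourceSet;
    final Set<String> oldResourceSet; // null means no rollback state was saved

    private ReInitContextSketch(Set<String> newSet, Set<String> oldSet) {
        this.newResourceSet = newSet;
        this.oldResourceSet = oldSet;
    }

    // With autoCommit = false the current resources are cloned so that a later
    // rollback can restore them; with autoCommit = true nothing is kept.
    static ReInitContextSketch create(Set<String> current,
                                      Set<String> requested,
                                      boolean autoCommit) {
        Set<String> old = autoCommit ? null : new LinkedHashSet<>(current);
        return new ReInitContextSketch(new LinkedHashSet<>(requested), old);
    }

    boolean canRollback() {
        return oldResourceSet != null;
    }

    public static void main(String[] args) {
        Set<String> current = new LinkedHashSet<>(Arrays.asList("a", "b"));
        Set<String> requested = Collections.singleton("c");
        System.out.println(create(current, requested, false).canRollback());
        System.out.println(create(current, requested, true).canRollback());
    }
}
```

Cloning rather than sharing the set means later changes to the live resource set cannot alter what a rollback would restore.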

Restart is also similar to ReInit, except that we first check whether a 
rollback state exists. If so, it just copies the references to the 
oldResourceSet and oldLaunchcontext from the existing reInitContext to the new 
reInitContext. No 
resourceSet cloning takes place here (since the oldResourceSet is already a 

Let me know if the above is fine...

> Expose upgrade and restart API in ContainerManagementProtocol
> -------------------------------------------------------------
>                 Key: YARN-5609
>                 URL: https://issues.apache.org/jira/browse/YARN-5609
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5609.001.patch, YARN-5609.002.patch, 
> YARN-5609.003.patch, YARN-5609.004.patch, YARN-5609.005.patch, 
> YARN-5609.006.patch, YARN-5609.007.patch
> YARN-5620 and YARN-5637 allow an AM to explicitly *upgrade* a container with 
> a new launch context and subsequently *rollback* / *commit* the change on the 
> Container. This can also be used to simply *restart* the Container. 
> This JIRA proposes to extend the ContainerManagementProtocol with the 
> following API:
> * *reInitializeContainer*
> * *rollbackLastUpgrade*
> * *commitLastUpgrade*
> * *restartContainer*
