[ https://issues.apache.org/jira/browse/YARN-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510384#comment-15510384 ]
Arun Suresh commented on YARN-5609:
-----------------------------------

bq. User anyway has to provide the full set of resources via the reInit API if they want to rollback to older than last. what do you think...

You mean rollback to a version before the previous one? Agreed.

So just to clarify, the ReInit behavior as per the latest patch (v007) is:
# Container C1 currently has resources (*a* and *b*) symlinked to (*s1* and *s2*).
# The reInitialization request comes in with autoCommit = false and asks for resource *c* symlinked to *s1*.
** The ReInitContext created will have _newResourceSet_ = (c) and _oldResourceSet_ = a clone of the current resource set (which contains a and b, but they are moved to the pending list).
# Container moves into REINITIALIZING state while *c* is being localized.
# Once *c* is localized, the container gets a RESOURCE_LOCALIZED event. Since there are no more pending resources, in the _ResourceLocalizedWhileReInitTransition_, a CLEANUP_CONTAINER_FOR_REINIT is sent to the launcher.
# The CLEANUP_CONTAINER_FOR_REINIT is handled by the _ContainerLaunch_, which kills the current process and sends a CONTAINER_KILLED_ON_REQUEST to the container.
# This moves the Container to LOCALIZED state and invokes the _KilledForReInitializationTransition_, which does a {{container.sendLaunchEvent()}} that restarts the process with the new launch context and resourceSet.

AutoRollback (in case of container failure after re-init):
# In the _RetryFailureTransition_, we decide the container has exhausted its retries, and we decide to rollback. Unlike normal ReInit, the container is NOT running now.
# Since the oldResourceSet is a clone of the previous state, with *a* and *b* still in pending, C1 sends a LOCALIZE_CONTAINER_RESOURCES event to the localizer and moves to REINITIALIZING state to wait for the response.
# Once *a* and *b* have been verified (currently they are not re-localized, as you pointed out, but the Tracker still checks that the resources exist), RESOURCE_LOCALIZED is sent to the Container.
# Unlike the normal ReInit case, since the process is already dead, I cannot send a CLEANUP_CONTAINER_FOR_REINIT to the ContainerLaunch (since it does not exist), so I send the CONTAINER_KILLED_ON_REQUEST to the container, which is handled by the container and proceeds as steps 5 and 6 above.

Explicit Rollback (AM invoked) is exactly like the normal ReInit case, but autoCommit is true, so no rollback state is saved (since you should not be able to rollback a rollback).

Restart is also similar to ReInit, except that we first check if a rollback state exists. If so, it just copies the reference of the oldResourceSet and oldLaunchcontext from the existing reinitContext to the new reInitContext. No resourceSet cloning takes place here (since the oldResourceSet is already a clone).

Let me know if the above is fine...

> Expose upgrade and restart API in ContainerManagementProtocol
> -------------------------------------------------------------
>
>                 Key: YARN-5609
>                 URL: https://issues.apache.org/jira/browse/YARN-5609
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5609.001.patch, YARN-5609.002.patch,
> YARN-5609.003.patch, YARN-5609.004.patch, YARN-5609.005.patch,
> YARN-5609.006.patch, YARN-5609.007.patch
>
>
> YARN-5620 and YARN-5637 allow an AM to explicitly *upgrade* a container with
> a new launch context and subsequently *rollback* / *commit* the change on the
> Container. This can also be used to simply *restart* the Container as well.
> This JIRA proposes to extend the ContainerManagementProtocol with the
> following API:
> * *reInitializeContainer*
> * *rollbackLastUpgrade*
> * *commitLastUpgrade*
> * *restartContainer*
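
The ReInit and AutoRollback sequences described in the comment can be modelled as a small state machine. The following is only a toy sketch under stated assumptions, not the NodeManager implementation: the class {{ReInitSketch}}, its nested {{Container}} and {{ReInitContext}} types, and every method and field name are hypothetical, resources are plain strings, and symlink handling, launch contexts and the actual localizer are left out. Only the state and event names are taken from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the re-init flow; all type and method names here are hypothetical. */
public class ReInitSketch {
    enum State { RUNNING, REINITIALIZING, LOCALIZED }

    /** Stand-in for the ReInitContext: the requested set plus an optional rollback set. */
    static final class ReInitContext {
        final List<String> newResourceSet;
        final List<String> oldResourceSet; // null means no rollback state is saved

        ReInitContext(List<String> newResourceSet, List<String> oldResourceSet) {
            this.newResourceSet = newResourceSet;
            this.oldResourceSet = oldResourceSet;
        }
    }

    static final class Container {
        State state = State.RUNNING;
        boolean processAlive = true;
        List<String> launchedWith;          // resource set of the last launch
        ReInitContext ctx;
        final List<String> events = new ArrayList<>();

        Container(List<String> initial) {
            launchedWith = new ArrayList<>(initial);
        }

        /** ReInit request: with autoCommit = false a clone of the current set is kept. */
        void reInitialize(List<String> requested, boolean autoCommit) {
            List<String> old = autoCommit ? null : new ArrayList<>(launchedWith);
            ctx = new ReInitContext(new ArrayList<>(requested), old);
            state = State.REINITIALIZING;   // wait while the new resources localize
        }

        /** RESOURCE_LOCALIZED with nothing left pending. */
        void onResourceLocalized() {
            if (state != State.REINITIALIZING) {
                throw new IllegalStateException("not re-initializing: " + state);
            }
            if (processAlive) {
                // Normal ReInit: the launcher kills the process, then reports back.
                events.add("CLEANUP_CONTAINER_FOR_REINIT");
                processAlive = false;
            }
            // In the rollback case the process is already dead, so only this is sent.
            events.add("CONTAINER_KILLED_ON_REQUEST");
            state = State.LOCALIZED;
            launchedWith = ctx.newResourceSet; // relaunch with the requested set
            processAlive = true;
        }

        /** Retries exhausted after a re-init: roll back if rollback state exists. */
        boolean onRetriesExhausted() {
            processAlive = false;
            if (ctx == null || ctx.oldResourceSet == null) {
                return false;               // nothing to roll back to
            }
            // The rollback itself keeps no rollback state (no rollback of a rollback).
            ctx = new ReInitContext(ctx.oldResourceSet, null);
            events.add("LOCALIZE_CONTAINER_RESOURCES");
            state = State.REINITIALIZING;
            return true;
        }
    }

    public static void main(String[] args) {
        Container c1 = new Container(Arrays.asList("a", "b"));
        c1.reInitialize(Arrays.asList("c"), false);
        c1.onResourceLocalized();
        System.out.println(c1.state + " " + c1.launchedWith);
        c1.onRetriesExhausted();
        c1.onResourceLocalized();
        System.out.println(c1.state + " " + c1.launchedWith + " " + c1.events);
    }
}
```

In this sketch a second {{onRetriesExhausted()}} after the rollback returns false, which mirrors the point that a rollback saves no rollback state of its own.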