[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472777#comment-15472777 ]
Jian He commented on YARN-5620:
-------------------------------

[~asuresh], thanks for the explanation.

bq. Only the AM knows if the upgrade is actually successful.

How does the AM determine whether the upgrade is successful (that is, what kind of signal should the AM depend on)? I feel that once the container starts running, even for the AM it is hard to distinguish whether a failure is caused by the upgrade or by the runtime. IMO, if the container fails to launch on upgrade, it should be considered an upgrade failure. Once the container has started running, a subsequent failure can be considered a runtime failure. If the user does want to roll back, the user can call the upgradeContainer/rollback command again to roll back.

bq. But, in my opinion rollback should not be provided with an explicit launchContext, it should always be the just previous context.

I also agree that the AM can take care of tying the context to a version. In our case, the Slider AM (also YARN code) will have the prior context and will call upgradeContainer with the corresponding context, so the NM does not need to remember the prior context.

I think the upgrade by itself is enough work for a single JIRA, with enough corner cases. Could you split the patch so that this JIRA includes only the upgrade piece? That would also make review easier.

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -----------------------------------------------------------------------------
>
>                 Key: YARN-5620
>                 URL: https://issues.apache.org/jira/browse/YARN-5620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5620.001.patch, YARN-5620.002.patch, YARN-5620.003.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to
> support upgrade of a running container with a new {{ContainerLaunchContext}}
> as well as the ability to rollback the upgrade if the container is not able
> to restart using the new launch Context.
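The failure classification and AM-side rollback suggested in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the YARN-5620 patch: {{FailureKind}}, {{LaunchContextStub}} (a stand-in for {{ContainerLaunchContext}}), and {{AmUpgradeTracker}} are all hypothetical names. It assumes the AM numbers each context it sends, and that a rollback is just another upgrade call that reuses the previous context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types come from the YARN-5620 patch.
enum FailureKind { UPGRADE_FAILURE, RUNTIME_FAILURE }

// Stand-in for ContainerLaunchContext; only carries a launch command here.
final class LaunchContextStub {
    final String command;
    LaunchContextStub(String command) { this.command = command; }
}

// AM-side bookkeeping, so that the NM does not have to remember prior contexts.
final class AmUpgradeTracker {
    private final Map<Integer, LaunchContextStub> contextsByVersion = new HashMap<>();
    private int currentVersion = 0; // 0 means nothing has been launched yet

    // Rule from the comment: a failure before the container reaches the running
    // state is an upgrade failure; a failure after that is a runtime failure.
    static FailureKind classify(boolean reachedRunning) {
        return reachedRunning ? FailureKind.RUNTIME_FAILURE : FailureKind.UPGRADE_FAILURE;
    }

    // Record the context the AM is about to send with an upgrade request.
    int recordUpgrade(LaunchContextStub context) {
        currentVersion++;
        contextsByVersion.put(currentVersion, context);
        return currentVersion;
    }

    // Context the AM would pass to the upgrade call again in order to roll back.
    LaunchContextStub previousContext() {
        if (currentVersion < 2) {
            throw new IllegalStateException("no prior context to roll back to");
        }
        return contextsByVersion.get(currentVersion - 1);
    }
}

class UpgradeSketchDemo {
    public static void main(String[] args) {
        AmUpgradeTracker tracker = new AmUpgradeTracker();
        tracker.recordUpgrade(new LaunchContextStub("run-v1"));
        tracker.recordUpgrade(new LaunchContextStub("run-v2"));

        // The upgraded container never reached the running state.
        FailureKind kind = AmUpgradeTracker.classify(false);
        System.out.println("failure kind: " + kind);

        if (kind == FailureKind.UPGRADE_FAILURE) {
            // Rolling back is a further upgrade that reuses the prior context.
            LaunchContextStub prior = tracker.previousContext();
            tracker.recordUpgrade(prior);
            System.out.println("rolling back with: " + prior.command);
        }
    }
}
```

Keeping the version-to-context map in the AM keeps the NM side stateless with respect to history, which is the trade-off the comment argues for. The cost is that every AM has to implement this bookkeeping itself.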