[
https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun Suresh updated YARN-5637:
------------------------------
Attachment: YARN-5637.002.patch
Updating patch based on [~jianhe]'s suggestions and rebasing on the latest
YARN-5620 patch.
bq. Do we need to do something for this condition ? else, it can be removed,
Yeah.. can be removed.. I had put that there to remind me of something.. forgot
to remove it :)
bq. In RollbackContainerTransition: the container.getResourceSet() will return
all resources including current and previous version. We should re-request only
the previous version's resources, rather than the union of both?
In the latest patch, the resourceSet is reverted to its previous state as well.
bq. I still have question on the commit API, how does AM use this API in
practice ?
Commit is just a way for the AM to tell the NM that it is fine with the upgrade
(perhaps after it performs some diagnostic checks on the container) and that
the container is working as it should. After the AM does a commit, the
container can no longer be rolled back, and any bookkeeping required for a
rollback (the reInitContext, for example) is deleted by the NM.
Prior to a commit, if the upgraded Container fails, NM can choose to
automatically rollback.
Of course the AM is still free to call 'upgrade' again, with an old launch
context.
By default, autoCommit is 'true', which means that as soon as the container is
upgraded, it is also committed.
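As a rough illustration of the semantics described above, here is a minimal
sketch. It is not code from the patch: the class UpgradeLifecycleSketch and
all of its methods are hypothetical, and launch contexts are modelled as plain
strings only to keep the example self-contained.

```java
/**
 * Hypothetical sketch of the upgrade / commit / rollback semantics.
 * None of these names come from the patch; launch contexts are plain strings.
 */
final class UpgradeLifecycleSketch {
    private final boolean autoCommit;
    private String currentContext;
    // Bookkeeping kept only while a rollback is still possible; null otherwise.
    private String previousContext;

    UpgradeLifecycleSketch(String initialContext, boolean autoCommit) {
        this.currentContext = initialContext;
        this.autoCommit = autoCommit;
    }

    /** Re-initialize with a new context, remembering the old one. */
    void upgrade(String newContext) {
        previousContext = currentContext;
        currentContext = newContext;
        if (autoCommit) {
            commit();
        }
    }

    /** Accept the upgrade: the rollback bookkeeping is discarded. */
    void commit() {
        previousContext = null;
    }

    boolean canRollback() {
        return previousContext != null;
    }

    /** Revert to the pre-upgrade context; only legal before a commit. */
    void rollback() {
        if (!canRollback()) {
            throw new IllegalStateException("nothing to roll back to");
        }
        currentContext = previousContext;
        previousContext = null;
    }

    /** What the NM could do when an upgraded container fails; true if it rolled back. */
    boolean onContainerFailure() {
        if (canRollback()) {
            rollback();
            return true;
        }
        return false;
    }

    String currentContext() {
        return currentContext;
    }
}
```

With autoCommit set to true, canRollback() is false right after upgrade(), so
the only way back in this sketch is another upgrade() call with the old context.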
bq. Also, should the rollback API always be able to rollback ?
Once Commit has been called, you cannot rollback. The AM would have to
explicitly call the upgrade API again with the previous launchContext.
bq. ContainerLaunchContext already has the ContainerRetryContext ? can we reuse
that retryContext?
I wanted to distinguish between the retry policy used to retry a failed
container and the policy used to decide failure retries during upgrades. It is
possible both can be the same. I just put that argument there in the
_upgrade()_ API to make it explicit.
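To make the distinction concrete, here is a small hypothetical sketch of
keeping two retry policies side by side and picking one depending on whether
an upgrade is in progress. RetryPolicySketch and RetrySelectorSketch are
made-up names, and a bare max-retries count stands in for the real
ContainerRetryContext.

```java
/** Hypothetical stand-in for a retry context: just a retry budget. */
final class RetryPolicySketch {
    private final int maxRetries;

    RetryPolicySketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** True while the number of failures seen so far is below the budget. */
    boolean shouldRetry(int failuresSoFar) {
        return failuresSoFar < maxRetries;
    }
}

/** Holds one policy for normal failures and one for failures during an upgrade. */
final class RetrySelectorSketch {
    private final RetryPolicySketch launchRetry;
    private final RetryPolicySketch upgradeRetry;

    RetrySelectorSketch(RetryPolicySketch launchRetry, RetryPolicySketch upgradeRetry) {
        this.launchRetry = launchRetry;
        this.upgradeRetry = upgradeRetry;
    }

    /** Pick the policy that applies to the container's current situation. */
    RetryPolicySketch policyFor(boolean upgradeInProgress) {
        return upgradeInProgress ? upgradeRetry : launchRetry;
    }
}
```

Passing the same policy object for both arguments covers the case where the
two policies are meant to be identical.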
bq. The ContainerImpl#ContainerRetryContext is not updated to new value on
upgrade.
This is fixed in the latest YARN-5620 patch
bq. RetryFailureTranstion: it's a bit complicated.. is it possible to simplify
it something like below:
I refactored it a bit.. let me know if it's OK.
> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
> Key: YARN-5637
> URL: https://issues.apache.org/jira/browse/YARN-5637
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new
> launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent
> rollback or commit of the upgrade.