[ 
https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5637:
------------------------------
    Attachment: YARN-5637.002.patch

Updating patch based on [~jianhe]'s suggestions and rebasing on the latest 
YARN-5620 patch.

bq. Do we need to do something for this condition ? else, it can be removed,
Yeah, it can be removed. I had put it there to remind me of something and forgot 
to remove it :)

bq. In RollbackContainerTransition: the container.getResourceSet() will return 
all resources including current and previous version. We should re-request only 
the previous version's resources, rather than the union of both?
In the latest patch, the resourceSet is reverted to previous state as well.

bq. I still have question on the commit API, how does AM use this API in 
practice ?
Commit is just a way for the AM to tell the NM that it is fine with the upgrade 
(perhaps after it performs some upgrade diagnostics check on the container) and 
that the container is working as it should. After the AM does a commit, the 
container cannot be rolled back, and any bookkeeping required to roll back (the 
reInitContext, for example) is deleted by the NM.

Prior to a commit, if the upgraded Container fails, the NM can choose to 
roll back automatically.

Of course the AM is still free to call 'upgrade' again, with an old launch 
context.

By default, autoCommit is 'true', which means that as soon as the container is 
upgraded, it is also committed.
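
To make the semantics above concrete, here is a rough sketch of the lifecycle as 
a tiny standalone model. It is not the code in the patch: the class 
UpgradeLifecycleSketch, its method names, and the use of plain Strings in place 
of launch contexts are all made up for illustration. It also assumes that a 
second upgrade before a commit simply replaces the saved rollback state, which 
the discussion above does not specify.

```java
/**
 * Hypothetical model of one container's upgrade state.
 * Illustration only; these are not the NodeManager's actual classes.
 */
class UpgradeLifecycleSketch {

    // Stands in for the launch context the container is currently running with.
    private String currentContext;
    // Rollback bookkeeping (plays the role of the reInitContext); null once committed.
    private String previousContext;
    private final boolean autoCommit;

    UpgradeLifecycleSketch(String initialContext, boolean autoCommit) {
        this.currentContext = initialContext;
        this.autoCommit = autoCommit;
    }

    /** Re-initialize with a new context, remembering the old one for rollback. */
    void upgrade(String newContext) {
        previousContext = currentContext; // assumption: overwrites any pending rollback state
        currentContext = newContext;
        if (autoCommit) {
            commit(); // with autoCommit, the upgrade is committed immediately
        }
    }

    /** The AM accepts the upgrade; the rollback bookkeeping is discarded. */
    void commit() {
        previousContext = null;
    }

    boolean canRollback() {
        return previousContext != null;
    }

    /** Revert to the previous context; only possible before a commit. */
    void rollback() {
        if (!canRollback()) {
            throw new IllegalStateException("Upgrade already committed; nothing to roll back to");
        }
        currentContext = previousContext;
        previousContext = null;
    }

    /** Failure of an uncommitted upgrade: this sketch always rolls back automatically. */
    void onUpgradedContainerFailure() {
        if (canRollback()) {
            rollback();
        }
    }

    String currentContext() {
        return currentContext;
    }

    public static void main(String[] args) {
        UpgradeLifecycleSketch container = new UpgradeLifecycleSketch("v1", false);
        container.upgrade("v2");
        container.onUpgradedContainerFailure();
        System.out.println(container.currentContext());

        // After a commit, going back to v1 requires an explicit upgrade with the old context.
        container.upgrade("v2");
        container.commit();
        container.upgrade("v1");
        System.out.println(container.currentContext());
    }
}
```

With autoCommit set to true in this model, canRollback() is false right after 
upgrade(), so the only way back is another upgrade() with the old context.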

bq. Also, should the rollback API be always be able to rollback ?
Once commit has been called, you cannot roll back. The AM would have to 
explicitly call the upgrade API again with the previous launchContext.

bq. ContainerLaunchContext already has the ContainerRetryContext ? can we reuse 
that retryContext?
I wanted to distinguish between the retry policy used to retry a failed 
container and the policy used to decide failure retries during upgrades. It is 
possible both can be the same. I just put that argument there in the 
_upgrade()_ API to make it explicit.

bq. The ContainerImpl#ContainerRetryContext is not updated to new value on 
upgrade.
This is fixed in the latest YARN-5620 patch

bq. RetryFailureTranstion: it's a bit complicated.. is it possible to simplify 
it something like below:
I refactored it a bit. Let me know if it's ok.





> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
>                 Key: YARN-5637
>                 URL: https://issues.apache.org/jira/browse/YARN-5637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new 
> launch Context.
> This JIRA proposes to use the above feature to support upgrade and subsequent 
> rollback or commit of the upgrade.


