[
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun Suresh updated YARN-5620:
------------------------------
Attachment: YARN-5620.006.patch
Uploading a patch that addresses most of [~vvasudev]'s and [~jianhe]'s
suggestions. Thanks for the comments!
[~vvasudev],
bq. Should there be a guard against calling reinit if a reinit is already in
progress? Could we end up with the ReInitContext in an odd state?
There is already a guard in the ContainerManager API, but I have added an
additional check in the transition in the new patch, as per your suggestion.
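To illustrate the kind of guard being discussed, here is a minimal sketch, not the code in the patch. The class {{ReInitGuard}}, its nested {{ReInitContext}}, and the method names are hypothetical stand-ins; the only idea taken from the discussion is that a second reinit request is rejected while one is already in progress.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for per-container re-init bookkeeping.
// None of these names come from the actual patch.
final class ReInitGuard {

    // Hypothetical context: records only a label for the new launch context.
    static final class ReInitContext {
        final String newLaunchContext;

        ReInitContext(String newLaunchContext) {
            this.newLaunchContext = newLaunchContext;
        }
    }

    // null means no re-init is currently in progress.
    private final AtomicReference<ReInitContext> current = new AtomicReference<>();

    // Returns true if this call started a re-init,
    // false if another re-init was already in progress.
    boolean tryBegin(ReInitContext ctx) {
        return current.compareAndSet(null, ctx);
    }

    // Clears the context once the re-init finishes or is cancelled.
    void end() {
        current.set(null);
    }

    boolean inProgress() {
        return current.get() != null;
    }
}
```

With this shape, a caller that gets {{false}} back from {{tryBegin}} can reject the request instead of overwriting the context of the reinit that is already running.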
bq. Instead of a launch event we should send a relaunch event - the relaunch
takes care of trying to run in same work dir as the earlier attempt, etc
I actually tried using relaunch initially, but it looks like the pid has to
be running for the relaunch to work correctly. It also looks like we would
need an intermediate state there, which would result in the same (or a larger)
amount of code change. I would prefer to use launch itself, since I am more
confident of how it works. I have also updated the test case to verify that
the upgraded container has access to, and is able to read, files created by
the previous process in the working directory.
bq. I think an explicit commit API (with auto-commit option being the default
option) should satisfy both use cases.
Thanks, I will update the patch with it once we agree that the reinit flow is
fine.
[~jianhe],
bq. While AM issues the upgrade command, the container could exit with success
or failure. in this case, should we still continue the upgrade process ?
I am nullifying the reInitContext in the event of an explicit kill, or if the
process completes successfully during the reInit, so the upgrade should be
cancelled in those cases. Do take a look at the latest patch and let me know
if you think I've covered all cases.
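The cancellation rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: the class {{ReInitCancellation}}, the {{Event}} enum, and the method names are hypothetical, and how any event other than an explicit kill or a successful exit is handled is not covered in this comment, so the sketch simply leaves the pending upgrade untouched for it.

```java
// Hypothetical model of "clear the reInitContext on kill or successful exit".
// Names and structure are illustrative only.
final class ReInitCancellation {

    // OTHER stands for any event not mentioned in the discussion.
    enum Event { KILLED, EXITED_WITH_SUCCESS, OTHER }

    // Label of the pending new launch context; null when no upgrade is pending.
    private String reInitContext;

    void beginUpgrade(String newLaunchContext) {
        this.reInitContext = newLaunchContext;
    }

    // An explicit kill or a successful exit during re-init cancels the upgrade.
    // Assumption: every other event leaves the pending upgrade as it is.
    void onProcessEvent(Event event) {
        if (event == Event.KILLED || event == Event.EXITED_WITH_SUCCESS) {
            reInitContext = null;
        }
    }

    boolean isUpgradePending() {
        return reInitContext != null;
    }
}
```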
> Core changes in NodeManager to support for upgrade and rollback of Containers
> -----------------------------------------------------------------------------
>
> Key: YARN-5620
> URL: https://issues.apache.org/jira/browse/YARN-5620
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Arun Suresh
> Attachments: YARN-5620.001.patch, YARN-5620.002.patch,
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch,
> YARN-5620.006.patch
>
>
> This JIRA proposes to modify the ContainerManager (and other core classes) to
> support upgrading a running container with a new {{ContainerLaunchContext}},
> as well as the ability to roll back the upgrade if the container is not able
> to restart using the new launch context.