> On Sept. 18, 2016, 11:51 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/containerizer.cpp, lines 1794-1800
> > <https://reviews.apache.org/r/51407/diff/6/?file=1501288#file1501288line1794>
> >
> > Hmm, should we do the deletion here if we aren't even sure the processes
> > were killed properly?
> >
> > I think my suggestion on RuntimePath for a container is:
> > 1) we create the RuntimePath as the first thing in 'launch', even
> >    before we call any provisioner/isolator functions.
> > 2) we checkpoint the pid right after fork.
> > 3) we delete the RuntimePath after the destroy is successful.
> >
> > The invariants we have if we do it this way are:
> > 1) If RuntimePath exists, we know that provisioner/isolator prepare
> >    might have been called, so cleanup is necessary during recovery.
> > 2) If RuntimePath does not exist, we know that all cleanups have been
> >    done properly and we no longer need to worry about cleanup.
> > 3) If the pid file exists, we know that the process has been forked.
> > 4) If the pid file does not exist, the process may or may not have been
> >    forked.
> >
> > For the upgrade situation, some checkpointed containers or orphans may
> > not have a RuntimePath. In that case, we should not create a RuntimePath
> > (i.e., do not checkpoint the pid or exit status) in order to maintain the
> > above invariants. So for old containers launched by a previous version of
> > the agent, there will be no RuntimePath at any time (this is another
> > invariant).
> >
> > It's possible that a container has a RuntimePath, but the launcher does
> > not know about it. This is the case where launcher->destroy has been
> > called (and was successful), but the agent crashed before removing the
> > RuntimePath.
> >
> > It's also possible that a container does not have a RuntimePath, but the
> > launcher knows about it. This is the legacy container case.
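
To make the lifecycle and invariants quoted above concrete, here is a rough C++17 sketch of what I have in mind. All names in it (`createRuntimePath`, `checkpointPid`, `removeRuntimePath`, `classify`, `RecoveryState`, and the `pid` file name) are made up for illustration and are not the containerizer's actual API; it only assumes the RuntimePath is a directory on disk.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical names throughout; this is not the Mesos API.
enum class RecoveryState {
  CleanedUpOrLegacy,  // No RuntimePath: cleanup finished, or a legacy container.
  MaybeForked,        // RuntimePath but no pid file: fork may or may not have happened.
  Forked              // RuntimePath and pid file: the process has been forked.
};

// Step 1: create the RuntimePath first thing in 'launch', before any
// provisioner/isolator function is called.
void createRuntimePath(const fs::path& runtimePath) {
  fs::create_directories(runtimePath);
}

// Step 2: checkpoint the pid right after fork. Returns false without
// touching the disk when there is no RuntimePath, so that legacy
// containers never get one.
bool checkpointPid(const fs::path& runtimePath, long pid) {
  assert(pid > 0);
  if (!fs::is_directory(runtimePath)) {
    return false;
  }
  std::ofstream out(runtimePath / "pid");  // "pid" file name is an assumption.
  out << pid;
  out.close();
  return !out.fail();
}

// Step 3: delete the RuntimePath only after destroy has succeeded.
void removeRuntimePath(const fs::path& runtimePath) {
  fs::remove_all(runtimePath);
}

// Recovery: derive what we know about a container from what is on disk.
RecoveryState classify(const fs::path& runtimePath) {
  if (!fs::exists(runtimePath)) {
    return RecoveryState::CleanedUpOrLegacy;
  }
  if (fs::exists(runtimePath / "pid")) {
    return RecoveryState::Forked;
  }
  return RecoveryState::MaybeForked;
}
```

During recovery, `MaybeForked` and `Forked` both mean cleanup is still required; the only difference is whether we have a pid to go after. `CleanedUpOrLegacy` has to be combined with what the launcher reports: if the launcher still knows the container, it is a legacy one and we must not start checkpointing for it.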

Hmm, I don't see that all of the comments here have been addressed. Especially
the following:

> 1) we create the RuntimePath as the first thing in 'launch', even before we
> call any provisioner/isolator functions

> For the upgrade situation, some checkpointed containers or orphans may not
> have RuntimePath. In such case, we should not create RuntimePath (i.e., do
> not checkpoint pid or exit status) in order to maintain the above invariant.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51407/#review149406
-----------------------------------------------------------


On Sept. 23, 2016, 9:01 p.m., Kevin Klues wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51407/
> -----------------------------------------------------------
>
> (Updated Sept. 23, 2016, 9:01 p.m.)
>
>
> Review request for mesos and Jie Yu.
>
>
> Bugs: MESOS-6204
>     https://issues.apache.org/jira/browse/MESOS-6204
>
>
> Repository: mesos
>
>
> Description
> -------
>
> This includes checkpointing both the container pid and the status of
> the container upon exit. This also includes an update to tests to
> account for new 'init' process semantics in a container. That is, the
> name of the init process of the container is now "mesos-containerizer",
> not "sh".
>
>
> Diffs
> -----
>
>   src/slave/containerizer/mesos/containerizer.hpp 16f9e3e92e90fe7f8a0ebd24e567800e1f285bc9
>   src/slave/containerizer/mesos/containerizer.cpp 144b0db501d40d4e0bba12672723616bedd76e7e
>   src/tests/containerizer/isolator_tests.cpp b4d25e57df7f0e157769c9ae4f7847657c505e78
>
> Diff: https://reviews.apache.org/r/51407/diff/
>
>
> Testing
> -------
>
> $ GTEST_FILTER="" make -j check
> $ src/mesos-tests
> $ sudo src/mesos-tests
>
>
> Thanks,
>
> Kevin Klues
>
>
