> On Dec. 5, 2016, 9:11 p.m., Jie Yu wrote: > > src/slave/containerizer/mesos/io/switchboard.cpp, lines 167-171 > > <https://reviews.apache.org/r/54355/diff/3/?file=1576209#file1576209line167> > > > > I think this flag should apply to containers that are about to be > > launched next. > > > > So let's still recover Info for those containers that have io > > switchboard server even if this flag has been turned off after agent > > restart. > > > > What do you think?
I agree. I removed this check, and added a NOTE about this in the comments below. > On Dec. 5, 2016, 9:11 p.m., Jie Yu wrote: > > src/slave/containerizer/mesos/io/switchboard.cpp, lines 537-547 > > <https://reviews.apache.org/r/54355/diff/3/?file=1576209#file1576209line537> > > > > It's likely that the io switchboard server has been forked, but the > > agent crashes before it was able to checkpoint the pid. > > > > If that happens, during recovery, we will not maintain Info for that > > container. As a result, we won't try to cleanup the socket file potentially > > created? > > > > I think we probably need to createa directory for io switchboard > > related files (sock and pid files). When we create the directory, it > > indicates that the io switchboard server might or might not be created. > > During recovery, if we find the directory exists, but pid file does not > > exist, we should still create the Info with pid set to None(), and cleanup > > the socket file in 'cleanup' method. > > > > Thoughts? That seems reasonable, what should we call the directory? As below? ``` io_switchboard |-pid -socket ``` - Kevin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54355/#review158038 ----------------------------------------------------------- On Dec. 5, 2016, 9:46 a.m., Kevin Klues wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54355/ > ----------------------------------------------------------- > > (Updated Dec. 5, 2016, 9:46 a.m.) > > > Review request for mesos and Jie Yu. > > > Bugs: MESOS-6688 > https://issues.apache.org/jira/browse/MESOS-6688 > > > Repository: mesos > > > Description > ------- > > Added implementation of `recover()` to the IOSwitchboard isolator. > > > Diffs > ----- > > src/slave/containerizer/mesos/io/switchboard.hpp > 839665a22aca9b1c1c1cf4992406bc924ee2b065 > src/slave/containerizer/mesos/io/switchboard.cpp > d5211b98616e72a27ca6b472a5ee83505c227f22 > > Diff: https://reviews.apache.org/r/54355/diff/ > > > Testing > ------- > > GTEST_FILTER="" make -j check > sudo src/mesos-tests > > Test added in follow-on patch. > > > Thanks, > > Kevin Klues > >
