> On Feb. 25, 2017, 1:24 a.m., Joseph Wu wrote: > > This is definitely the direction we want to head towards (Job objects, > > without leaking handles). > > > > But, due to the partial duplication of logic in: > > https://reviews.apache.org/r/56366/ > > I think we should move much of this logic into libprocess. > > > > Here's the gist of what should be done: > > > > * We'll want a new, Windows-only actor that is created in > > `process::initialize`. This will look very similar to the > > `GarbageCollector` process in `include/process/gc.hpp`. Something like: > > ``` > > class JobObjectManager : public Process<JobObjectManager> > > { > > public: > > JobObjectManager() : ProcessBase("__job_object_manager__") {} > > virtual ~JobObjectManager() {} > > > > // NOTE: Maybe you'll want the actor to create the JobObject itself. > > // You could even move the CREATE_JOB parent hook into this file. > > void manage(const pid_t pid, const std::string& name, const SharedHandle& > > handle) > > { > > objects[pid] = {name, handle}; > > > > process::reap(pid) > > .onAny(lambda::bind(&Self::cleanup, lambda::_1, pid)); > > } > > > > protected: > > virtual void cleanup(Future<Option<int>> exit_code, const pid_t pid) > > { > > CHECK(!exit_code.isPending()); > > CHECK(!exit_code.isDiscarded()); > > > > Try<Nothing> killJobResult = os::kill_job(objects[pid].handle.get()); > > CHECK(!killJobResult.isError()) > > << "Failed to kill job object: " + killJobResult.error(); > > > > objects.erase(pid); > > } > > > > private: > > struct JobData { > > std::string name; > > SharedHandle handle; > > } > > std::map<pid, JobData> objects; > > }; > > ``` > > * You can either expose the actor's methods via `process.hpp` or via moving > > the ParentHook into the new actor. > > * The WindowsLauncher would become a thin wrapper around the new actor. > > * https://reviews.apache.org/r/56366/ should become much simpler.
Hm, okay, I like your idea. Let's chat and go over the design you're proposing. > On Feb. 25, 2017, 1:24 a.m., Joseph Wu wrote: > > src/slave/containerizer/mesos/windows_launcher.cpp, lines 50-62 > > <https://reviews.apache.org/r/56362/diff/4/?file=1630626#file1630626line50> > > > > Erm... saving the set of PIDs in a prior loop guarantees that the > > `pids.contains(pid)` condition will be met. containers != containerStates The containers is what's already tracked in the launcher; the states is what's being recovered from disk. - Andrew ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56362/#review166640 ----------------------------------------------------------- On Feb. 11, 2017, 2:04 a.m., Andrew Schwartzmeyer wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56362/ > ----------------------------------------------------------- > > (Updated Feb. 11, 2017, 2:04 a.m.) > > > Review request for mesos, Alex Clemmer and Joseph Wu. > > > Bugs: MESOS-6892 > https://issues.apache.org/jira/browse/MESOS-6892 > > > Repository: mesos > > > Description > ------- > > Instead of deriving `WindowsLauncher` from `PosixLauncher`, > this commit implements a separate `WindowsLauncher` derived > from `Launcher` parallel to the `PosixLauncher`. > > This launcher is then refactored to use Windows' "Job Objects," which > are similar to Linux's cgroups, and enable us to reason about a group > of processes associated with a task/container as a "Job Object" > instead of a root PID and the tree containing its children. The latter > is not a reasonable approach on Windows, and has been the source of > subtle bugs. > > The Job Object approach creates a named job with a one-to-one mapping > to the containerizer, and assigns the first process started for the > task to the job object. After being assigned, the Windows kernel > ensures all spawned child processes > (specifically those made with the `CreateProcess` system call) are > also assigned to the named job object. Thus this job object can then > be used to query the process group's resource usage, kill the process > group, and set limits on the process group. > > So instead of seeing a process group as a tree and referring to it by > the root process's PID, the `WindowsLauncher` sees a process group as > a named Job Object, the same way the Windows kernel sees it. However, > the containerizer code which interacts with the launcher still refers > to a task group by the singular PID, and so we have a sort of shim > which maps the initial PID to the name of the job object. This is a an > unfortunate consequence of the shared containerizer code being > originally written for POSIX-like systems. > > This abstraction sets us up for implementing resource usage limits on > the process group as a whole. > > > Diffs > ----- > > src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee > src/slave/containerizer/mesos/containerizer.cpp > d2b4f75a55dbe4746bc2dfc180335fa831a554ef > src/slave/containerizer/mesos/launcher.hpp > 79f6eea0ee8e564e90b36208672df150dbc5d540 > src/slave/containerizer/mesos/launcher.cpp > 5114c130efbfb252dde1e85c081f5174e66f57af > src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION > src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/56362/diff/ > > > Testing > ------- > > > Thanks, > > Andrew Schwartzmeyer > >