-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56362/#review166640
-----------------------------------------------------------



This is definitely the direction we want to head towards (Job objects, without 
leaking handles).

But, due to the partial duplication of logic in: 
https://reviews.apache.org/r/56366/
I think we should move much of this logic into libprocess.

Here's the gist of what should be done:

* We'll want a new, Windows-only actor that is created in 
`process::initialize`.  This will look very similar to the `GarbageCollector` 
process in `include/process/gc.hpp`.  Something like:
```
class JobObjectManager : public Process<JobObjectManager>
{
public:
  JobObjectManager() : ProcessBase("__job_object_manager__") {}
  virtual ~JobObjectManager() {}

  // NOTE: Maybe you'll want the actor to create the JobObject itself.
  // You could even move the CREATE_JOB parent hook into this file.
  void manage(const pid_t pid, const std::string& name, const SharedHandle& 
handle)
  {
    objects[pid] = {name, handle};
    
    process::reap(pid)
      .onAny(lambda::bind(&Self::cleanup, lambda::_1, pid));
  }

protected:
  virtual void cleanup(Future<Option<int>> exit_code, const pid_t pid)
  {
    CHECK(!exit_code.isPending());
    CHECK(!exit_code.isDiscarded());

    Try<Nothing> killJobResult = os::kill_job(objects[pid].handle.get());
    CHECK(!killJobResult.isError())
      << "Failed to kill job object: " + killJobResult.error();

    objects.erase(pid);
  }

private:
  struct JobData {
    std::string name;
    SharedHandle handle;
  }
  std::map<pid, JobData> objects;
};
```
* You can either expose the actor's methods via `process.hpp` or via moving the 
ParentHook into the new actor.
* The WindowsLauncher would become a thin wrapper around the new actor.
* https://reviews.apache.org/r/56366/ should become much simpler.


src/slave/containerizer/mesos/windows_launcher.cpp (lines 50 - 62)
<https://reviews.apache.org/r/56362/#comment238647>

    Erm... saving the set of PIDs in a prior loop guarantees that the 
`pids.contains(pid)` condition will be met.


- Joseph Wu


On Feb. 10, 2017, 6:04 p.m., Andrew Schwartzmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56362/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2017, 6:04 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
>     https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Instead of deriving `WindowsLauncher` from `PosixLauncher`,
> this commit implements a separate `WindowsLauncher` derived
> from `Launcher` parallel to the `PosixLauncher`.
> 
> This launcher is then refactored to use Windows' "Job Objects," which
> are similar to Linux's cgroups, and enable us to reason about a group
> of processes associated with a task/container as a "Job Object"
> instead of a root PID and the tree containing its children. The latter
> is not a reasonable approach on Windows, and has been the source of
> subtle bugs.
> 
> The Job Object approach creates a named job with a one-to-one mapping
> to the containerizer, and assigns the first process started for the
> task to the job object. After being assigned, the Windows kernel
> ensures all spawned child processes
> (specifically those made with the `CreateProcess` system call) are
> also assigned to the named job object. Thus this job object can then
> be used to query the process group's resource usage, kill the process
> group, and set limits on the process group.
> 
> So instead of seeing a process group as a tree and referring to it by
> the root process's PID, the `WindowsLauncher` sees a process group as
> a named Job Object, the same way the Windows kernel sees it. However,
> the containerizer code which interacts with the launcher still refers
> to a task group by the singular PID, and so we have a sort of shim
> which maps the initial PID to the name of the job object. This is a an
> unfortunate consequence of the shared containerizer code being
> originally written for POSIX-like systems.
> 
> This abstraction sets us up for implementing resource usage limits on
> the process group as a whole.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 3a4ace9c8011ac8eec5067cd085fa7fe4166b9ee 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/launcher.hpp 
> 79f6eea0ee8e564e90b36208672df150dbc5d540 
>   src/slave/containerizer/mesos/launcher.cpp 
> 5114c130efbfb252dde1e85c081f5174e66f57af 
>   src/slave/containerizer/mesos/windows_launcher.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/windows_launcher.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56362/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>

Reply via email to