> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > I need to check how this is used in the rest of the review chain, but...
> > 
> > Giving the ownership of the HANDLE to the caller may require much larger 
> > changes in the codebase.  You may notice that we simply leak some pid's in 
> > some parts of the codebase.  So we have to make sure we aren't leaking 
> > these shared objects.
> 
> Alex Clemmer wrote:
>     I'm not sure I understand, actually.
>     
>     Just so we're on the same page, right now we leak the Job Object handles
>     because they're set to kill the corresponding Job when the last handle is
>     closed -- in other words, when the Agent dies. So any time the agent dies,
>     all of our Executors die, too. :(
>     
>     One of the goals of this patch is to set the stage so the agent and
>     executor lifecycles _can_ be decoupled, so that when the agent dies, it can
>     recover and reconnect to the running Executors instead of simply killing
>     them all and restarting them.
>     
>     This implies that the agent should be managing the lifecycle of the Job
>     Objects, which in particular seems to imply that it is convenient to keep
>     those handles (or _some_ sort of ID) as state.
>     
>     Make sense?

What Alex said correctly describes these changes. Previously the `HANDLE` was 
leaked, so the process implicitly held ownership of the job object (keeping it 
alive for the lifetime of the executor). This patch makes that ownership 
explicit by forcing the launcher to own the `SharedHandle` to the job object. 
The lifetime semantics have not changed; they have just been made explicit 
instead of implicit.
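
To illustrate the ownership model, here is a rough, portable sketch. It is not 
the actual stout code: `FakeJobHandle`, `create_fake_job`, and `FakeLauncher` 
are hypothetical stand-ins, and I'm assuming a `std::shared_ptr` with a custom 
deleter is a fair approximation of how `SharedHandle` behaves.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Windows job object handle. `closed` records
// whether the handle has been released.
struct FakeJobHandle {
  bool* closed;
};

// Hypothetical analogue of `SharedHandle`: shared ownership plus a deleter
// that closes the handle when the last owner goes away.
using SharedJob = std::shared_ptr<FakeJobHandle>;

SharedJob create_fake_job(bool* closed) {
  *closed = false;
  return SharedJob(new FakeJobHandle{closed}, [](FakeJobHandle* handle) {
    // Models closing the handle. For a job object set to kill on close,
    // this is the point at which the job's processes would be terminated.
    *handle->closed = true;
    delete handle;
  });
}

// Hypothetical launcher that explicitly owns the job for an executor.
struct FakeLauncher {
  SharedJob job;
};

// Returns true if the job was closed once the owning launcher was destroyed.
bool job_closed_after_launcher_destroyed() {
  bool closed = false;
  {
    FakeLauncher launcher{create_fake_job(&closed)};
    assert(!closed);  // Still open: the launcher owns the handle.
  }
  return closed;
}
```

The point is only that the job's lifetime is tied to a named owner rather than 
to a handle nobody tracks; the real code deals with an actual `HANDLE`.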

I need to note that:
> Giving the ownership of the HANDLE to the caller may require much larger 
> changes in the codebase.

This is inaccurate. The original code already gave the caller ownership of the 
`HANDLE` (unsafely, and implicitly).

As for:
> So we have to make sure we aren't leaking these shared objects.

This is a valid concern. The `SharedHandle` needs to be owned explicitly. I 
believe this patch and https://reviews.apache.org/r/56366/ together ensure 
that, so this ownership deserves the closest attention in review.


> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, lines 718-721
> > <https://reviews.apache.org/r/56364/diff/1/?file=1625895#file1625895line718>
> >
> >     This is a good default.  But we need a way to toggle this behavior,
> >     such that the agent's death does not kill child jobs.
> >     
> >     i.e. A Windows version of ChildHook::SETSID
> 
> Alex Clemmer wrote:
>     I asked Andy to follow up this patch with a different changeset that
>     decouples the life of the executor from the life of the agent. Since the
>     agent already kills all executors when it dies, I think it makes sense to
>     have one set of patches just adding Mesos Containers support to Windows,
>     and one fixing the semantics of a dying Agent.
>     
>     Thoughts?

I don't think that making this behavior togglable belongs in this patch. I 
attempted to retain the existing lifecycle and behavior as closely as possible. 
Note that `recover()` has a `TODO` stating that we should attempt to reconnect 
to a possibly still-running executor; that scenario is not currently possible, 
but a later patch should enable it.


> On Feb. 8, 2017, 2:26 a.m., Joseph Wu wrote:
> > 3rdparty/stout/include/stout/windows/os.hpp, lines 740-744
> > <https://reviews.apache.org/r/56364/diff/1/?file=1625895#file1625895line740>
> >
> >     There's no `name` argument in this function...

Heh, yeah. This went through quite a few iterations ;) I'll fix, thanks.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56364/#review164619
-----------------------------------------------------------


On Feb. 8, 2017, 9:10 p.m., Andrew Schwartzmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56364/
> -----------------------------------------------------------
> 
> (Updated Feb. 8, 2017, 9:10 p.m.)
> 
> 
> Review request for mesos, Alex Clemmer and Joseph Wu.
> 
> 
> Bugs: MESOS-6892
>     https://issues.apache.org/jira/browse/MESOS-6892
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> `os::create_job` now returns a `Try<SharedHandle>` instead of a raw
> `HANDLE`, forcing ownership of the job object handle onto the caller
> of the function. `create_job` requires a `std::string name` for the
> job object, which is mapped from a PID using `os::name_job`.
> 
> The assignment of a process to the job object is now done via
> `Try<Nothing> os::assign_job(SharedHandle, pid_t)`.
> 
> The equivalent of killing a process tree with job object semantics
> is simply to terminate the job object. This is done via
> `os::kill_job(SharedHandle)`.
> 
> 
> Diffs
> -----
> 
>   3rdparty/stout/include/stout/windows/os.hpp 
> b5172fca96c4151f4b1ebb6d343022558f45fc34 
> 
> Diff: https://reviews.apache.org/r/56364/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>
