> On Jan. 22, 2016, 6:58 a.m., Ben Mahler wrote:
> > Logically looks good, just a couple of trivial comments and we can get this
> > landed!
I was going to start fixing things up, but it looks like you took care of it
for me. Thanks!
- Kevin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42519/#review115796
---
On Jan. 19, 2016, 10:58 p.m., Kevin Klues wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42519/
> ---
>
> (Updated Jan. 19, 2016, 10:58 p.m.)
>
>
> Review request for mesos, Ben Mahler and Greg Mann.
>
>
> Bugs: MESOS-4409
> https://issues.apache.org/jira/browse/MESOS-4409
>
>
> Repository: mesos
>
>
> Description
> ---
>
> Previously, it was possible for join() to return before a schedDriver
> was actually fully stopped or aborted (breaking the semantics of the
> join() call). The race came from a short circuit in join(), which
> simply checked for status != DRIVER_RUNNING before returning. It appears
> this short circuit was introduced to handle cases where initialize() or
> start() ended up aborting before ever starting the driver to begin with.
> However, it unintentionally also covers cases where stop() or abort() was
> called *after* the driver had started running.
>
> The problem is that stop() and abort() will change the status
> to DRIVER_STOPPED or DRIVER_ABORTED before actually processing
> dispatched stop or abort events (which happen asynchronously in a
> libprocess thread). Under normal operation, join() would wait for these
> events to trigger a latch that allowed the join() call to return.
> However, with the short circuit, join() exits immediately even if the
> libprocess thread hasn't yet processed the stop() or abort() events.
>
> This commit fixes the semantics of the join() call to avoid this race.
> We considered removing the latch completely and replacing it with
> process.wait(), but, unlike the latch, this wouldn't ensure that stop()
> or abort() was ever called in the first place.
>
>
> Diffs
> ---
>
> src/sched/sched.cpp 38940b7e2563a2156be2f8c228afdc27c45b6e83
>
> Diff: https://reviews.apache.org/r/42519/diff/
>
>
> Testing
> ---
>
> Ran the entire 'make check' suite with no failures on both Mac OS X and
> Ubuntu 14.04.
>
>
> Thanks,
>
> Kevin Klues
>
>
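
For readers following the race in the quoted description, here is a minimal
standalone sketch of it. This is not the actual change to src/sched/sched.cpp:
ToyDriver, Latch, racyJoin(), safeJoin() and joinWaitsForStopEvent() are
hypothetical names, and a plain std::thread with a sleep stands in for the
libprocess thread that processes the dispatched stop event. The "safe" variant
shows one possible way to narrow the short circuit (return early only if the
driver never started), which is an assumption and may differ from the fix in
the review.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy status values, modeled loosely on the driver states named above.
enum Status { DRIVER_NOT_STARTED, DRIVER_RUNNING, DRIVER_STOPPED, DRIVER_ABORTED };

// Hypothetical one-shot latch: await() blocks until trigger() has been called.
class Latch {
public:
  void trigger() {
    { std::lock_guard<std::mutex> lock(mutex_); triggered_ = true; }
    cv_.notify_all();
  }
  void await() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return triggered_; });
  }
private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool triggered_ = false;
};

class ToyDriver {
public:
  ~ToyDriver() { if (worker_.joinable()) worker_.join(); }

  void start() {
    std::lock_guard<std::mutex> lock(mutex_);
    status_ = DRIVER_RUNNING;
    started_ = true;
  }

  // The status flips immediately; the stop event is processed later on
  // another thread, which is what opens the window for the race.
  void stop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (status_ != DRIVER_RUNNING) return;
    status_ = DRIVER_STOPPED;
    worker_ = std::thread([this] {
      std::this_thread::sleep_for(std::chrono::milliseconds(200));
      stopProcessed_ = true;  // the asynchronous stop event has now run
      latch_.trigger();
    });
  }

  // Short circuit on status alone: returns as soon as stop() has flipped
  // the status, even though the stop event may still be pending.
  Status racyJoin() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (status_ != DRIVER_RUNNING) return status_;
    }
    latch_.await();
    std::lock_guard<std::mutex> lock(mutex_);
    return status_;
  }

  // Assumed alternative: short circuit only if the driver never started,
  // otherwise always wait for the latch.
  Status safeJoin() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!started_) return status_;
    }
    latch_.await();
    std::lock_guard<std::mutex> lock(mutex_);
    return status_;
  }

  bool stopProcessed() const { return stopProcessed_.load(); }

private:
  std::mutex mutex_;
  Status status_ = DRIVER_NOT_STARTED;
  bool started_ = false;
  std::atomic<bool> stopProcessed_{false};
  Latch latch_;
  std::thread worker_;
};

// Reports whether the stop event had been processed when join returned.
bool joinWaitsForStopEvent(bool useSafeJoin) {
  ToyDriver driver;
  driver.start();
  driver.stop();
  if (useSafeJoin) driver.safeJoin(); else driver.racyJoin();
  return driver.stopProcessed();
}
```

Calling joinWaitsForStopEvent(false) exercises the status-only short circuit,
where join can return while the stop event is still pending, and
joinWaitsForStopEvent(true) exercises the latch-based variant. Keeping the
latch, rather than only waiting for the worker to finish, is what lets join
block until stop() or abort() has actually been called, as the description
notes.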