> On April 20, 2015, 5:55 a.m., Adam B wrote:
> > LGTM, barring a question about ordering/synchronization. I'll let another 
> > committer take a look before we commit it.
> 
> Adam B wrote:
>     Would also like to see a successful ReviewBot pass. That MasterFailover 
> segfault seems like it could be related to your detector changes.
> 
> Niklas Nielsen wrote:
>     Hey Robert, do you need some help on this patch?
> 
> Niklas Nielsen wrote:
>     Ping :) Let's get some cadence behind this, or drop it (and reopen when 
> we can work on this again).
> 
> Niklas Nielsen wrote:
>     Alright - I think I understand the problem. In the test cases, the owned 
> pointer will get double deleted.
>     
>     Let's iterate on how to address this.

It's unsafe to delete the detector before we stop SchedulerProcess since we're 
using the detector in 'SchedulerProcess::detected' and it's possible that the 
detector will attempt to be used after we dispatch to 'SchedulerProcess::stop' 
but before we call 'terminate' in 'SchedulerProcess::stop'!

We can't delete the detector until after we've waited for the process to 
terminate. Here's one suggestion, delete the detector in 'join', which is 
already waiting for 'SchedulerProcess::stop' to complete, so it's not a big 
deal to also wait for the entire process to terminate (assuming that 'stop' has 
actually been called). Here's an example of how this might be able to work: 
https://reviews.apache.org/r/35405

Note that I haven't actually compiled this code or tested it, it's just a proof 
of concept to hopefully unblock folks here, please let me know if I can help 
further. The general expectatoin with this could would be that from Java you'd 
do something like:

driver.stop();
driver.join(); // After this returns we know that the detector has been deleted.

And in the abort paths it would look like:

driver.abort();
driver.stop();
driver.join(); // After this returns we know that the detector has been deleted.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33208/#review80655
-----------------------------------------------------------


On April 15, 2015, 5:23 a.m., Robert Lacroix wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33208/
> -----------------------------------------------------------
> 
> (Updated April 15, 2015, 5:23 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-1634
>     https://issues.apache.org/jira/browse/MESOS-1634
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When the Mesos Scheduler Driver stops, it does not delete the detector 
> process before the object is garbage collected, which leaves ZooKeeper 
> connections hanging around unnecessarily. This deletes the process on stop as 
> well, not only on destruction.
> 
> 
> Diffs
> -----
> 
>   src/sched/sched.cpp 66fd2b3146568900356cc48e0f17c6720665ae80 
> 
> Diff: https://reviews.apache.org/r/33208/diff/
> 
> 
> Testing
> -------
> 
> make check, manual testing
> 
> 
> Thanks,
> 
> Robert Lacroix
> 
>

Reply via email to