> On Aug. 5, 2016, 2:43 p.m., Greg Mann wrote: > > 3rdparty/libprocess/src/process.cpp, lines 2407-2413 > > <https://reviews.apache.org/r/40266/diff/5/?file=1458242#file1458242line2407> > > > > I discovered while running my SSL scheduler test that it's possible for > > new processes to be spawned in between the destruction of `gc` and the > > stopping of the event loop - see the gist > > [here](https://gist.github.com/greggomann/4e1d6a4101d4a3c52a5d9ea2571a043b). > > Just before the backtrace, you can see some debug output I added to > > indicate when `gc` is deleted and set to `nullptr`. > > > > In this case, it looks like the scheduler process was attempting to > > reopen a `Connection`; the GC's `manage()` method is dispatched to manage > > the new `ConnectionProcess`, and when the dispatch calls the GC process's > > `self()` and attempts to construct a new `PID` using the `gc` pointer we > > get a segfault. > > > > To avoid this, perhaps we should have a check in `spawn` which refuses > > to spawn new processes while libprocess is being finalized/reinitialized? > > It seems to me that some processes may need to spawn during termination, so > > maybe enforcing that constraint after `terminate_all()` would make sense? > > Joseph Wu wrote: > The "scheduler process" should have been killed in `terminate_all`. If > that scheduler process lives in another OS process, it should not have been > able to make a connection (since the server socket is destroyed before this > method). > > From you gist, it looks like the server socket is still alive: > ``` > I0805 14:35:37.724653 264331264 libevent_ssl_socket.cpp:1048] Accepting > from localhost > ``` > > It's possible that this code is not enough to close an SSL server socket: > ``` > synchronized (socket_mutex) { > future_accept.discard(); > delete __s__; > __s__ = nullptr; > ```
Added some guards in the codepath for `process::spawn`. You can hit a similar segfault if you do this: ``` process::initialize(); process::finalize(); process::spawn(new ProcessBase(), true); // <- Segfault because `gc` doesn't exist anymore. ``` - Joseph ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40266/#review144989 ----------------------------------------------------------- On Aug. 12, 2016, 2 p.m., Joseph Wu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/40266/ > ----------------------------------------------------------- > > (Updated Aug. 12, 2016, 2 p.m.) > > > Review request for mesos, Greg Mann, Artem Harutyunyan, Joris Van Remoortere, > and Vinod Kone. > > > Bugs: MESOS-3910 > https://issues.apache.org/jira/browse/MESOS-3910 > > > Repository: mesos > > > Description > ------- > > The `SocketManager` and `ProcessManager` are highly inter-dependent, > which requires some untangling in `process::finalize`. > > * Logic originally found in `~ProcessManager` has been split into > `ProcessManager::finalize` due to what happens during cleanup. > * The future from `__s__->accept()` must be explicitly discarded as > libevent does not detect a locally closed socket. > * Terminating `HttpProxy`s must close the associated socket. > > > Diffs > ----- > > 3rdparty/libprocess/src/process.cpp > 629f1644bc0a263972ec9efc41890c33f9406a34 > > Diff: https://reviews.apache.org/r/40266/diff/ > > > Testing > ------- > > `make check` (libev) > `make check` (--enable-libevent --enable-ssl) > > > Thanks, > > Joseph Wu > >
