> On Aug. 5, 2016, 2:43 p.m., Greg Mann wrote:
> > 3rdparty/libprocess/src/process.cpp, lines 2407-2413
> > <https://reviews.apache.org/r/40266/diff/5/?file=1458242#file1458242line2407>
> >
> >     I discovered while running my SSL scheduler test that it's possible for 
> > new processes to be spawned in between the destruction of `gc` and the 
> > stopping of the event loop - see the gist 
> > [here](https://gist.github.com/greggomann/4e1d6a4101d4a3c52a5d9ea2571a043b).
> >  Just before the backtrace, you can see some debug output I added to 
> > indicate when `gc` is deleted and set to `nullptr`.
> >     
> >     In this case, it looks like the scheduler process was attempting to 
> > reopen a `Connection`; the GC's `manage()` method is dispatched to manage 
> > the new `ConnectionProcess`, and when the dispatch calls the GC process's 
> > `self()` and attempts to construct a new `PID` using the `gc` pointer we 
> > get a segfault.
> >     
> >     To avoid this, perhaps we should have a check in `spawn` which refuses 
> > to spawn new processes while libprocess is being finalized/reinitialized? 
> > It seems to me that some processes may need to spawn during termination, so 
> > maybe enforcing that constraint after `terminate_all()` would make sense?
> 
> Joseph Wu wrote:
>     The "scheduler process" should have been killed in `terminate_all`.  If 
> that scheduler process lives in another OS process, it should not have been 
> able to make a connection (since the server socket is destroyed before this 
> method).
>     
>     From you gist, it looks like the server socket is still alive:
>     ```
>     I0805 14:35:37.724653 264331264 libevent_ssl_socket.cpp:1048] Accepting 
> from localhost
>     ```
>     
>     It's possible that this code is not enough to close an SSL server socket:
>     ```
>       synchronized (socket_mutex) {
>         future_accept.discard();
>         delete __s__;
>         __s__ = nullptr;
>     ```

Added some guards in the codepath for `process::spawn`.

You can hit a similar segfault if you do this:
```
process::initialize();

process::finalize();
process::spawn(new ProcessBase(), true); // <- Segfault because `gc` doesn't 
exist anymore.
```


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40266/#review144989
-----------------------------------------------------------


On Aug. 12, 2016, 2 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40266/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2016, 2 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Artem Harutyunyan, Joris Van Remoortere, 
> and Vinod Kone.
> 
> 
> Bugs: MESOS-3910
>     https://issues.apache.org/jira/browse/MESOS-3910
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `SocketManager` and `ProcessManager` are highly inter-dependent, 
> which requires some untangling in `process::finalize`.
> 
> * Logic originally found in `~ProcessManager` has been split into 
>   `ProcessManager::finalize` due to what happens during cleanup.
> * The future from `__s__->accept()` must be explicitly discarded as 
>   libevent does not detect a locally closed socket.
> * Terminating `HttpProxy`s must close the associated socket.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp 
> 629f1644bc0a263972ec9efc41890c33f9406a34 
> 
> Diff: https://reviews.apache.org/r/40266/diff/
> 
> 
> Testing
> -------
> 
> `make check` (libev)
> `make check` (--enable-libevent --enable-ssl)
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>

Reply via email to