----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70778/#review215734 -----------------------------------------------------------
Ship it! - Joseph Wu On June 6, 2019, 1:55 p.m., Benjamin Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/70778/ > ----------------------------------------------------------- > > (Updated June 6, 2019, 1:55 p.m.) > > > Review request for mesos, Andrei Sekretenko, Chun-Hung Hsiao, and Joseph Wu. > > > Bugs: MESOS-9808 > https://issues.apache.org/jira/browse/MESOS-9808 > > > Repository: mesos > > > Description > ------- > > Per MESOS-9808, when we deliver a DispatchEvent to a TERMINATING > Process, the DispatchEvent is destructed while holding a > TERMINATING process reference. This can lead to a deadlock > because the DispatchEvent destruction can run additional code > (e.g. destructors of objects that have been bound into the > dispatch) that ultimately invokes terminate (or dispatch, or send) > while the original TERMINATING process reference is still held. If > the TERMINATING process referenced is being cleaned up, we may > deadlock between (1) `ProcessManager::cleanup()` holding the > `processes_mutex` and spinning for all references to go away and > (2) the terminate (or dispatch, or send) in ProcessManager::use > trying to resolve the destination Process by locking > `processes_mutex`. > > Example from MESOS-9808: > > Thread 19: > Running ReaderProcess > ReaderProcess::_consume, sets a Promise > Dispatches to mock API subscriber Process. **It's TERMINATING, > so DispatchEvent gets deleted while holding a reference to > the mock api subscriber!** > Destructs the process::Loop in mock API subscriber Process > Deletes the Reader, Reader terminates ReaderProcess > **Trying to lock `processes_mutex` in `ProcessManager::use()` > of ReaderProcess (deadlock)** > > Thread 5: > ProcessManager::cleanup of Mock API Subscriber Process, > looping on references to go away (spinning forever) > > Here, thread 5 is holding `processes_mutex` locked and spinning > waiting for references to the Mock API Subscriber Process to go > away, but thread 19 is blocked also trying to lock `processes_mutex` > while holding a reference to the Mock API Subscriber Process! > > The fix here is to not allow arbitrary code to execute while a > ProcessReference is held. This is already done in other locations > but this spot was missed. The reason we haven't hit this bug yet > is that it's rather uncommon for a dispatch to have bound > objects whose destructors lead to a `terminate()` call (today > `terminate()` always resolves the process reference, even if > it's already resolved, whereas `dispatch()` uses cached process > references). > > > Diffs > ----- > > 3rdparty/libprocess/include/process/process.hpp > 7c255acc21c695eba37062a3dcf72ce33f650cd0 > 3rdparty/libprocess/src/event_queue.hpp > 999d5520dc5824e47ba5cf676cf28cb6ca37ceb2 > 3rdparty/libprocess/src/process.cpp > a8d9151c91892b55d31cc7c3fb4c7243c41d9d20 > > > Diff: https://reviews.apache.org/r/70778/diff/3/ > > > Testing > ------- > > > Thanks, > > Benjamin Mahler > >
