> On Aug. 27, 2015, 6:14 a.m., Neil Conway wrote:
> > 3rdparty/libprocess/src/process.cpp, line 2212
> > <https://reviews.apache.org/r/37821/diff/2/?file=1055749#file1055749line2212>
> >
> >     Somewhat race-prone: we might see "shutting_down.load() == false",
> >     proceed to deliver the inbound message, and yet the shutdown code can
> >     proceed concurrently. After a bit of poking I couldn't find a situation in
> >     which that would be problematic, but maybe worth exploring if there's a
> >     known data race/hang...

Thanks Neil, good point. It turns out the race condition was occurring in schedule() and was easily fixed by moving a boolean test. However, you're right that it's currently possible for processes to get queued up in ProcessManager::handle() after shutting_down has been set to true, and this is not great.

I could move the "if (shutting_down.load())" test closer to the actual calls to deliver() and dispatch(), which would require duplicating it a number of times. It would be messy, but would lessen the raciness. Placing the test in deliver() itself seems like a lot of unnecessary work whenever internal libprocess messages are sent, and we still want to let internal processes send/receive messages while they're terminating. Perhaps there's a better location for this test that I'm not finding?


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37821/#review96655
-----------------------------------------------------------


On Aug. 27, 2015, 10:59 p.m., Greg Mann wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37821/
> -----------------------------------------------------------
>
> (Updated Aug. 27, 2015, 10:59 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Joris Van Remoortere, and switched to 'mcypark'.
>
>
> Bugs: MESOS-3158
>     https://issues.apache.org/jira/browse/MESOS-3158
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Join threads in libprocess when shutting down.
>
>
> Diffs
> -----
>
>   3rdparty/libprocess/src/event_loop.hpp 36a4cd2b1ff59f6922173ad17115bf80cc3c8f30
>   3rdparty/libprocess/src/libev.cpp 97a2694f9b10bc61841443b21f4f96055493e840
>   3rdparty/libprocess/src/libevent.cpp d7c47fbd1dbdec1fc974840e6f3a1428a8f189d5
>   3rdparty/libprocess/src/process.cpp 755187c8761137cb2bf2f7295b29a63f63c68bc6
>
> Diff: https://reviews.apache.org/r/37821/diff/
>
>
> Testing
> -------
>
> After configuring with both "../configure" and "../configure --enable-libevent --enable-ssl":
>
>   make check
>
>
> Also, to check for race conditions related to the initialization/shutdown of libprocess, try something like:
>
>   for n in {1..1000}; do echo $n; 3rdparty/libprocess/tests --gtest_filter=ProcessTest.Spawn; done
>
>
> Thanks,
>
> Greg Mann
>
>
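
For reference, here is a minimal sketch of the check-then-act window discussed in the reply above, and of one possible way to close it. This is not libprocess code: the Mailbox class and its deliverRacy(), deliverGuarded(), shutdown() and pending() methods are hypothetical names made up for illustration, and the real ProcessManager is structured differently.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a message sink; NOT the libprocess ProcessManager.
class Mailbox {
public:
  // Racy variant: the flag is tested first and the message is enqueued as a
  // separate step, so shutdown() can run to completion in between.
  bool deliverRacy(std::string message) {
    if (shuttingDown_.load()) {
      return false;
    }
    // Window: another thread may call shutdown() here, after which the
    // message below is still enqueued.
    std::lock_guard<std::mutex> lock(mutex_);
    messages_.push(std::move(message));
    return true;
  }

  // Guarded variant: the test and the enqueue happen under the same mutex
  // that shutdown() takes, so nothing is accepted once shutdown() returns.
  bool deliverGuarded(std::string message) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (shuttingDown_.load()) {
      return false;
    }
    messages_.push(std::move(message));
    return true;
  }

  // Sets the flag and discards queued messages while holding the mutex.
  // Returns the number of messages that were discarded.
  std::size_t shutdown() {
    std::lock_guard<std::mutex> lock(mutex_);
    shuttingDown_.store(true);
    std::size_t dropped = messages_.size();
    std::queue<std::string>().swap(messages_);
    return dropped;
  }

  // Number of messages currently queued.
  std::size_t pending() {
    std::lock_guard<std::mutex> lock(mutex_);
    return messages_.size();
  }

private:
  std::atomic<bool> shuttingDown_{false};
  std::mutex mutex_;
  std::queue<std::string> messages_;
};
```

The guarded variant pays for a lock acquisition on every delivery, which is the same kind of per-message overhead the reply is worried about for internal libprocess messages, so it is only one option among several.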
