> On April 13, 2020, 5:20 p.m., Benjamin Mahler wrote:
> > Perhaps describing an example of such a race in the description would be
> > helpful for posterity? Ideally the one we encountered in practice with the
> > check failure?
Good call, updated.

- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/#review220299
-----------------------------------------------------------


On April 13, 2020, 8:11 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72354/
> -----------------------------------------------------------
> 
> (Updated April 13, 2020, 8:11 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
> 
> 
> Bugs: MESOS-10111
>     https://issues.apache.org/jira/browse/MESOS-10111
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This fixes an issue where the functions `shutdown()` and
> `event_callback()` race to access the bufferevent held by
> our libevent SSL socket implementation, leading to a
> CHECK failure.
> 
> This race resulted in MESOS-10111, where multiple rapid
> changes in ZK membership led to one master re-linking to
> another multiple times in RECONNECT mode. This causes
> `shutdown()` to be called on the existing socket while
> it's attempting a connection, at which point a failure to
> connect can produce the CHECK failure.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp dcb6d8e6c82005145c853afa9c24a61d7d0f04a9
> 
> 
> Diff: https://reviews.apache.org/r/72354/diff/1/
> 
> 
> Testing
> -------
> 
> This fix is tested in https://reviews.apache.org/r/72355/, though it's likely
> the test code will not be merged since it involves unsightly modifications to
> the socket interface.
> 
> 
> Thanks,
> 
> Greg Mann
>
