-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210914
-----------------------------------------------------------

src/master/master.hpp
Lines 2594-2609 (original), 2596-2619 (patched)
<https://reviews.apache.org/r/69451/#comment295743>

    How about:

    ```
    if (!connected()) {
      LOG(WARNING) << "Master attempting to send message to "
                   << (recovered() ? "recovered" : "disconnected")
                   << " framework " << *this;

      // NOTE: We proceed here without returning to support the case where a
      // `disconnected()` framework is still talking to the master and the
      // master wants to shut it down by sending a `FrameworkErrorMessage`.
      // This can occur in a one way link breakage where the master ->
      // framework link is broken but the framework -> master link remains
      // intact. Note that we don't have periodic heartbeating between master
      // and pid-based schedulers.
      //
      // TODO(cshiao): Update the `FrameworkErrorMessage` call-sites that
      // rely on the lack of a `return` here to directly call `process::send()`
      // so that this function doesn't need to deal with the special case.
      // Then we can check that if we're connected -> one of `http` or `pid`
      // is set.
    }

    if (http.isSome()) {
      if (!http->send(message)) {
        LOG(WARNING) << "Unable to send event to framework " << *this << ":"
                     << " connection closed";
      }
    } else if (pid.isSome()) {
      master->send(pid.get(), message);
    }
    ```


- Benjamin Mahler


On Nov. 27, 2018, 11:01 p.m., Chun-Hung Hsiao wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> -----------------------------------------------------------
>
> (Updated Nov. 27, 2018, 11:01 p.m.)
>
>
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann,
> and Till Toenshoff.
>
>
> Bugs: MESOS-9419
>     https://issues.apache.org/jira/browse/MESOS-9419
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a framework that hasn't reregistered yet
> but has been recovered from a reregistered agent. As a result, the
> master would crash when a recovered executor tries to send a message to
> such a framework (see MESOS-9419). This patch fixes this crash bug.
>
>
> Diffs
> -----
>
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e
>
>
> Diff: https://reviews.apache.org/r/69451/diff/2/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Chun-Hung Hsiao
>
>
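
For anyone following the thread, here is a minimal standalone sketch of the three cases the suggested `Framework::send` body has to handle: an endpoint is set, the framework is disconnected but still has a `pid`, or the framework is recovered with neither `http` nor `pid`. It is only an illustration under stated assumptions. `FakeFramework`, `Outcome`, and the free function `send` are hypothetical stand-ins, not Mesos types, and `std::optional` is used in place of the `Option` type seen in the review.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the master's per-framework state. The field
// names echo the review, but this is NOT the Mesos `Framework` struct.
struct FakeFramework {
  std::optional<std::string> http;  // Set only for HTTP schedulers.
  std::optional<std::string> pid;   // Set only for pid-based schedulers.
  bool connected = false;
  bool recovered = false;           // Known only via a reregistered agent.
};

// Hypothetical result type so the chosen path can be observed.
enum class Outcome { SentHttp, SentPid, Dropped };

// Follows the control flow of the suggestion in this review: warn when the
// framework is not connected, do NOT return early, and only dereference an
// endpoint that is actually set.
Outcome send(const FakeFramework& framework,
             const std::string& message,
             std::vector<std::string>& log) {
  if (!framework.connected) {
    log.push_back(std::string("warning: sending to ") +
                  (framework.recovered ? "recovered" : "disconnected") +
                  " framework");
    // No `return` here: a disconnected pid-based framework may still need
    // to receive an error message over its remaining link.
  }

  if (framework.http.has_value()) {
    log.push_back("http " + *framework.http + ": " + message);
    return Outcome::SentHttp;
  } else if (framework.pid.has_value()) {
    log.push_back("pid " + *framework.pid + ": " + message);
    return Outcome::SentPid;
  }

  // Recovered framework with neither endpoint: drop the message instead of
  // dereferencing an empty optional.
  return Outcome::Dropped;
}
```

In this sketch, a default-constructed `FakeFramework` with `recovered = true` takes the `Dropped` path after logging one warning, while a disconnected framework whose `pid` is set still gets the message, which is the one-way link breakage case described in the NOTE comment.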
