> On July 1, 2015, 12:20 a.m., Adam B wrote:
> > I'm sorry, but I don't understand how LOG(FATAL) was segfaulting before.
> > Please explain.
>
> Joris Van Remoortere wrote:
>     Hi Adam,
>     This was reported as a segfault by a user in IRC. The specific segfault
>     was fixed by MESOS-2636.
>     The segfault _was_ happening under an error condition, so after the
>     segfault was fixed, we would _still_ exit with a SIGABRT / stack trace that
>     is generated by a `LOG(FATAL)`.
>
>     There has been a drive to reduce the number of bad user experiences by
>     converting some of the exit points that SIGABRT / stack trace when just a
>     friendly message to the user would be much more helpful.
>     I acknowledge that `PLOG(FATAL)` already prints out an error message;
>     however, it is rather hidden amongst the other messages and stack trace.
>     Users tend to miss this and complain that Mesos is broken, rather than fixing
>     the user / configuration problem mentioned in the error message.
>
>     Does this answer your question?

That helps. Could somebody paste an example log/stacktrace (and exit code) from the current behavior (without this patch), for comparison?
While I agree that we should log something more helpful than `ip.error()`, I'm not convinced that we need to convert this (or every?) LOG(FATAL) to an EXIT().


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36061/#review90014
-----------------------------------------------------------


On June 30, 2015, 5:32 p.m., Marco Massenzio wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36061/
> -----------------------------------------------------------
>
> (Updated June 30, 2015, 5:32 p.m.)
>
>
> Review request for mesos, Adam B and Joris Van Remoortere.
>
>
> Bugs: MESOS-2962
>     https://issues.apache.org/jira/browse/MESOS-2962
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Jira: MESOS-2962
>
> Slave fails with Abort stacktrace when DNS cannot resolve hostname
>
> If the DNS cannot resolve the hostname for a slave node, we correctly return
> an Error object, but we then fail with a segfault.
>
> This code adds a more user-friendly message and exits normally (with an
> `EXIT_FAILURE` code).
> For example, forcing `net::getIp()` to always return an Error, now causes the
> slave to exit like this:
> ```
> $ ./bin/mesos-slave.sh --master=10.10.1.121:5405
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> E0630 11:31:45.777465 1944417024 process.cpp:899] Could not obtain the IP
> address for stratos.local; the DNS service may not be able to resolve it: >>>
> Marco was here!!!
>
> $ echo $?
> 1
> ```
>
>
> Diffs
> -----
>
>   3rdparty/libprocess/src/process.cpp d99947c1598c43c47c88ef3e8038081855f0d1dc
>
> Diff: https://reviews.apache.org/r/36061/diff/
>
>
> Testing
> -------
>
> make check
> and manual failing the DNS
>
>
> Thanks,
>
> Marco Massenzio
>
>
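
For illustration only (this is not code from the patch): the behavior under discussion, printing a readable message and exiting with `EXIT_FAILURE` instead of aborting with a stack trace, could be sketched roughly as below. The names `dnsFailureMessage` and `checkLookup` are hypothetical and not part of Mesos, libprocess, or glog; the lookup error is modeled as a plain string, and the message wording is borrowed from the log line quoted above.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper (not from the patch): builds the user-facing message.
std::string dnsFailureMessage(
    const std::string& hostname,
    const std::string& error)
{
  return "Could not obtain the IP address for " + hostname +
         "; the DNS service may not be able to resolve it: " + error;
}

// Hypothetical stand-in for checking the lookup result. An empty
// `lookupError` means the lookup succeeded. Returns the exit code the
// process should terminate with, rather than calling abort().
int checkLookup(const std::string& hostname, const std::string& lookupError)
{
  if (!lookupError.empty()) {
    // One readable line on stderr; no SIGABRT and no stack trace.
    std::cerr << dnsFailureMessage(hostname, lookupError) << std::endl;
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
```

A caller would pass a non-zero result to `std::exit()`, so the shell sees exit status 1, as in the `echo $?` output above. By contrast, a path that ends in `abort()` terminates the process with SIGABRT.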
