-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69775/#review212092
-----------------------------------------------------------
Thanks for doing this, it will avoid a lot of confusion around the master recovery failure case!

Can you list all the `fail()` cases and make sure that they will output a clear message now that there's no stack trace?

* failure to recover master (this definitely shouldn't stack trace; removing the stack trace will help avoid a lot of confusion)
* failure to mark agent unreachable (it's odd that this particular registry operation is handled via `fail()` and the others are not)
* failure to acquire agent removal rate limit token (this should never fail, so a stack trace is actually desirable?)

I'm also inclined to not keep `fail()` and either use lambdas or have wrappers for the common cases. For example, have all registry operations go through the same wrapper:

```
// Old:
registrar->apply(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
  .<something> // this part is done inconsistently

// New:
applyRegistryOperation(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
  .then( ... );

Future<bool> applyRegistryOperation(Owned<RegistryOperation>&& operation)
{
  return registrar->apply(std::move(operation))
    .onAbandoned(...) // LOG(FATAL) << ...;
    .onDiscarded(...) // LOG(FATAL) << ...;
    .onFailed(...) // EXIT(EXIT_FAILURE) << ...; ?
}
```

Some cases don't even handle the failures? :O E.g. https://github.com/apache/mesos/blob/f01853aea4eaa3df6dec3f7342e5583f5addd07d/src/master/master.cpp#L1745-L1746

Ideally, the return type of the registry apply operation would allow us to distinguish between timeout and other failures. E.g. `Future<variant<TimeoutError, bool>>`

- Benjamin Mahler


On Jan. 16, 2019, 11:07 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69775/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2019, 11:07 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Benjamin Mahler, Greg Mann, and Qian
> Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> LOG(FATAL) would dump a stack trace which may confuse people with
> a master crash case. We should just print out an error msg.
> 
> 
> Diffs
> -----
> 
> src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2
> 
> 
> Diff: https://reviews.apache.org/r/69775/diff/2/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>
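
To make the `Future<variant<TimeoutError, bool>>` idea from the review comment above a bit more concrete, here is a minimal sketch in plain C++17. It uses `std::variant` on its own rather than libprocess's `Future`, and `TimeoutError`, `ApplyResult`, and `describe()` are hypothetical names for illustration only, not existing Mesos code. The meaning given to the `bool` (whether the operation was applied) is also an assumption.

```cpp
#include <string>
#include <variant>

// Hypothetical error type; nothing by this name is assumed to exist in Mesos.
struct TimeoutError { std::string message; };

// Assumed outcome of a registry operation: either it timed out, or it
// completed and the bool says whether the operation was applied.
using ApplyResult = std::variant<TimeoutError, bool>;

// Standard helper for passing a set of lambdas to std::visit.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Returns the error message to print for a result, or an empty string
// if the operation was applied and there is nothing to report.
std::string describe(const ApplyResult& result)
{
  return std::visit(overloaded{
    [](const TimeoutError& error) -> std::string {
      return "Timed out applying registry operation: " + error.message;
    },
    [](bool applied) -> std::string {
      return applied ? "" : "Registry operation was not applied";
    }
  }, result);
}
```

With something along these lines, a caller could choose to print the timeout message and exit cleanly while still keeping `LOG(FATAL)` for the cases where a stack trace is actually wanted.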