-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69775/#review212092
-----------------------------------------------------------
Thanks for doing this; it will avoid a lot of confusion around the master
recovery failure case!
Can you list all the `fail()` cases and make sure that they will output a clear
message now that there's no stack trace?
* failure to recover the master (this definitely shouldn't print a stack trace;
removing it will avoid a lot of confusion)
* failure to mark an agent unreachable (it's odd that this particular registry
operation is handled via `fail()` and the others are not)
* failure to acquire an agent removal rate limit token (this should never fail,
so a stack trace is actually desirable?)
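One way to make sure every case above prints a clear message is to decide, per call site, whether it should exit cleanly or crash with a stack trace. The following is a minimal, self-contained illustration of that decision table. `FailCase`, `Termination`, `FailurePolicy`, `policyFor()` and the message strings are hypothetical, not Mesos code, and the mapping only mirrors the opinions in the list (the rate limit token case is still an open question there). The unreachable case is assumed to exit cleanly like other registry failures.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for the three fail() call sites listed above.
enum class FailCase
{
  RECOVER_MASTER,
  MARK_AGENT_UNREACHABLE,
  ACQUIRE_REMOVAL_RATE_LIMIT_TOKEN,
};

// How a call site should terminate the master.
enum class Termination
{
  CLEAN_EXIT,  // Print a clear message only; no stack trace.
  STACK_TRACE, // Crash with a stack trace; the failure "should never happen".
};

struct FailurePolicy
{
  Termination termination;
  std::string message;
};

FailurePolicy policyFor(FailCase failCase, const std::string& cause)
{
  switch (failCase) {
    case FailCase::RECOVER_MASTER:
      return {Termination::CLEAN_EXIT,
              "Failed to recover the master: " + cause};
    case FailCase::MARK_AGENT_UNREACHABLE:
      // Assumption: treated like any other failed registry operation.
      return {Termination::CLEAN_EXIT,
              "Failed to mark agent unreachable in the registry: " + cause};
    case FailCase::ACQUIRE_REMOVAL_RATE_LIMIT_TOKEN:
      return {Termination::STACK_TRACE,
              "Failed to acquire agent removal rate limit token: " + cause};
  }

  // Unreachable for valid enum values; crash loudly if it ever happens.
  return {Termination::STACK_TRACE, "Unknown failure: " + cause};
}
```

In the master, a `CLEAN_EXIT` result would presumably be reported with `EXIT(EXIT_FAILURE) << message` and a `STACK_TRACE` result with `LOG(FATAL) << message`.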
I'm also inclined not to keep `fail()`, and to either use lambdas or add
wrappers for the common cases instead. For example, all registry operations
could go through the same wrapper:
```
// Old:
registrar->apply(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
  .<something>; // this part is done inconsistently

// New:
applyRegistryOperation(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
  .then( ... );

Future<bool> applyRegistryOperation(Owned<RegistryOperation>&& operation)
{
  return registrar->apply(std::move(operation))
    .onAbandoned(...)  // LOG(FATAL) << ...;
    .onDiscarded(...)  // LOG(FATAL) << ...;
    .onFailed(...);    // EXIT(EXIT_FAILURE) << ...; ?
}
```
Some cases don't handle the failures at all? :O For example:
https://github.com/apache/mesos/blob/f01853aea4eaa3df6dec3f7342e5583f5addd07d/src/master/master.cpp#L1745-L1746
Ideally, the return type of the registrar's `apply()` would let us distinguish
a timeout from other failures, e.g. `Future<variant<TimeoutError, bool>>`.
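A minimal sketch of that idea, assuming C++17 `std::variant` in place of the `variant` named above and leaving out the libprocess `Future` wrapper. `TimeoutError`, `RegistryResult` and `describe()` are hypothetical. With this type a timeout becomes a value the caller has to handle explicitly (for example by retrying), while any other failure would still surface as a failed future.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical error type reported when a registry operation times out.
struct TimeoutError
{
  std::string message;
};

// What the future would hold once ready, i.e. the T in the proposed
// Future<variant<TimeoutError, bool>>. The bool keeps the meaning of the
// current Future<bool> result.
using RegistryResult = std::variant<TimeoutError, bool>;

// Hypothetical handler: callers must consider the timeout case separately
// from the normal boolean result.
std::string describe(const RegistryResult& result)
{
  if (const TimeoutError* timeout = std::get_if<TimeoutError>(&result)) {
    return "timeout: " + timeout->message;
  }

  return std::get<bool>(result) ? "result: true" : "result: false";
}
```

The design choice here is that the compiler forces every call site to acknowledge the timeout alternative, instead of it being folded into a generic failure string.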
- Benjamin Mahler
On Jan. 16, 2019, 11:07 p.m., Gilbert Song wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69775/
> -----------------------------------------------------------
>
> (Updated Jan. 16, 2019, 11:07 p.m.)
>
>
> Review request for mesos, Andrei Budnik, Benjamin Mahler, Greg Mann, and Qian
> Zhang.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> LOG(FATAL) dumps a stack trace, which may confuse people in the master
> crash case. We should just print an error message instead.
>
>
> Diffs
> -----
>
> src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2
>
>
> Diff: https://reviews.apache.org/r/69775/diff/2/
>
>
> Testing
> -------
>
> N/A
>
>
> Thanks,
>
> Gilbert Song
>
>