-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69775/#review212092
-----------------------------------------------------------



Thanks for doing this! It will avoid a lot of confusion around the master 
recovery failure case.

Can you list all the `fail()` cases and make sure that they will output a clear 
message now that there's no stack trace?

* failure to recover master (this definitely shouldn't print a stack trace; 
removing it will help avoid a lot of confusion)
* failure to mark agent unreachable (it's odd that this particular registry 
operation is handled via `fail()` and the others are not)
* failure to acquire agent removal rate limit token (this should never fail, so 
a stack trace is actually desirable here?)

I'm also inclined to drop `fail()` and either use lambdas or add wrappers 
for the common cases. For example, have all registry operations go through the 
same wrapper:

```
  // Old:
  registrar->apply(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
    .<something> // this part is done inconsistently

  // New:
  applyRegistryOperation(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
    .then( ... );


Future<bool> applyRegistryOperation(Owned<RegistryOperation>&& operation)
{
  return registrar->apply(std::move(operation))
    .onAbandoned(...) // LOG(FATAL) << ...;
    .onDiscarded(...) // LOG(FATAL) << ...;
    .onFailed(...); // EXIT(EXIT_FAILURE) << ...; ?
}
```

Some cases don't even handle the failures? :O

E.g. 
https://github.com/apache/mesos/blob/f01853aea4eaa3df6dec3f7342e5583f5addd07d/src/master/master.cpp#L1745-L1746

Ideally, the return type of the registry apply operation would allow us to 
distinguish between a timeout and other failures, e.g. 
`Future<variant<TimeoutError, bool>>`.
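
To make the variant idea concrete, here is a rough, standalone sketch of how a 
caller could tell a timeout apart from a normal result. It uses C++17 
`std::variant` rather than anything from libprocess, leaves out the `Future` 
layer entirely, and `TimeoutError`, `RegistryResult` and `describe()` are 
hypothetical names made up for illustration, not existing Mesos code:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-in for the `TimeoutError` suggested above; a real
// type, if one were added, may look quite different.
struct TimeoutError
{
  std::string message;
};

// Assumed shape of the value a registry operation would resolve to:
// either a timeout, or the usual bool result.
using RegistryResult = std::variant<TimeoutError, bool>;

// Hypothetical helper: builds the message a caller might log for a result.
std::string describe(const RegistryResult& result)
{
  // A timeout can be handled (and worded) separately from other outcomes.
  if (const TimeoutError* timeout = std::get_if<TimeoutError>(&result)) {
    return "Registry operation timed out: " + timeout->message;
  }

  // Not a timeout, so the variant must hold the bool alternative.
  const bool* value = std::get_if<bool>(&result);
  assert(value != nullptr);

  return std::string("Registry operation completed with result: ") +
         (*value ? "true" : "false");
}
```

With something along these lines, the timeout branch could exit with a plain 
error message while the remaining failures keep whatever handling they have 
today.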

- Benjamin Mahler


On Jan. 16, 2019, 11:07 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69775/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2019, 11:07 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Benjamin Mahler, Greg Mann, and Qian 
> Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> LOG(FATAL) would dump a stack trace which may confuse people with
> a master crash case. We should just print out an error msg.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
> 
> 
> Diff: https://reviews.apache.org/r/69775/diff/2/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>
