You can detect when we remove an agent due to health check failures via the metrics endpoint, but those metrics are counters: they tell you that a removal happened, not which agent was removed, so they are better suited to alerting and dashboards. If you need to know which agents were removed, you can consume the logs as a stop-gap until we offer a mechanism for subscribing to cluster events.
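
As a rough illustration of the counter-based approach, here is a minimal Python sketch that compares two metrics snapshots and reports how many agents were removed in between. The endpoint path (/metrics/snapshot) and the metric name (master/slave_removals) are assumptions on my part, so check them against what your master actually returns. The master URL in the usage note is a placeholder.

```python
import json
import urllib.request

# Assumed metric key; verify against the keys in your master's snapshot.
REMOVALS_KEY = "master/slave_removals"


def fetch_snapshot(master_url, timeout=5.0):
    """Fetch the metrics snapshot as a dict of metric name -> number.

    Assumes the master serves JSON at <master_url>/metrics/snapshot.
    """
    url = master_url.rstrip("/") + "/metrics/snapshot"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def new_removals(previous, current, key=REMOVALS_KEY):
    """Return how many removals were counted between two snapshots.

    If the counter went down (for example after a master restart or
    failover), treat the current value as the number of new removals.
    A missing key is treated as zero.
    """
    before = previous.get(key, 0)
    after = current.get(key, 0)
    if after < before:
        return int(after)
    return int(after - before)
```

To use it, call fetch_snapshot("http://master.example.com:5050") on a timer, keep the previous result, and alert whenever new_removals(previous, current) is greater than zero. This only tells you that removals occurred; identifying the agents still requires the logs.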

On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell <[email protected]> wrote:

> Hi All,
>
> I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
> subscribable event bus.
>
> So I am wondering if there's a best practices way of determining if a
> slave node has crashed. By "crashed" I mean something like the power plug
> got yanked, or anything that would cause Mesos to stop talking to the slave
> node.
>
> I suppose such information would be recorded in /var/log/mesos.
>
> Interested to learn how best to detect this.
>
> Thank you.
>
> -Paul
>

