Re: Detecting slave crashes event

Benjamin Mahler Wed, 23 Sep 2015 10:31:23 -0700

I believe some of the contributors from Mesosphere have been thinking about
it, but I'm not sure of the plans. I'll let them reply here.


On Wed, Sep 16, 2015 at 11:11 AM, Paul Bell <[email protected]> wrote:

> Thank you, Benjamin.
>
> So, I could periodically request the metrics endpoint, or stream the logs
> (maybe via mesos.cli; or SSH)? What, roughly, does the "agent removed"
> message look like in the logs?
>
> Are there plans to offer a mechanism for event subscription?
>
> Cordially,
>
> Paul
>
>
>
> On Wed, Sep 16, 2015 at 1:30 PM, Benjamin Mahler <
> [email protected]> wrote:
>
>> You can detect when we remove an agent due to health check failures via
>> the metrics endpoint, but these are counters that are better used for
>> alerting / dashboards for visibility. If you need to know which agents, you
>> can also consume the logs as a stop-gap solution, until we offer a
>> mechanism for subscribing to cluster events.
>>
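Here is a minimal Python sketch of the periodic-polling approach. It assumes the master serves its counters at `/metrics/snapshot` (default port 5050) and that the relevant counter is `master/slave_removals`. Both are assumptions, so check them against the Mesos version in use. `fetch_snapshot`, `removals_since`, and `poll` are hypothetical helper names. As noted above, the counter only tells you how many agents were removed, not which ones.

```python
import json
import time
import urllib.request

# Assumed counter name; verify it appears in /metrics/snapshot on your master.
REMOVALS_KEY = "master/slave_removals"


def fetch_snapshot(master_url: str, timeout: float = 5.0) -> dict:
    """Fetch the master's metrics as a dict of metric name -> numeric value."""
    with urllib.request.urlopen(master_url + "/metrics/snapshot", timeout=timeout) as resp:
        return json.load(resp)


def removals_since(previous: dict, current: dict, key: str = REMOVALS_KEY) -> int:
    """Return how much the removal counter grew between two snapshots.

    Counters restart from zero on a new master (e.g. after failover), so a
    negative delta is treated as "everything counted since the reset".
    """
    before = int(previous.get(key, 0))
    after = int(current.get(key, 0))
    delta = after - before
    return after if delta < 0 else delta


def poll(master_url: str, interval: float = 15.0) -> None:
    """Poll forever and print whenever the removal counter increases."""
    previous = fetch_snapshot(master_url)
    while True:
        time.sleep(interval)
        current = fetch_snapshot(master_url)
        removed = removals_since(previous, current)
        if removed:
            print(f"{removed} agent(s) removed since the last poll")
        previous = current
```

For example, `poll("http://mesos-master:5050")` would do this, where the hostname is hypothetical. Point it at the leading master, because each master keeps its own counters.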
>> On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
>>> subscribable event bus.
>>>
>>> So I am wondering if there's a best-practice way of determining whether a
>>> slave node has crashed. By "crashed" I mean something like the power plug
>>> got yanked, or anything that would cause Mesos to stop talking to the slave
>>> node.
>>>
>>> I suppose such information would be recorded in /var/log/mesos.
>>>
>>> Interested to learn how best to detect this.
>>>
>>> Thank you.
>>>
>>> -Paul
>>>
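To learn which agents disappeared, rather than how many, one option is to poll the master's state endpoint and compare the set of registered agent IDs between polls. Below is a minimal sketch under a few assumptions. The endpoint is assumed to be `/master/state.json`, the path used by Mesos releases of this period (newer releases expose `/master/state`). Each entry in its `slaves` array is assumed to carry `id` and `hostname` fields. `fetch_agents`, `removed_agents`, and `watch` are hypothetical names. An agent that has merely lost contact typically stays listed until the master removes it (for example after its health checks fail). This sketch therefore detects the removal, not the exact moment of the crash.

```python
import json
import time
import urllib.request
from typing import Dict

# Assumed endpoint path; newer Mesos releases use "/master/state".
STATE_PATH = "/master/state.json"


def fetch_agents(master_url: str, timeout: float = 5.0) -> Dict[str, str]:
    """Return {agent_id: hostname} for every agent the master currently lists."""
    with urllib.request.urlopen(master_url + STATE_PATH, timeout=timeout) as resp:
        state = json.load(resp)
    # Assumed layout: a top-level "slaves" array of objects with "id"/"hostname".
    return {s["id"]: s.get("hostname", "?") for s in state.get("slaves", [])}


def removed_agents(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Agents present in the previous poll but missing from the current one."""
    return {aid: host for aid, host in previous.items() if aid not in current}


def watch(master_url: str, interval: float = 15.0) -> None:
    """Poll forever and report agents that drop out of the master's view."""
    previous = fetch_agents(master_url)
    while True:
        time.sleep(interval)
        current = fetch_agents(master_url)
        for aid, host in removed_agents(previous, current).items():
            print(f"agent {aid} on {host} is no longer registered")
        previous = current
```

As with the counter approach, this should be pointed at the leading master. A non-leading master may report an empty or stale agent list.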
>>
>>
>
