There are a few things going on there: you're having ZooKeeper connectivity
issues, and the master is not able to health check the agent.

I would recommend triaging what occurred in your network, but you can
increase the master's health check timeouts as a mitigation. You can also
control the maximum rate at which the master removes unhealthy agents.
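
As a rough illustration of how those knobs interact, here is a minimal sketch.
It assumes the master flag names from the Mesos configuration docs of this era
(--slave_ping_timeout, --max_slave_ping_timeouts, --slave_removal_rate_limit)
and the documented defaults of 15secs and 5; please confirm them against
`mesos-master --help` for your version. The helper functions themselves are
hypothetical and only do arithmetic and string formatting.

```python
# Sketch only: flag names are assumed from the Mesos master docs of this era;
# verify them with `mesos-master --help` before relying on them.

def removal_window_secs(ping_timeout_secs, max_ping_timeouts):
    """Approximate time an agent can miss health checks before the master
    considers it failed: one ping timeout per allowed missed ping."""
    return ping_timeout_secs * max_ping_timeouts


def master_flags(ping_timeout_secs, max_ping_timeouts, removal_rate_limit=None):
    """Build the (assumed) master command-line flags for the chosen values."""
    flags = [
        f"--slave_ping_timeout={ping_timeout_secs}secs",
        f"--max_slave_ping_timeouts={max_ping_timeouts}",
    ]
    if removal_rate_limit is not None:
        # Form is (number of agents)/(duration), e.g. "1/10mins".
        flags.append(f"--slave_removal_rate_limit={removal_rate_limit}")
    return flags


if __name__ == "__main__":
    # Documented defaults: 15secs per ping, 5 missed pings allowed.
    print(removal_window_secs(15, 5))
    # A more tolerant setup for a flaky cross-datacenter link (example values).
    print(removal_window_secs(30, 10))
    print(" ".join(master_flags(30, 10, "1/10mins")))
```

The values 30, 10 and 1/10mins are only examples; a longer window makes the
master slower to notice agents that are genuinely gone, so pick numbers that
match how long your network blips actually last.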

On Sun, Oct 4, 2015 at 1:45 AM, Jeremy Olexa <[email protected]> wrote:

> Hello,
>
>
> We have been observing agent process disconnects when our agent
> processes are in one datacenter, A, and access the master cluster in
> datacenter B. I would like to mitigate this issue because it ejects all the
> running applications, and then all of the sandbox links, etc., are not
> available because the slave is "lost".
>
>
> I have attached the disconnect portion of the log here:
>
> https://gist.github.com/jolexa/1a80e26a4b017846d083
>
> I am curious if anyone can offer some advice on making the relevant Mesos
> processes more resilient in this regard. I'm confused by all the timeout
> options and I don't know exactly what is safe to tweak.
>
>
> Thanks for any assistance!
>
> -Jeremy
>
>
>
