On Thursday, March 31, 2016, Evan Krall <kr...@yelp.com> wrote:

> On Wed, Mar 30, 2016 at 6:56 PM, Jeff Schroeder
> <jeffschroe...@computer.org> wrote:
>
>> Given regional bare metal Mesos clusters on multiple continents, are
>> there any known issues with running some of the agents over the WAN? Is
>> anyone else doing it, or is this a terrible idea that I should push back
>> on with management?
>>
>> A few specifics:
>>
>> 1. Are there any known limitations or configuration gotchas I might
>> encounter?
>>
>
> One thing to keep in mind is that the masters maintain a distributed log
> through a consensus protocol, so there needs to be a quorum of masters that
> can talk to each other in order to operate. Consensus protocols tend to be
> very latency-sensitive, so you probably want to keep masters near each
> other.
>
>
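
As a rough illustration of the quorum point above: the masters can only make
progress while a strict majority of them can reach each other, and that
majority is the value normally passed to mesos-master via --quorum. A minimal
Python sketch (the helper names are made up for this example):

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can be unreachable while a quorum still exists."""
    return num_masters - quorum_size(num_masters)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} masters: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Note that an even master count buys nothing: 4 masters need a quorum of 3 and
still tolerate only one failure, the same as 3 masters.
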
> Some of our clusters span semi-wide geographical regions (in production,
> up to about 5 milliseconds RTT between the master and some slaves). So far
> we haven't seen any issues caused by that amount of latency. I believe we
> also have clusters in non-production environments with even higher
> round-trip times between slaves and masters, and they work fine. I haven't
> benchmarked task launch time or anything like that, so I can't say how much
> latency affects the speed of operations.
>
> Mesos generally does the right thing around network partitions (changes
> won't propagate, but it won't kill your tasks), but if you're running
> things in Marathon and using TCP or HTTP healthchecks, be aware that
> Marathon does not rate limit itself on issuing task kills
> <https://github.com/mesosphere/marathon/issues/3317> for healthcheck
> failures. This means during a network partition, your applications will be
> fine, but once the network partition heals (or if you're experiencing
> packet loss but not total failure), Marathon will suddenly kill all of the
> tasks on the far side of the partition. A workaround for that is to use
> command health checks, which are run locally on the Mesos slave rather than
> over the network from Marathon.
>
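
For reference, a command health check in a Marathon app definition looks
roughly like the sketch below. It is written in Python only to build and print
the JSON. The field names follow the Marathon app definition format as I
understand it; the app id, the command, and all timing values are placeholders
invented for this example, not recommendations.

```python
import json


def command_health_check(command: str,
                         interval_seconds: int = 30,
                         max_consecutive_failures: int = 3) -> dict:
    """Build a COMMAND health check entry for a Marathon app definition.

    The command runs on the agent next to the task, so it does not depend on
    the network path between the Marathon leader and a remote data center.
    """
    return {
        "protocol": "COMMAND",
        "command": {"value": command},
        "gracePeriodSeconds": 300,   # placeholder value
        "intervalSeconds": interval_seconds,
        "timeoutSeconds": 20,        # placeholder value
        "maxConsecutiveFailures": max_consecutive_failures,
    }


# Hypothetical app used only to show where the health check goes.
app = {
    "id": "/example-service",
    "cmd": "python3 -m http.server $PORT0",
    "cpus": 0.1,
    "mem": 64,
    "instances": 3,
    "healthChecks": [
        command_health_check("curl -f http://localhost:$PORT0/"),
    ],
}

if __name__ == "__main__":
    print(json.dumps(app, indent=2))
```

The printed JSON is what would be submitted to Marathon's apps endpoint.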

Right. Because of this, I didn't plan on having masters across a WAN, just
agents. The Marathon behavior makes a ton of sense given that the elected
leader does the health checks; thanks for pointing it out. I'm also assuming
this would not be an issue with something like Aurora, where the health
checks are part of the executor. Any idea if there is a way to distribute the
health checks with Marathon, or should I ask on their mailing list?


>> 2. Does setting up ZK observers in each non-primary DC and pointing the
>> agents at them exclusively make sense?
>>
>
> My understanding of ZK observers is that they proxy writes to the actual
> ZK quorum members, so this would probably be fine. mesos-slave uses ZK to
> discover masters, and mesos-master uses ZK to do leader election; only
> mesos-master is doing any writes to ZK.
>
> I'm not sure how often mesos-slave reads from ZK to get the list of
> masters; I assume it doesn't bother if it has a live connection to a master.
>
>
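
To make the observer setup concrete, here is a small Python sketch that
renders the two pieces of configuration involved: the server line that marks
an ensemble member as an observer in zoo.cfg, and the zk:// URL passed to the
agents in the remote data center via --master. The hostnames, the /mesos
znode, and the helper names are assumptions made up for this example; the
ports are the conventional ZooKeeper defaults. The observer itself would also
need peerType=observer in its own zoo.cfg, per the ZooKeeper observers guide
as I read it.

```python
def observer_server_line(server_id: int, host: str,
                         peer_port: int = 2888,
                         election_port: int = 3888) -> str:
    """zoo.cfg line declaring server_id as an observer (non-voting member)."""
    return f"server.{server_id}={host}:{peer_port}:{election_port}:observer"


def agent_master_url(zk_hosts: list[str], znode: str = "/mesos",
                     client_port: int = 2181) -> str:
    """zk:// URL for the agent's --master flag, listing only the given hosts."""
    if not zk_hosts:
        raise ValueError("need at least one ZooKeeper host")
    hosts = ",".join(f"{h}:{client_port}" for h in zk_hosts)
    return f"zk://{hosts}{znode}"


if __name__ == "__main__":
    # Hypothetical observers living in the remote data center.
    observers = ["zk-obs1.eu.example.com", "zk-obs2.eu.example.com"]
    for i, host in enumerate(observers, start=4):
        print(observer_server_line(i, host))
    print("--master=" + agent_master_url(observers))
```
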
>> 4. Any suggestions on how best to do agent attributes / constraints for
>> something like this? I was planning on having the config management add a
>> "data_center" agent attribute to match on.
>>
>
> If you're running services on Marathon or similar, I'd definitely
> recommend exposing the location of the slaves as an attribute, and having
> constraints to keep different instances of your application spread across
> the different locations. The "correct" constraints to apply depend on your
> application and its latency / failure sensitivity.
>
> Evan
>
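
A sketch of how the "data_center" attribute and matching Marathon constraints
could fit together, again using Python just to build the values. The attribute
name comes from the question above; the data center names and helper names are
invented for this example. GROUP_BY and CLUSTER are Marathon constraint
operators as I understand them: the first spreads instances across attribute
values, the second pins them to a single value.

```python
def agent_attributes_flag(data_center: str) -> str:
    """Flag for the Mesos agent so it advertises which data center it is in."""
    return f"--attributes=data_center:{data_center}"


def spread_across_data_centers(num_data_centers: int) -> list[str]:
    """Marathon constraint: distribute instances evenly across data centers."""
    return ["data_center", "GROUP_BY", str(num_data_centers)]


def pin_to_data_center(data_center: str) -> list[str]:
    """Marathon constraint: run every instance in one data center."""
    return ["data_center", "CLUSTER", data_center]


if __name__ == "__main__":
    print(agent_attributes_flag("eu-west"))
    # Failure-tolerant service: one group per data center.
    print({"constraints": [spread_across_data_centers(3)]})
    # Latency-sensitive service that must stay next to its database.
    print({"constraints": [pin_to_data_center("us-east")]})
```

Which of the two applies is the application-specific choice described above:
spreading protects against losing a site, pinning avoids cross-WAN latency.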

This is exactly what I was looking for, thank you for sharing your
experience.


-- 
Text by Jeff, typos by iPhone
