This is great info, Evan, especially coming from production experience. Thanks for sharing it!
On Thu, Mar 31, 2016 at 1:49 PM, Evan Krall <[email protected]> wrote:

> On Wed, Mar 30, 2016 at 6:56 PM, Jeff Schroeder <[email protected]> wrote:
>
>> Given regional bare metal Mesos clusters on multiple continents, are
>> there any known issues running some of the agents over the WAN? Is anyone
>> else doing it, or is this a terrible idea that I should tell management no
>> on?
>>
>> A few specifics:
>>
>> 1. Are there any known limitations or configuration gotchas I might
>> encounter?
>
> One thing to keep in mind is that the masters maintain a distributed log
> through a consensus protocol, so there needs to be a quorum of masters that
> can talk to each other in order to operate. Consensus protocols tend to be
> very latency-sensitive, so you probably want to keep masters near each
> other.
>
> Some of our clusters span semi-wide geographical regions (in production,
> up to about 5 milliseconds RTT between master and some slaves). So far, we
> haven't seen any issues caused by that amount of latency, and I believe we
> have clusters in non-production environments which have even higher round
> trip between slaves and masters, and work fine. I haven't benchmarked task
> launch time or anything like that, so I can't say how much it affects the
> speed of operations.
>
> Mesos generally does the right thing around network partitions (changes
> won't propagate, but it won't kill your tasks), but if you're running
> things in Marathon and using TCP or HTTP healthchecks, be aware that
> Marathon does not rate limit itself on issuing task kills
> <https://github.com/mesosphere/marathon/issues/3317> for healthcheck
> failures. This means during a network partition, your applications will be
> fine, but once the network partition heals (or if you're experiencing
> packet loss but not total failure), Marathon will suddenly kill all of the
> tasks on the far side of the partition. A workaround for that is to use
> command health checks, which are run by the mesos slave.
>
>> 2. Does setting up ZK observers in each non-primary dc and pointing the
>> agents at them exclusively make sense?
>
> My understanding of ZK observers is that they proxy writes to the actual
> ZK quorum members, so this would probably be fine. mesos-slave uses ZK to
> discover masters, and mesos-master uses ZK to do leader election; only
> mesos-master is doing any writes to ZK.
>
> I'm not sure how often mesos-slave reads from ZK to get the list of
> masters; I assume it doesn't bother if it has a live connection to a master.
>
>> 4. Any suggestions on how best to do agent attributes / constraints for
>> something like this? I was planning on having the config management add a
>> "data_center" agent attribute to match on.
>
> If you're running services on Marathon or similar, I'd definitely
> recommend exposing the location of the slaves as an attribute, and having
> constraints to keep different instances of your application spread across
> the different locations. The "correct" constraints to apply depends on your
> application and latency / failure sensitivity.
>
> Evan
>
>> Thanks!
>>
>> [1]
>> https://github.com/kubernetes/kubernetes/blob/8813c955182e3c9daae68a8257365e02cd871c65/release-0.19.0/docs/proposals/federation.md#kubernetes-cluster-federation
>>
>> --
>> Jeff Schroeder
>>
>> Don't drink and derive, alcohol and analysis don't mix.
>> http://www.digitalprognosis.com
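For anyone else finding this thread later: the command health check workaround Evan mentions would look roughly like this in a Marathon app definition. The app id, command, port, and timing values below are placeholders, not anything from Evan's setup. Since the check is run by the mesos-slave on the box itself, it keeps passing even when that slave can't reach the Marathon leader:

{
  "id": "/my-service",
  "cmd": "/usr/local/bin/run-my-service",
  "instances": 6,
  "healthChecks": [
    {
      "protocol": "COMMAND",
      "command": { "value": "curl -f http://localhost:8080/status" },
      "gracePeriodSeconds": 300,
      "intervalSeconds": 30,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}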
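On the ZK observer question (#2), a rough sketch of the config, assuming a three-node quorum in the primary DC and a single observer in a remote DC; all hostnames and the /mesos znode path are made up. Every server's zoo.cfg lists the observer with the :observer suffix, and the observer itself additionally sets peerType:

# zoo.cfg on all ZK nodes (quorum members and observer)
server.1=zk1.dc1.example.com:2888:3888
server.2=zk2.dc1.example.com:2888:3888
server.3=zk3.dc1.example.com:2888:3888
server.4=zk-obs1.dc2.example.com:2888:3888:observer

# zoo.cfg on zk-obs1.dc2.example.com only, in addition to the above
peerType=observer

The remote agents would then be pointed only at their local observer, e.g.:

mesos-slave --master=zk://zk-obs1.dc2.example.com:2181/mesos ...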
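And for the attributes/constraints question (#4), roughly how the "data_center" attribute Jeff mentions could be wired up. The attribute values and the GROUP_BY operator are just one illustration; as Evan says, the "correct" constraint depends on the application. Config management passes the attribute on the slave command line:

mesos-slave --attributes="data_center:dc1;rack:r12" ...

and the Marathon app definition references it, e.g. to spread instances evenly across data centers:

"constraints": [["data_center", "GROUP_BY"]]

(UNIQUE instead of GROUP_BY would force at most one instance per data center.)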

