Hey,

We've been running Mesos slaves across sites: most are in a private cloud off-site, with AWS EC2 used for extra burst capacity when required, connected over a Direct Connect link. We've found this model works well on the Mesos side, though it's key to understand how tasks running concurrently across multiple sites interact, so you're aware of the effects of latency and throughput (e.g. data transfer if you're using Hadoop).
All of our master nodes (and the ZooKeeper quorum) are in the same site (though not the same physical location), but this isn't an issue for us since we're not using AWS as a mechanism for redundancy.

I'm not aware of any built-in resource unit for network latency or throughput, but I don't see any reason you couldn't specify your own on each slave and configure your frameworks to take it into account when making scheduling decisions (see the sketch below the quoted message). The recent addition of the network isolator (http://mesos.apache.org/documentation/latest/network-monitoring/) might also be of use to you here.

Very interested in what others are doing in this space.

Tom.

On 26 August 2014 09:19, Yaron Rosenbaum <[email protected]> wrote:

> Hi
>
> Here's a crazy idea:
> Is it possible / has anyone tried to run Mesos where the slaves are in
> radically different network zones? For example: a few slaves on Azure, a
> few slaves on AWS, and a bunch of other slaves on premises, etc.
>
> - Assuming it's possible, is it possible to define resource requirements
> for tasks, in terms of 'access to network resource A with less than X
> latency and throughput between i and m', for example?
> - Masters would probably have to be 'close' to each other, to prevent
> 'split-brain' situations, true or not?
> - If so, then how does one ensure master HA?
>
> I've been thinking about this for a while, and can't find a reason 'why
> not'.
>
> Please share your thoughts on the subject.
>
> (Y)
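
To make the 'specify your own' idea above a bit more concrete, here's a very rough sketch of how I'd picture it working; the names and numbers are my own convention, nothing built into Mesos. Start each slave with a custom attribute describing its site, e.g. --attributes="site:aws;latency_ms:30", and have the framework's scheduler filter offers on that attribute. A minimal Python sketch against the stock Python bindings; MAX_LATENCY_MS and build_tasks() are illustrative, not Mesos APIs:

    from mesos.interface import Scheduler

    MAX_LATENCY_MS = 50  # hypothetical per-framework placement constraint

    class SiteAwareScheduler(Scheduler):
        """Declines offers from slaves whose advertised latency is too high."""

        def resourceOffers(self, driver, offers):
            for offer in offers:
                # Each offer carries the slave's attributes as a list of
                # Attribute protobufs; index them by name for easy lookup.
                attrs = {a.name: a for a in offer.attributes}

                latency = attrs.get("latency_ms")
                if latency is None or latency.scalar.value > MAX_LATENCY_MS:
                    # Unlabelled or too far away: hand the offer straight back.
                    driver.declineOffer(offer.id)
                    continue

                # Close enough: build TaskInfos against this offer's
                # resources and launch them.
                tasks = self.build_tasks(offer)  # hypothetical helper
                driver.launchTasks(offer.id, tasks)

The same trick extends to throughput, site names, rack IDs, and so on; what the scheduler does with the attributes is entirely up to the framework.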