I would second the suggestion of separate Mesos clusters for DC and AWS, with a layer on top that picks a specific cluster, or either one, based on job SLAs and resource requirements.

Local storage on cloud instances is more ephemeral than I'd expect it to be on the DC instances, so persistent storage of job metadata needs consideration. Something like DynamoDB may work; however, depending on the scale of your operations, you may have to plan for AWS rate limiting your API calls and/or paying for higher IOPS for data storage/access.

Treating the cloud instances as immutable infrastructure has additional benefits. For example, we deploy a new Mesos master ASG for version upgrades, let the new masters join the quorum, and then tear down the old master ASG. Same for agents, although for agent migration our framework coordinates moving jobs from the old agent ASG to the new one, with SLAs that keep too many instances of a service from being down at the same time. That is roughly what the maintenance primitives in Mesos aim to address.
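To make the agent-migration constraint concrete, here is a minimal sketch of how tasks on the old agent ASG could be grouped into migration waves so that no wave takes down more than a fixed number of instances of any one service. This is an illustration under assumptions, not the actual framework code: `Task`, `plan_migration_waves`, and `max_down_per_service` are hypothetical names, and a real scheduler would also wait for each wave to come up healthy on the new ASG before starting the next one.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A running task on the old agent ASG (hypothetical model)."""
    task_id: str
    service: str
    agent: str  # hostname of the old-ASG agent running the task


def plan_migration_waves(tasks, max_down_per_service):
    """Group tasks into waves using a first-fit pass.

    Each wave contains at most `max_down_per_service` tasks of any
    single service, so migrating one wave at a time bounds how many
    instances of a service are down simultaneously.
    """
    if max_down_per_service < 1:
        raise ValueError("max_down_per_service must be at least 1")

    waves = []  # list of (tasks_in_wave, per_service_counts) pairs
    for task in tasks:
        for wave_tasks, counts in waves:
            if counts[task.service] < max_down_per_service:
                wave_tasks.append(task)
                counts[task.service] += 1
                break
        else:
            # No existing wave has room for this service; open a new one.
            waves.append(([task], Counter({task.service: 1})))
    return [wave_tasks for wave_tasks, _ in waves]


if __name__ == "__main__":
    old_asg_tasks = [
        Task("web-1", "web", "agent-a"),
        Task("web-2", "web", "agent-b"),
        Task("web-3", "web", "agent-c"),
        Task("db-1", "db", "agent-a"),
    ]
    for i, wave in enumerate(plan_migration_waves(old_asg_tasks, 1), start=1):
        print(f"wave {i}: {[t.task_id for t in wave]}")
```

With a limit of 1, the three `web` tasks land in three separate waves, while `db-1` shares the first wave because it belongs to a different service.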
On Thu, Jun 30, 2016 at 9:41 AM, Ken Sipe <[email protected]> wrote:

> I would suggest a cluster on AWS and a cluster on-prem. Then tooling on
> top to manage between the 2.
> It is unlikely that a failure of a task on-prem should have a scheduled
> replacement on AWS or vice versa. It is likely that you will end up
> creating constraints to statically partition the clusters anyway IMO.
> 2 Clusters eliminates most of your proposed questions.
>
> ken
>
> > On Jun 30, 2016, at 10:57 AM, Florian Pfeiffer <[email protected]> wrote:
> >
> > Hi,
> >
> > the last 2 years I managed a mesos cluster with bare-metal on-premise.
> > Now at my new company, the situation is a little bit different, and I'm
> > wondering if there are some kind of best practices:
> > The company is in the middle of a transition from on-premise to AWS. The
> > old stuff is still running in the DC, the newer micro services are running
> > within autoscaling groups on AWS and other AWS services like DynamoDB,
> > Kinesis and Lambda are also on the rise.
> >
> > So in my naive view of the world (where no problems occur..... never!)
> > I'm thinking that it would be great to span a hybrid mesos cluster over
> > AWS&DC to leverage the still available resources in the DC which gets more
> > and more underutilized over the time.
> >
> > Now my naive world view slowly crumbles, and I realize that I'm missing
> > the experience with AWS. Questions that are already popping up (beside all
> > those Questions, where I currently don't know that I will have them...) are:
> > * Is Virtual Private Gateway to my VPC enough, or do I need to aim for a
> >   Direct Connect?
> > * Put everything into one Account, or use a Multi-Account strategy?
> >   (Mainly to prevent things running amok and drag stuff down while running
> >   into an account wide shared limit?)
> > * Will e.g. DynamoDB be "fast" enough if it's accessed from the
> >   Datacenter?
> >
> > I'll appreciate any feedback or lessons learned about that topic :)
> >
> > Thanks,
> > Florian

