You would be surprised how far you can get by just scaling up when resource offers are 'tight' and keeping track of idle CPU on each slave so you can shut them down.
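Roughly, as a minimal sketch rather than anything production-ready: the `Slave` record, the function names, and the thresholds below are all made up for illustration, and the per-slave numbers are assumed to come from whatever you already poll from the master. It only shows the two decisions: scale up when free CPU drops below a floor, and pick long-idle slaves to terminate without dropping below that floor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slave:
    # Hypothetical record; fill it from whatever cluster state you collect.
    slave_id: str
    cpus_total: float
    cpus_used: float
    idle_seconds: float  # how long cpus_used has been zero


def free_cpus(slaves: List[Slave]) -> float:
    """Total unallocated CPU across the cluster."""
    return sum(s.cpus_total - s.cpus_used for s in slaves)


def should_scale_up(slaves: List[Slave], min_free_cpus: float) -> bool:
    """Offers are 'tight' when free CPU falls below the chosen floor."""
    return free_cpus(slaves) < min_free_cpus


def slaves_to_terminate(slaves: List[Slave],
                        idle_threshold_s: float,
                        min_free_cpus: float) -> List[str]:
    """Pick fully idle slaves, longest idle first, keeping the free-CPU floor."""
    remaining_free = free_cpus(slaves)
    victims = []
    for s in sorted(slaves, key=lambda s: s.idle_seconds, reverse=True):
        if s.cpus_used > 0 or s.idle_seconds < idle_threshold_s:
            continue  # still busy, or not idle for long enough
        if remaining_free - s.cpus_total < min_free_cpus:
            continue  # removing it would make offers tight again
        remaining_free -= s.cpus_total
        victims.append(s.slave_id)
    return victims


if __name__ == "__main__":
    cluster = [
        Slave("a", 4, 4, 0),
        Slave("b", 4, 0, 900),
        Slave("c", 4, 0, 1200),
    ]
    print(should_scale_up(cluster, min_free_cpus=10))
    print(slaves_to_terminate(cluster, idle_threshold_s=600, min_free_cpus=4))
```

The same floor is used for both decisions so that scale-down never immediately triggers a scale-up.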
-Jason

> On May 30, 2014, at 5:57 PM, Diptanu Choudhury <[email protected]> wrote:
>
> Hi,
>
> I am currently working on designing an auto-scaling solution for Mesos slaves
> in AWS and would love to get some feedback around that. There are a couple of
> ways for doing it, and I was thinking to start with simple cases first -
>
> a. Define the lowest resource offer a framework can afford to get and then we
> start using the information published by Mesos master in states.json to
> determine if the cluster has enough resources. If we see that the available
> resources won't satisfy the lower bounds set, we bring up new EC2 instances
> with enough resources that Mesos could use to make offers.
>
> b. Latency for getting an offer for a given job. Say that the framework has a
> job which needs x cpu, y memory and y ports. If the framework doesn't get an
> offer until t amount of time, the ASG with slaves of EC2 instance type which
> can offer that amount of resource is autoscaled.
>
> c. Maintain historical information about the resources used, jobs submitted
> and running in Mesos and use that information for doing Predictive
> autoscaling.
>
> I would like to understand if potentially there are better ways of achieving
> elasticity in a Mesos cluster and where the complexity lies, information that
> Mesos could provide us to make it more efficient.
>
> --
> Thanks,
> Diptanu Choudhury
> Web - www.linkedin.com/in/diptanu
> Twitter - @diptanu

