Hi, I am currently designing an auto-scaling solution for Mesos slaves in AWS and would love some feedback on it. There are a couple of ways to do this, and I was thinking of starting with the simple cases first:
a. Define the lowest resource offer a framework can afford to accept, then use the information published by the Mesos master in state.json to determine whether the cluster has enough resources. If the available resources won't satisfy that lower bound, we bring up new EC2 instances with enough resources for Mesos to make offers.

b. Use the latency of getting an offer for a given job. Say the framework has a job which needs x CPUs, y memory and z ports. If the framework doesn't get an offer within time t, we scale up the ASG whose slaves run on an EC2 instance type that can offer that amount of resources.

c. Maintain historical information about the resources used and the jobs submitted and running in Mesos, and use that information for predictive autoscaling.

I would like to understand whether there are better ways of achieving elasticity in a Mesos cluster, where the complexity lies, and what information Mesos could provide to make this more efficient.

--
Thanks,
Diptanu Choudhury
Web - www.linkedin.com/in/diptanu
Twitter - @diptanu <http://twitter.com/diptanu>
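
P.S. To make option (a) concrete, here is a rough sketch of how the check could look. It is only an illustration under assumptions: the endpoint path in fetch_state, the helper names, and the thresholds are hypothetical, and it assumes each slave entry in the master's state carries "resources" and "used_resources" maps with "cpus" and "mem" keys. Ports and disk are ignored for brevity.

```python
import json
from urllib.request import urlopen


def fetch_state(master_url):
    """Fetch the master's state as a dict.

    The path below is an assumption; adjust it for your Mesos version.
    """
    with urlopen(master_url + "/master/state.json") as resp:
        return json.load(resp)


def free_resources(slave):
    """Return the unallocated cpus and mem on one slave entry."""
    total = slave.get("resources", {})
    used = slave.get("used_resources", {})
    return {
        "cpus": total.get("cpus", 0) - used.get("cpus", 0),
        "mem": total.get("mem", 0) - used.get("mem", 0),
    }


def needs_scale_up(state, min_cpus, min_mem):
    """True if no single slave can satisfy the lower-bound offer."""
    for slave in state.get("slaves", []):
        free = free_resources(slave)
        if free["cpus"] >= min_cpus and free["mem"] >= min_mem:
            return False
    return True
```

A periodic job could call needs_scale_up(fetch_state(master_url), min_cpus, min_mem) and, when it returns True, raise the desired capacity of the matching ASG. The check is per slave rather than cluster-wide because an offer comes from a single slave, so aggregate free capacity alone would not tell us whether the lower bound can be met.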

