I'm curious to learn what's been going on in Mesos (and the general ecosystem) around service scheduling. In particular, I'd like to understand how Mesos might work in a cluster where service tasks are more common than batch tasks, e.g., a cluster with a single framework for running stateless tasks and many frameworks for running stateful tasks.
I haven't been able to find much information about how exactly service scheduling fits with Mesos; the discussion is certainly skewed towards ephemeral / batch scheduling at the moment. With that in mind, I've tried to outline some topics I've been thinking about recently. What I'm really curious to know is:

1. Am I way off track?
2. For a service scheduler built today, how much is Mesos responsible for, and how much is the framework? What about going forward?
3. Are there already some patterns/idioms for these kinds of things in existing frameworks?

# Balancing tasks within a framework

For this, imagine a framework that schedules long-lived (service), stateless tasks in a cluster that is relatively full.

- If asked to schedule a task with comparatively large resource requirements, the task may never get scheduled if the framework simply waits for a sufficiently large resource offer. Instead, the framework should attempt to reschedule existing tasks to "make room" for it. How might that work?
- If asked to schedule multiple copies of a task across different machines, some copies may never get scheduled if the framework waits for a sufficiently diverse set of resource offers. Instead, it should reschedule existing tasks to meet the task's availability requirements. What might that look like?

Maybe both of these could be accomplished by using some combination of:

- using `requestResources` when large tasks are requested, to try to get bigger offers.
- using saved offers to relaunch existing tasks, and then hoarding the freed resources for scheduling new tasks.

# Resource contention / balancing tasks across frameworks

For this, imagine there are two frameworks: one like the above, running stateless service tasks, and another responsible for a single stateful task. Again, the cluster is relatively full.

- If the stateful scheduler wants to run its task on a particular machine, but that machine's resources are currently consumed by the other framework, what happens?
- If the stateful scheduler can run its task on any machine, but no single offer is large enough to run the task, what does it do?

Some possible ways to approach this:

- The ability to ask other frameworks to release their saved offers, since the resources may actually be available but currently hoarded. I think `requestResources` on the scheduler might do this?
- The ability to ask other frameworks to reschedule existing tasks. Could this be a "user-land" feature? If I have a particular slave in mind to run my task and there is a way to find frameworks with tasks on that slave, I could send some kind of "reschedule" message to one of those frameworks, chosen at random. This message might include the slave, my requested resources, and a priority understood by all of my frameworks. The receiving framework could then compare its own priority with the one in the message and decide whether it should reschedule.

Cheers,
Bernerd
Engineer @ SoundCloud
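P.S. A rough sketch, in Python, of how a framework might handle that "reschedule" message. None of this is Mesos API: `Resources`, `Task`, `RescheduleRequest` and `choose_tasks_to_reschedule` are hypothetical names, and it assumes each task carries an integer priority (higher means more important) rather than one priority per framework. The framework only looks at its own lower-priority tasks on the requested slave, and declines (returns `None`) if moving them would not free enough CPU and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resources:
    cpus: float
    mem: float  # MB

    def covers(self, other: "Resources") -> bool:
        # True if these resources are at least as large as `other` in every dimension.
        return self.cpus >= other.cpus and self.mem >= other.mem


@dataclass
class Task:
    task_id: str
    slave_id: str
    priority: int  # hypothetical: higher number = more important
    resources: Resources


@dataclass
class RescheduleRequest:
    # Hypothetical "user-land" message sent between frameworks; not part of Mesos.
    slave_id: str
    requested: Resources
    priority: int


def choose_tasks_to_reschedule(
    request: RescheduleRequest, tasks: List[Task]
) -> Optional[List[Task]]:
    """Greedily pick lower-priority tasks on the requested slave, lowest
    priority first, until their combined resources cover the request.
    Returns None (decline) if even all eligible tasks are not enough."""
    candidates = sorted(
        (t for t in tasks
         if t.slave_id == request.slave_id and t.priority < request.priority),
        key=lambda t: t.priority,
    )
    chosen: List[Task] = []
    freed = Resources(0.0, 0.0)
    for task in candidates:
        if freed.covers(request.requested):
            break
        chosen.append(task)
        freed = Resources(freed.cpus + task.resources.cpus,
                          freed.mem + task.resources.mem)
    return chosen if freed.covers(request.requested) else None
```

Evicting the lowest-priority tasks first keeps the decision local to the receiving framework, but it ignores resources that may already be free on the slave and can move more tasks than strictly necessary.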