Bernerd,

You should really check out Marathon: https://github.com/mesosphere/marathon
It fits closely with what you've described ;)

On Wed, Sep 18, 2013 at 4:36 AM, Bernerd Schaefer <bern...@soundcloud.com> wrote:

> I'm curious to learn what's been going on in Mesos (and the general
> ecosystem) around service scheduling. In particular, I'm curious about how
> Mesos might work in a cluster where service tasks are more common than batch
> tasks, e.g., a cluster with a single framework for running stateless tasks
> and many frameworks for running stateful tasks.
>
> I haven't been able to find much information about how exactly service
> scheduling fits with Mesos; the discussion is certainly skewed towards
> ephemeral / batch scheduling at the moment. With that in mind, I've tried to
> outline some topics I've been thinking about recently. What I'm really
> curious to know is:
>
> 1. Am I way off track?
> 2. For a service scheduler built today, how much is Mesos responsible for and
>    how much is the framework? What about going forward?
> 3. Are there already patterns/idioms for these kinds of things in existing
>    frameworks?
>
> # Balancing tasks within a framework
>
> For this, imagine a framework that schedules long-lived (service), stateless
> tasks.
>
> - If asked to schedule a task with comparatively large resource requirements,
>   the task may never get scheduled if the framework waits for a sufficiently
>   large resource offer. Instead, the framework should attempt to reschedule
>   existing tasks to "make room" for it. How might that work?
>
> - If asked to schedule multiple copies of a task across different machines,
>   some copies may never get scheduled if the framework waits for a
>   sufficiently diverse set of resource offers. Instead, it should reschedule
>   existing tasks to meet the availability requirements of the task. What
>   might that look like?
>
> Maybe both of these could be accomplished using some combination of:
>
> - calling `requestResources` when large tasks are requested, to try to get
>   bigger offers.
>
> - using saved offers to relaunch existing tasks, and then hoarding the freed
>   resources for scheduling new tasks.
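
Hoarding saved offers needs some bookkeeping on the framework side. This is a minimal sketch, assuming an offer can be reduced to a slave ID plus CPUs and memory; `Offer` and `OfferPool` are hypothetical, and the sketch only tracks held offers per slave. How a launch actually consumes several pooled offers depends on the Mesos scheduler API version in use. The master can also take back an offer (the scheduler's `offerRescinded` callback), so the pool must be able to drop one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified view of a resource offer.
    offer_id: str
    slave_id: str
    cpus: float
    mem: int


class OfferPool:
    """Hypothetical bookkeeping for offers a framework is holding on to."""

    def __init__(self) -> None:
        # slave_id -> {offer_id: Offer}
        self._by_slave: Dict[str, Dict[str, Offer]] = defaultdict(dict)

    def hold(self, offer: Offer) -> None:
        self._by_slave[offer.slave_id][offer.offer_id] = offer

    def rescind(self, offer_id: str) -> None:
        # Forget an offer the master has taken back.
        for offers in self._by_slave.values():
            offers.pop(offer_id, None)

    def find_slave(self, cpus: float, mem: int) -> Optional[str]:
        """Return a slave whose held offers together cover the request."""
        for slave_id, offers in self._by_slave.items():
            total_cpus = sum(o.cpus for o in offers.values())
            total_mem = sum(o.mem for o in offers.values())
            if total_cpus >= cpus and total_mem >= mem:
                return slave_id
        return None

    def take(self, slave_id: str) -> List[Offer]:
        """Remove and return every held offer for a slave, e.g. for a launch."""
        return list(self._by_slave.pop(slave_id, {}).values())
```

Holding offers keeps those resources away from other frameworks, which is exactly the contention described in the next section, so a real framework would probably want to cap how long it hoards.
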
>
> # Resource contention / balancing tasks across frameworks
>
> For this, imagine there are two frameworks: one like the above, running
> stateless service tasks, and another responsible for a single stateful task.
> Again, the cluster is relatively full.
>
> - If the stateful scheduler wants to run its task on a particular machine,
>   but that machine's resources are currently consumed by the other
>   framework, what happens?
>
> - If the stateful scheduler can run its task on any machine, but no single
>   offer is large enough to run the task, what does it do?
>
> Some possible ways to approach this:
>
> - The ability to request that other frameworks release their saved offers,
>   as the resources may actually be available but are currently hoarded. I
>   think `requestResources` on the scheduler might do this?
>
> - The ability to request that other frameworks reschedule existing tasks.
>   This could be a "user-land" feature? If I have a particular slave in mind
>   to run my task, and there is a way to find frameworks with tasks on that
>   slave, I could send some kind of "reschedule" message to one of those
>   frameworks, chosen at random. This message might include the slave, my
>   requested resources, and a priority understood by all of my frameworks.
>   The other framework could then compare its own priority with the one in
>   the message and decide whether it should reschedule.
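
The "reschedule" message and the receiving side's decision could be as small as the sketch below. It is a hypothetical user-land protocol, not anything Mesos provides: `RescheduleRequest`, `RunningTask` and `handle_reschedule` are made-up names. It also assumes priorities are attached to individual tasks (larger number means more important), and that the receiving framework only agrees if it can free the whole request using strictly lower-priority tasks on that slave.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RescheduleRequest:
    # Hypothetical user-land message; not part of the Mesos API.
    slave_id: str
    cpus: float
    mem: int
    priority: int  # assumption: larger number = more important


@dataclass(frozen=True)
class RunningTask:
    # Hypothetical record of a task the receiving framework runs.
    task_id: str
    slave_id: str
    cpus: float
    mem: int
    priority: int


def handle_reschedule(req: RescheduleRequest,
                      tasks: List[RunningTask]) -> List[str]:
    """Return IDs of tasks this framework agrees to move off req.slave_id.

    Only tasks on that slave with strictly lower priority than the request
    are candidates. If they cannot free the full request, nothing is moved.
    """
    candidates = sorted(
        (t for t in tasks
         if t.slave_id == req.slave_id and t.priority < req.priority),
        key=lambda t: t.priority,  # least important first
    )
    chosen: List[str] = []
    cpus, mem = 0.0, 0
    for t in candidates:
        if cpus >= req.cpus and mem >= req.mem:
            break  # already freed enough
        chosen.append(t.task_id)
        cpus += t.cpus
        mem += t.mem
    if cpus >= req.cpus and mem >= req.mem:
        return chosen
    return []
```

The all-or-nothing rule avoids killing tasks without actually making enough room. A real version would also have to deal with two requesters racing for the same slave.
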
>
> Cheers,
>
> Bernerd
> Engineer @ SoundCloud
>
