(Apologies if this breaks threading; I'm replying after subscribing with this
email address.)

Great questions!  Some responses below from my experience and perspective
formed while working on Aurora.

> 2. For a service scheduler built today, how much is Mesos responsible for and
>    how much the framework? What about going forward?


It seems the most natural behavior is for Mesos to notify the framework of
events, and for the framework to apply those events to its state (persistent
or otherwise).  Notably missing are APIs to help reconcile state mismatches.
Aurora uses framework messages and a special executor to assist with this
(i.e. comparing what the scheduler thinks is on a machine against what's
actually there).  Never mind whose fault a state mismatch is; it can happen
due to bugs or feature gaps on either side, and it's nice when the system can
auto-correct.
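To make that concrete, here's a rough sketch of the comparison step.  This is
illustrative only: the names and data shapes are made up, not Aurora's actual
API.

```python
# Sketch of the reconciliation idea: compare the scheduler's record of
# what should be on a machine against what the executor reports, and
# surface the differences so the system can auto-correct.

def reconcile(expected, observed):
    """Return (lost, orphaned): tasks to relaunch and tasks to kill."""
    expected, observed = set(expected), set(observed)
    lost = expected - observed      # scheduler thinks they run, but they don't
    orphaned = observed - expected  # running, but unknown to the scheduler
    return lost, orphaned

lost, orphaned = reconcile(
    expected={"web-0", "web-1", "db-0"},
    observed={"web-0", "db-0", "stale-7"},
)
# lost == {"web-1"}; orphaned == {"stale-7"}
```

The scheduler would then relaunch the lost tasks and kill the orphans,
regardless of which side's bug caused the divergence.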


> - If asked to schedule a task with comparatively large resource requirements,
>   the task may never get scheduled if it waits for a sufficiently large
>   resource offer. Instead, it should attempt to reschedule existing tasks to
>   "make room" for it. How might that work?


Two approaches can help here: proactive defragmentation (inducing fewer,
larger resource offers) and preemption (creating space on demand).  I haven't
found a great way to approach either of these in Mesos without assuming that
your framework has full control of the cluster.  This is covered a bit in the
Omega paper [1]:

"While a Mesos framework can use 'filters' to describe the kinds of resources
that it would like to be offered, it does not have access to a view of the
overall cluster state -- just the resources it has been offered. As a result,
it cannot support preemption or policies requiring access to the whole
cluster state: a framework simply does not have any knowledge of resources
that have been allocated to other schedulers."
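For what it's worth, if a framework *does* control the whole cluster,
proactive defragmentation can be sketched as a re-packing pass, e.g.
best-fit decreasing.  Toy model only: single resource dimension, made-up
names, and it ignores the cost of actually moving tasks.

```python
# Toy defragmentation sketch: place tasks best-fit-decreasing so free
# capacity concentrates on as few machines as possible, leaving larger
# holes for big tasks.

def best_fit_decreasing(tasks, hosts):
    """tasks: {name: cpus}; hosts: {name: capacity}.
    Returns (placement, remaining free capacity per host)."""
    free = dict(hosts)
    placement = {}
    for task, need in sorted(tasks.items(), key=lambda kv: -kv[1]):
        # Pick the host with the least free capacity that still fits,
        # so small tasks don't fragment the big holes.
        candidates = [h for h, f in free.items() if f >= need]
        if not candidates:
            raise RuntimeError("no room for %s" % task)
        host = min(candidates, key=lambda h: free[h])
        placement[task] = host
        free[host] -= need
    return placement, free
```

Running this over {"a": 4, "b": 3, "c": 1} on hosts {"h1": 8, "h2": 4} packs
"a" onto h2 and leaves a contiguous 4-CPU hole on h1 for a future large task.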


> - If asked to schedule multiple copies of a task across different machines,
>   some copies may never get scheduled if it waits for a sufficiently diverse
>   set of resource offers. Instead, it should reschedule existing tasks to
>   meet the availability requirements of the task. What might that look like?


Aurora accepts this possibility on the assumption that stateless services
don't need all of their tasks running for nominal operation (i.e. they're
usually intentionally over-provisioned).  However, our only strategy for
converging towards zero pending tasks is priority-based preemption.
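A rough sketch of what priority-based preemption looks like.  Again a toy
model: one resource dimension, hosts assumed otherwise full, and not Aurora's
actual code.

```python
# Sketch of priority-based preemption: to place a pending task, find a
# host where evicting strictly lower-priority tasks frees enough resources.

def find_preemption(pending, hosts):
    """pending: (cpus, priority); hosts: {host: [(task, cpus, prio), ...]}.
    Returns (host, victim task names) or None if no host works."""
    need, prio = pending
    for host, tasks in hosts.items():
        victims, freed = [], 0
        # Evict the lowest-priority tasks first.
        for task, cpus, p in sorted(tasks, key=lambda t: t[2]):
            if p >= prio:
                break  # never preempt equal- or higher-priority tasks
            victims.append(task)
            freed += cpus
            if freed >= need:
                return host, victims
    return None
```

For a pending (3 CPUs, priority 10) task, a host running only a 4-CPU
priority-2 task is chosen and that task becomes the victim; hosts where only
higher-priority tasks could be evicted are skipped.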

> - using saved offers to relaunch existing tasks, and then hoarding the freed
>   resources for scheduling new tasks.


This is done in Aurora, though currently for a different reason (finding the
best offer for a task rather than choosing the first fit).  The risk with
this approach is that you wind up not playing nicely with other frameworks,
possibly starving them of offers.  Unfortunately, this is the best way I've
found to glean the shape of the cluster.
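Roughly, the hoard-and-pick logic looks like this.  Toy model: a real
framework would launch via the scheduler driver and decline the unused
offers so other frameworks can see them.

```python
# Sketch of offer hoarding: accumulate offers for a window, launch on the
# tightest fit, and decline the rest.  Single resource dimension.

def choose_offer(saved_offers, need):
    """saved_offers: {offer_id: cpus}. Returns (chosen offer, offers to decline)."""
    fitting = {oid: cpus for oid, cpus in saved_offers.items() if cpus >= need}
    if not fitting:
        return None, list(saved_offers)  # nothing fits: hold or decline all
    # Tightest fit wastes the least capacity on this task.
    chosen = min(fitting, key=fitting.get)
    declined = [oid for oid in saved_offers if oid != chosen]
    return chosen, declined
```

Given offers of 8, 3, and 5 CPUs and a 4-CPU task, the 5-CPU offer is
chosen and the other two should be declined promptly, which is exactly the
"playing nicely" part that's easy to get wrong.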

> - If the stateful scheduler wants to run its task on a particular machine, but
>   that machine's resources are currently consumed by the other framework, what
>   happens?


Aurora cheats here by 'pinning' tasks to the same machines all the time, and
(currently) not running anything else on those machines.  Of course, this
strategy falls apart when other frameworks are introduced.  I believe Mesos'
reservations feature intends to address this.
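The pinning itself is just a filter over incoming offers (sketch; hostnames
are hypothetical).  With reservations, the reserved resources would come back
to this framework instead of depending on the host sitting otherwise idle.

```python
# Sketch of 'pinning': only consider offers from the task's designated
# host, declining everything else.

def usable_offers(offers, pinned_host):
    """offers: [(offer_id, hostname)]. Keep only offers on the pinned host."""
    return [oid for oid, host in offers if host == pinned_host]
```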


-=Bill


[1]
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf


On Wed, Sep 18, 2013 at 10:50 AM, Vinod Kone <vi...@twitter.com> wrote:

> @vinodkone
> On Wed, Sep 18, 2013 at 4:36 AM, Bernerd Schaefer
> <bern...@soundcloud.com> wrote:
>
>> I'm curious to learn what's been going on in Mesos (and the general
>> ecosystem) around
>> service scheduling. In particular, I'm curious about how Mesos might work
>> in a
>> cluster where service tasks are more common than batch tasks, e.g., a
>> cluster
>> with a single framework for running stateless tasks and many frameworks
>> for
>> running stateful tasks.
>>
>> I haven't been able to find much information about how exactly service
>>  scheduling fits with Mesos -- the dialogue is certainly skewed towards
>> ephemeral / batch scheduling at the moment. With that in mind, I've tried
>> to
>> outline some topics I've been thinking about recently. What I'm really
>> curious
>> to know is:
>>
>> 1. Am I way off track?
>> 2. For a service scheduler built today, how much is Mesos responsible for
>> and
>>    how much the framework? What about going forward?
>> 3. Are there already some patterns/idioms for these kinds of things in
>> existing
>>    frameworks?
>>
>> # Balancing tasks within a framework
>>
>> For this, imagine a framework that schedules long-lived (service),
>> stateless
>> tasks.
>>
>> - If asked to schedule a task with comparatively large resource
>> requirements,
>>   the task may never get scheduled if it waits for a sufficiently large
>>   resource offer. Instead, it should attempt to reschedule existing tasks
>> to
>>   "make room" for it. How might that work?
>>
>> - If asked to schedule multiple copies of a task across different
>> machines,
>>   some copies may never get scheduled if it waits for a sufficiently
>> diverse
>>   set of resource offers. Instead, it should reschedule existing tasks to
>>   meet the availability requirements of the task. What might that look
>> like?
>>
>> Maybe both of these could be accomplished by using some combination of:
>>
>> - using `requestResources` when large tasks are requested to try and get
>> bigger
>>   offers.
>>
>> - using saved offers to relaunch existing tasks, and then hoarding the
>> freed
>>   resources for scheduling new tasks.
>>
>> # Resource contention / balancing tasks across frameworks
>>
>> For this, imagine there are two frameworks, one like above, running
>> stateless
>> service tasks, the other responsible for a single stateful task. Again,
>> the
>> cluster is relatively full.
>>
>> - If the stateful scheduler wants to run its task on a particular
>> machine, but
>>   that machine's resources are currently consumed by the other framework,
>> what
>>   happens?
>>
>> - If the stateful scheduler can run its task on any machine, but there
>> exists
>>   no single offer sufficiently large to run the task, what does it do?
>>
>> Some possible ways to approach this:
>>
>> - The ability to request that other frameworks release their saved
>> offers, as
>>   the resources may actually be available, but currently hoarded. I think
>>   `requestResources` on the scheduler might do this?
>>
>> - The ability to request that other frameworks reschedule existing tasks.
>> This
>>   could be a "user-land" feature? If I have a particular slave in mind to
>> run
>>   my task and there is a way to find frameworks with tasks on that slave,
>> I
>>   could randomly send some kind of "reschedule" message to one of the
>>   frameworks. This message might include the slave, my requested
>> resources, and
>>   a priority understood by all of my frameworks. The other framework
>> could then
>>   compare its priority with the message, and decide whether it should
>>   reschedule.
>>
>> Cheers,
>>
>> Bernerd
>> Engineer @ SoundCloud
>>
>
>
