Hi Tom,

Which framework are you using, e.g. Swarm, Marathon, or something else? And which language bindings are you using?
DRF sorts roles/frameworks by allocation ratio and offers all "available" resources slave by slave; but if the resources are too small (< 0.1 CPU), or were rejected/declined by the framework, they will not be offered to that framework again until the filter timeout expires. For example, in Swarm 1.0 the default filter timeout is 5s (because of the Go scheduler API), so here is a case that may hurt utilisation: Swarm has one slave with 16 CPUs but only launches one container with 1 CPU; the other 15 CPUs are returned to the master and are not re-offered until the filter timeout (5s) expires. I have opened a pull request to make Swarm's parameters configurable; refer to https://github.com/docker/swarm/pull/1585. I think you can check for this case in the master log.

I've appended two small sketches below your quoted message: one of the dominant-share ordering that DRF uses, and one of how a framework can shorten the decline filter.

If you have any comments, please let me know.

----
Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | [email protected] | http://k82.me

On Thu, Jan 21, 2016 at 2:19 AM, Tom Arnfeld <[email protected]> wrote:

> Hey,
>
> I've noticed some interesting behaviour recently when we have lots of
> different frameworks connected to our Mesos cluster at once, all using a
> variety of different shares. Some of the frameworks don't get offered
> more resources (for long periods of time, hours even), leaving the
> cluster under-utilised.
>
> Here's an example state where we see this happen:
>
> Framework 1 - 13% (user A)
> Framework 2 - 22% (user B)
> Framework 3 - 4% (user C)
> Framework 4 - 0.5% (user C)
> Framework 5 - 1% (user C)
> Framework 6 - 1% (user C)
> Framework 7 - 1% (user C)
> Framework 8 - 0.8% (user C)
> Framework 9 - 11% (user D)
> Framework 10 - 7% (user C)
> Framework 11 - 1% (user C)
> Framework 12 - 1% (user C)
> Framework 13 - 6% (user E)
>
> In this example, there's another ~30% of the cluster that is unallocated,
> and it stays like this for a significant amount of time until something
> changes, perhaps another user joins and allocates the rest. Chunks of
> this spare resource are offered to some of the frameworks, but not all
> of them.
>
> I had always assumed that when lots of frameworks were involved,
> eventually the frameworks that would keep accepting resources
> indefinitely would consume the remaining resource, as every other
> framework had rejected the offers.
>
> Could someone elaborate a little on how the DRF allocator / sorter
> handles this situation? Is this likely to be related to the different
> users being used? Is there a way to mitigate this?
>
> We're running version 0.23.1.
>
> Cheers,
>
> Tom.
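Sketch 1: the idea behind DRF's ordering. This is my own illustration, not Mesos's actual sorter code, and the cluster totals and framework allocations are made up. Each framework's "dominant share" is its largest fraction of any single resource type, and the allocator offers resources to the framework with the lowest dominant share first:

    package main

    import (
        "fmt"
        "sort"
    )

    // A framework's current allocation (CPUs and memory in GB).
    type alloc struct {
        name string
        cpus float64
        mem  float64
    }

    func main() {
        // Hypothetical cluster totals.
        totalCPUs, totalMem := 64.0, 256.0

        frameworks := []alloc{
            {"framework-1", 8, 40},
            {"framework-2", 14, 50},
            {"framework-3", 2, 12},
        }

        // Dominant share: the framework's largest fraction of any one resource.
        dominant := func(a alloc) float64 {
            cpuShare := a.cpus / totalCPUs
            memShare := a.mem / totalMem
            if cpuShare > memShare {
                return cpuShare
            }
            return memShare
        }

        // The allocator offers resources to the framework with the
        // lowest dominant share first.
        sort.Slice(frameworks, func(i, j int) bool {
            return dominant(frameworks[i]) < dominant(frameworks[j])
        })

        for _, f := range frameworks {
            fmt.Printf("%s: dominant share %.3f\n", f.name, dominant(f))
        }
    }

Note that Mesos's hierarchical allocator applies this first across roles and then across the frameworks within a role, which is one reason the per-user grouping in your list can matter.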
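Sketch 2: the mitigation side. When a framework declines an offer, it can set refuse_seconds on the decline filter so the resources come back sooner than the 5s default; this is the kind of parameter the Swarm pull request above makes configurable. A minimal fragment, assuming the mesos-go v0 bindings (github.com/mesos/mesos-go); myScheduler is a hypothetical type and the other Scheduler callbacks are omitted:

    package example

    import (
        "github.com/gogo/protobuf/proto"
        mesos "github.com/mesos/mesos-go/mesosproto"
        sched "github.com/mesos/mesos-go/scheduler"
    )

    type myScheduler struct{} // hypothetical; other Scheduler callbacks omitted

    func (s *myScheduler) ResourceOffers(driver sched.SchedulerDriver, offers []*mesos.Offer) {
        for _, offer := range offers {
            // Decline the offer but only filter it for 1 second instead
            // of the default 5s, so the master re-offers these resources
            // to this framework sooner.
            filters := &mesos.Filters{RefuseSeconds: proto.Float64(1.0)}
            driver.DeclineOffer(offer.Id, filters)
        }
    }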

