Hi Tom,

Which framework are you using, e.g. Swarm, Marathon, or something else? And which language bindings are you using?
DRF sorts roles/frameworks by allocation ratio and offers all "available" resources slave by slave; but if the resources are too small (< 0.1 CPU), or were rejected/declined by the framework, they will not be offered to that framework again until the filter timeout expires. For example, in Swarm 1.0 the default filter timeout is 5s (because of the Go scheduler API), so here is a case that may hurt utilisation: Swarm has one slave with 16 CPUs but only launches one container with 1 CPU; the other 15 CPUs are returned to the master and are not re-offered until the filter timeout (5s) expires. I have opened a pull request to make Swarm's parameters configurable; refer to https://github.com/docker/swarm/pull/1585. I think you can check for this case in the master log.

I've appended two small sketches below your quoted message: one of the dominant-share ordering that DRF uses, and one of how a framework can shorten the decline filter.

If you have any comments, please let me know.

----
Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | [email protected] | http://k82.me

On Thu, Jan 21, 2016 at 2:19 AM, Tom Arnfeld <[email protected]> wrote:

> Hey,
>
> I've noticed some interesting behaviour recently when we have lots of
> different frameworks connected to our Mesos cluster at once, all using a
> variety of different shares. Some of the frameworks don't get offered
> more resources (for long periods of time, hours even), leaving the
> cluster under-utilised.
>
> Here's an example state where we see this happen:
>
> Framework 1 - 13% (user A)
> Framework 2 - 22% (user B)
> Framework 3 - 4% (user C)
> Framework 4 - 0.5% (user C)
> Framework 5 - 1% (user C)
> Framework 6 - 1% (user C)
> Framework 7 - 1% (user C)
> Framework 8 - 0.8% (user C)
> Framework 9 - 11% (user D)
> Framework 10 - 7% (user C)
> Framework 11 - 1% (user C)
> Framework 12 - 1% (user C)
> Framework 13 - 6% (user E)
>
> In this example, there's another ~30% of the cluster that is unallocated,
> and it stays like this for a significant amount of time until something
> changes, perhaps another user joins and allocates the rest. Chunks of
> this spare resource are offered to some of the frameworks, but not all
> of them.
>
> I had always assumed that when lots of frameworks were involved,
> eventually the frameworks that would keep accepting resources
> indefinitely would consume the remaining resource, as every other
> framework had rejected the offers.
>
> Could someone elaborate a little on how the DRF allocator / sorter
> handles this situation? Is this likely to be related to the different
> users being used? Is there a way to mitigate this?
>
> We're running version 0.23.1.
>
> Cheers,
>
> Tom.
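Sketch 1: the idea behind DRF's ordering. This is my own illustration, not Mesos's actual sorter code, and the cluster totals and framework allocations are made up. Each framework's "dominant share" is its largest fraction of any single resource type, and the allocator offers resources to the framework with the lowest dominant share first:

    package main

    import (
        "fmt"
        "sort"
    )

    // A framework's current allocation (CPUs and memory in GB).
    type alloc struct {
        name string
        cpus float64
        mem  float64
    }

    func main() {
        // Hypothetical cluster totals.
        totalCPUs, totalMem := 64.0, 256.0

        frameworks := []alloc{
            {"framework-1", 8, 40},
            {"framework-2", 14, 50},
            {"framework-3", 2, 12},
        }

        // Dominant share: the framework's largest fraction of any one resource.
        dominant := func(a alloc) float64 {
            cpuShare := a.cpus / totalCPUs
            memShare := a.mem / totalMem
            if cpuShare > memShare {
                return cpuShare
            }
            return memShare
        }

        // The allocator offers resources to the framework with the
        // lowest dominant share first.
        sort.Slice(frameworks, func(i, j int) bool {
            return dominant(frameworks[i]) < dominant(frameworks[j])
        })

        for _, f := range frameworks {
            fmt.Printf("%s: dominant share %.3f\n", f.name, dominant(f))
        }
    }

Note that Mesos's hierarchical allocator applies this first across roles and then across the frameworks within a role, which is one reason the per-user grouping in your list can matter.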
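Sketch 2: the mitigation side. When a framework declines an offer, it can set refuse_seconds on the decline filter so the resources come back sooner than the 5s default; this is the kind of parameter the Swarm pull request above makes configurable. A minimal fragment, assuming the mesos-go v0 bindings (github.com/mesos/mesos-go); myScheduler is a hypothetical type and the other Scheduler callbacks are omitted:

    package example

    import (
        "github.com/gogo/protobuf/proto"
        mesos "github.com/mesos/mesos-go/mesosproto"
        sched "github.com/mesos/mesos-go/scheduler"
    )

    type myScheduler struct{} // hypothetical; other Scheduler callbacks omitted

    func (s *myScheduler) ResourceOffers(driver sched.SchedulerDriver, offers []*mesos.Offer) {
        for _, offer := range offers {
            // Decline the offer but only filter it for 1 second instead
            // of the default 5s, so the master re-offers these resources
            // to this framework sooner.
            filters := &mesos.Filters{RefuseSeconds: proto.Float64(1.0)}
            driver.DeclineOffer(offer.Id, filters)
        }
    }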

