Re: Mesos sometimes not allocating the entire cluster

Erb, Stephan Wed, 20 Jan 2016 15:20:52 -0800

Hi Tom,


Have you checked whether any of the starvation issues explained in the eBay 
blog post apply to you as well?


http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/?


Best Regards,

Stephan

________________________________
From: Tom Arnfeld <[email protected]>
Sent: Wednesday, January 20, 2016 7:19 PM
To: [email protected]
Subject: Mesos sometimes not allocating the entire cluster

Hey,

I've noticed some interesting behaviour recently when we have lots of different 
frameworks connected to our Mesos cluster at once, all using a variety of 
different shares. Some of the frameworks don't get offered more resources (for 
long periods of time, hours even), leaving the cluster underutilised.

Here's an example state where we see this happen:

Framework 1 - 13% (user A)
Framework 2 - 22% (user B)
Framework 3 - 4% (user C)
Framework 4 - 0.5% (user C)
Framework 5 - 1% (user C)
Framework 6 - 1% (user C)
Framework 7 - 1% (user C)
Framework 8 - 0.8% (user C)
Framework 9 - 11% (user D)
Framework 10 - 7% (user C)
Framework 11 - 1% (user C)
Framework 12 - 1% (user C)
Framework 13 - 6% (user E)

In this example, another ~30% of the cluster is unallocated, and it stays 
that way for a significant amount of time until something changes, perhaps 
another user joining and allocating the rest. Chunks of this spare resource 
are offered to some of the frameworks, but not all of them.
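
As a sanity check on the figures above, here is a minimal Python sketch that 
tallies the listed shares. The variable names are illustrative only, and the 
grouping by user is just a tally of the numbers in the list, not a model of 
how the Mesos allocator actually sorts frameworks.

```python
from collections import defaultdict

# (framework number, share of cluster in percent, user) as listed above
shares = [
    (1, 13.0, "A"), (2, 22.0, "B"), (3, 4.0, "C"), (4, 0.5, "C"),
    (5, 1.0, "C"), (6, 1.0, "C"), (7, 1.0, "C"), (8, 0.8, "C"),
    (9, 11.0, "D"), (10, 7.0, "C"), (11, 1.0, "C"), (12, 1.0, "C"),
    (13, 6.0, "E"),
]

# Sum the shares held by each user across all of that user's frameworks.
per_user = defaultdict(float)
for _framework, share, user in shares:
    per_user[user] += share

# Whatever is not held by any framework is the idle part of the cluster.
allocated = sum(share for _framework, share, _user in shares)
unallocated = 100.0 - allocated

for user in sorted(per_user):
    print(f"user {user}: {per_user[user]:.1f}%")
print(f"allocated: {allocated:.1f}%, unallocated: {unallocated:.1f}%")
```

The listed shares add up to roughly 69.3%, which leaves about 30.7% idle, and 
user C's nine frameworks together hold about 17.3%.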

I had always assumed that when lots of frameworks were involved, eventually the 
frameworks that would keep accepting resources indefinitely would consume the 
remaining resource, as every other framework had rejected the offers.

Could someone elaborate a little on how the DRF allocator / sorter handles this 
situation? Is this likely to be related to the frameworks running as different 
users? Is there a way to mitigate this?

We're running version 0.23.1.

Cheers,

Tom.
