Hi Hans,

The biggest thing to note here is that, in retrospect, we made the mistake
a long time ago of offering resources as non-revocable by default. We'd
like to change this default so that frameworks only receive non-revocable
resources when they have quota or reservations in place. While I don't have
enough information to comment on your exact scenario, it's worth mentioning
that we do not yet have the ability to revoke resources, so if there are
long-running executors we can get into situations where fairness is not
respected. For example, if framework 1 arrives before framework 2 and takes
all of the resources, framework 2 will be starved, since Mesos cannot take
action to revoke resources and restore fairness. We're currently looking at
making resources revocable by default so that Mesos can dynamically
maintain fairness via revocation. In that world, you should see weighted
fairness maintained, and you would use quota and/or reservations to provide
guarantees to frameworks.
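
To make the weighted fairness calculation a bit more concrete, here's a
rough Python sketch of the ordering the allocator uses; this is an
illustration with made-up numbers, not the actual Mesos allocator code:

    # Illustrative weighted DRF ordering; not the real Mesos allocator.
    def dominant_share(usage, total):
        # A framework's dominant share is its largest fractional usage
        # across all resource kinds.
        return max(usage[r] / total[r] for r in total)

    def next_in_line(frameworks, total):
        # Offers go to the framework with the lowest dominant share
        # divided by its role's weight.
        return min(frameworks,
                   key=lambda f: dominant_share(f["usage"], total) / f["weight"])

    total = {"cpus": 40.0, "mem": 160.0}  # hypothetical cluster totals
    frameworks = [
        {"name": "F1", "weight": 4.0, "usage": {"cpus": 30.0, "mem": 40.0}},
        {"name": "F2", "weight": 1.0, "usage": {"cpus": 2.0, "mem": 40.0}},
    ]
    # F2's statically held memory (25% of mem) gives it a weighted
    # dominant share of 0.25 / 1 = 0.25, above F1's 0.75 / 4 = 0.1875,
    # so F1 keeps receiving the offers.
    print(next_in_line(frameworks, total)["name"])  # -> F1

This matches what you're observing below: once F2's static memory pushes
its weighted dominant share above F1's, the CPU offers all flow to F1.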

Hope that helps you diagnose further and get some context on the (current)
caveats!

Ben

On Tue, Jan 26, 2016 at 5:43 AM, Hans van den Bogert <[email protected]>
wrote:

> Hi,
>
> While investigating fairness possibilities with Mesos for Spark workloads,
> I’m trying to achieve, for example, a 4:1 weight ratio for two frameworks.
> Imagine a system with two Spark frameworks (in fine-grained mode, if you’re
> familiar with Spark) where I want one of the two frameworks to get four
> times more resources than the other when both are contending for resources.
>
> In Mesos I set up two roles, “F1” and “F2”, with weights of 4 and 1
> respectively.
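>
> (Concretely, I start the master with flags along the lines of
> --roles="F1,F2" --weights="F1=4,F2=1".)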
>
> However, during the times when both frameworks are in need of resources,
> the latter gets close to zero offers. Having read up on and more carefully
> investigated DRF, I understood that memory is the dominant resource in the
> case of framework 2 (F2): Spark allocates memory statically, i.e., it
> doesn’t release it once acquired, and in my case that is ~25% per slave.
> So the allocator thinks that F2 has received enough resources, since its
> dominant resource is already above its weighted fair share, and thus all
> CPU offers go to framework 1 (F1).
> To remedy this first hurdle I recalculated the ratio, although somewhat
> contrived, to 3.2 (= 80% / 25%).
> With the 3.2:1 ratio things are a bit better, but framework 2 (F2) still
> only gets half of the resources it should during periods of high resource
> demand from both frameworks.
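>
> For concreteness, the back-of-the-envelope behind the 3.2, in Python
> (just my own reasoning spelled out, not anything Mesos computes):
>
>     # F2's dominant share is pinned at its statically held memory.
>     f2_mem_share = 0.25         # ~25% of each slave, never released
>     f1_target_share = 0.80      # a 4:1 split means 80% for F1
>     adjusted_weight_ratio = f1_target_share / f2_mem_share  # = 3.2
>     print(adjusted_weight_ratio)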
>
> At this point I was quite lost and tried changing several parameters; one
> of them was the allocation interval (master option --allocation_interval),
> which I set to a relatively low 50ms instead of the default 1000ms.
> Suddenly my ratio was being honored perfectly and I was getting roughly a
> 4:1 CPU ratio between the two Spark frameworks. (This verified that my
> 3.2:1 ratio, chosen to circumvent Spark’s static memory allocation, was
> working.)
>
> Perhaps it’s because I’m using only 10 physical nodes; however, I wrote
> unit tests in the Mesos source to mimic my case, and there I could verify
> that the offers are made fairly according to the weights.
>
> Why is fairness, in the sense of staying as close as possible to the
> defined role weights, only honored when the allocation interval is
> relatively low? I hope someone can explain this phenomenon.
>
>
> Thanks,
>
> Hans
