Re: Resource allocation cycle in DRF for multiple frameworks

Benjamin Mahler Tue, 05 Dec 2017 18:29:21 -0800

Q1: We randomly sort the agents, so the outer loop of the pseudo-code I
showed becomes:

- for each agent:
+ for each agent in random_sort(agents):
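
A minimal Python sketch of the loop with the randomized agent order follows.
It is a toy model, not the allocator's code: `pick_framework`, `allocate`,
the `share` dictionary, and the `is_filtered` callback are hypothetical names,
each agent is handed whole to the first framework that has not filtered it,
and shares are held fixed instead of being updated after every allocation.

```python
import random


def pick_framework(agent, roles, frameworks_by_role, share, is_filtered):
    """Return the first framework, in share order, that has not filtered `agent`."""
    # Lowest dominant share first, for roles and then for frameworks in a role.
    for role in sorted(roles, key=lambda r: share[r]):
        for framework in sorted(frameworks_by_role[role], key=lambda f: share[f]):
            if not is_filtered(framework, agent):
                return framework
    return None  # every framework has filtered this agent


def allocate(agents, roles, frameworks_by_role, share, is_filtered, rng=None):
    """Visit agents in random order and return a list of (agent, framework) pairs."""
    rng = rng or random.Random()
    shuffled = list(agents)
    rng.shuffle(shuffled)  # stands in for random_sort(agents)
    result = []
    for agent in shuffled:
        framework = pick_framework(agent, roles, frameworks_by_role, share, is_filtered)
        if framework is not None:
            result.append((agent, framework))
    return result
```

Passing a seeded `random.Random` as `rng` makes the agent order reproducible,
which is convenient when comparing runs.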


Q2: It depends on which version you're running. We used to immediately
re-offer declined resources, but this was problematic since they kept going
back to the same framework when a low filter timeout was used. The current
implementation won't immediately re-offer them, in an attempt to let them go
to another framework during the next allocation "cycle":

https://github.com/apache/mesos/blob/1.4.0/src/master/allocator/mesos/hierarchical.cpp#L1202-L1213
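
The behaviour described above can be modelled with a small Python sketch.
It is not a translation of the linked C++ code: the `DeclineFilters` class and
its methods are hypothetical, and the rule that a decline filter lasts at
least one allocation interval is an assumption used here to show why a
0-second filter no longer sends the resources straight back to the same
framework.

```python
class DeclineFilters:
    """Toy record of which (framework, agent) pairs are currently filtered."""

    def __init__(self, allocation_interval):
        self.allocation_interval = allocation_interval
        self._expiry = {}  # (framework, agent) -> time at which the filter ends

    def decline(self, framework, agent, now, refuse_seconds):
        # Assumption: the filter never expires before the next allocation run,
        # so even refuse_seconds == 0 keeps the resources away for one interval.
        duration = max(refuse_seconds, self.allocation_interval)
        self._expiry[(framework, agent)] = now + duration

    def is_filtered(self, framework, agent, now):
        expiry = self._expiry.get((framework, agent))
        return expiry is not None and now < expiry
```

With an allocation interval of 1 second, a framework that declines at t=0 with
a 0-second filter is still skipped at t=0.5, so another framework gets a
chance at the resources, and it becomes eligible again at t=1.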

Q3: We implement best-effort DRF to improve utilization. That is, we let a
role go above its fair share if the lower share roles do not want the
resources, and a role may have to wait for the resources to be released
before it can get its fair share (since we cannot revoke resources). So, we
increase utilization at the cost of no longer providing a guarantee that a
role can get its fair share without waiting! In the future, we will use
revocation to ensure a user is guaranteed to get their fair share without
having to wait.
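
To make the best-effort behaviour concrete, here is a short Python sketch. It
is illustrative only: `dominant_share` and `next_role` are hypothetical
helpers, and the cluster totals are made-up numbers. A role's dominant share
is its largest fraction of any single resource, and the next role to be
served is the lowest-share role that actually wants resources.

```python
def dominant_share(allocated, total):
    """Largest fraction of any one resource that a role currently holds."""
    return max(allocated.get(name, 0) / amount
               for name, amount in total.items() if amount > 0)


def next_role(shares, wants_resources):
    """Lowest-share role that wants resources, or None if no role does."""
    interested = [role for role in shares if wants_resources[role]]
    return min(interested, key=shares.get) if interested else None


total = {"cpus": 10, "mem": 100}  # made-up cluster size
shares = {
    "A": dominant_share({"cpus": 8, "mem": 20}, total),  # cpus dominate: 8/10
    "B": dominant_share({"cpus": 1, "mem": 5}, total),   # cpus dominate: 1/10
}
```

If B has nothing to launch, `next_role(shares, {"A": True, "B": False})`
returns "A", so A keeps growing past its fair share. Once B wants resources
again it is picked first, but it may still have to wait for A's tasks to
finish because nothing is revoked.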

On Tue, Dec 5, 2017 at 9:04 AM, bigggyan <[email protected]> wrote:

> Hi Benjamin,
> Thanks for the clear explanation. This loop structure makes it easy to
> understand how resource allocation actually happens inside the Mesos
> master's allocation module. However, I have a few queries, which I will
> try to clarify with the questions below. My goal is to understand how DRF
> is implemented in Apache Mesos based on the DRF paper. I am doing this for
> an academic project to develop a custom framework.
> I am using a few in-house frameworks along with Mesosphere Marathon and
> Chronos. I am using the default role, with no weights or constraints on
> any framework, so the loop becomes simpler.
>
> I understand that there exists no such cycle, but what I meant was the end
> of the outer loop when all the agents are allocated to frameworks.
>
> Q1: In the loop "for each agent", how is one agent picked over the other
> agents to be assigned to a framework?
> Q2: After all the agents are allocated to the available frameworks, each
> framework can decide whether to use its offers or not. So the question is:
> if a framework rejects an offer with a 0-second filter duration, can it be
> offered to the same framework again because of its low dominant share? Or
> is there a penalty so that a rejected offer cannot be immediately
> re-offered to the same framework?
>
> Let me explain why this is important to know:
> User A may be using 80% of the share, and user B is receiving the rest of
> the offers first because of its low share, but is rejecting them because
> it has no pending tasks to launch. According to DRF, the master will
> always pick user B first, and user A will not receive anything even though
> it has many tasks in the waiting queue.
>
> Q3: My observation is that once an offer is declined or partially used by
> a framework, it immediately goes to the next available framework even
> though that framework's share is higher than the previous one's. Is that
> how it is implemented, or am I getting something wrong here?
>
> Thanks
>
>
> On Mon, Dec 4, 2017 at 2:37 PM, Benjamin Mahler <[email protected]>
> wrote:
>
>> I don't think I understood the questions here, but let me add some
>> explanation and we can go from there.
>>
>> Mesos will use DRF to choose an ordering amongst the roles that are
>> actively interested in obtaining resources. Within a role, we currently use
>> DRF again to choose an ordering amongst the frameworks in that role. The
>> simplified pseudo-code looks something like this:
>>
>> for each agent:
>>   for each role in drf_sorted(roles):
>>     for each framework subscribed to role in drf_sorted(frameworks):
>>       if framework already filtered these resources:
>>         continue
>>       else:
>>         allocate to framework
>>
>> There is no strong concept of a "cycle" as you were referring to; that
>> is, Mesos does not remember which offers were sent out during which run
>> of this overall loop. Currently, when resources are offered, as far as the
>> allocator is concerned, they are considered allocated to that role and
>> framework.
>>
>> Mesos provides an --offer_timeout flag on the master after which the
>> offer will be rescinded.
>>
>> If you could share a little more about what you're trying to accomplish
>> in your particular use case we could advise on how best to set things up.
>>
>> On Thu, Nov 30, 2017 at 1:05 PM, bigggyan <[email protected]> wrote:
>>
>>> Hello
>>> My understanding is that, during a single DRF cycle, the Mesos master
>>> will not make an offer to the same framework twice. I believe that an
>>> offer a framework rejects, or whatever is left over after partial use,
>>> goes to the next eligible framework.
>>> Now the question is: if one framework takes a long time to make its
>>> decision, will the same DRF allocation cycle stay alive to allocate the
>>> rest of the resources to other users, or will the master start a new
>>> cycle?
>>> Is there an allocation cycle expiry period? I am using multiple
>>> in-house frameworks with the same role and the same weight, with no
>>> quota set. I would appreciate your help in understanding the resource
>>> allocation.
>>>
>>> Thanks
>>> Bigggyan
>>>
>>
>>
>
