Hi Gidon,

Just to make sure: you mean static reservations on Mesos agents (via the
--resources flag) and not dynamic reservations, right?

Let me first try to explain why you get the TASK_ERROR message. The
built-in allocator merges '*' and reserved resources, hinting the master to
create a single offer. However, as you mentioned before, validation fails
if you try to mix resources with different roles, because the function
responsible for validation checks whether the task's resources are
"contained" in the offered resources, and that check includes a role
equality check. Here are some source code snippets:
https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L449
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L598
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L244
https://github.com/apache/mesos/blob/master/src/common/resources.cpp#L197
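
To illustrate the idea, here is a simplified model in Python. It is not the
C++ code behind the links above: the Resource class and the contains() helper
are hypothetical stand-ins, and I am assuming scalar resources only. It shows
why a request can fit the offer in total and still be rejected once the role
has to match:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a scalar Mesos resource."""
    name: str
    value: float
    role: str = "*"


def contains(offered, requested):
    """Return True if each (name, role) amount requested is covered by the offer."""
    available = defaultdict(float)
    for r in offered:
        available[(r.name, r.role)] += r.value
    needed = defaultdict(float)
    for r in requested:
        needed[(r.name, r.role)] += r.value
    # Amounts are compared per (name, role) pair, never across roles.
    return all(available[key] >= amount for key, amount in needed.items())


offer = [Resource("cpus", 2.0, "*"), Resource("cpus", 2.0, "role1")]

# 3 cpus is within the 4 offered in total, but neither role alone has 3.
print(contains(offer, [Resource("cpus", 3.0, "*")]))      # False
print(contains(offer, [Resource("cpus", 3.0, "role1")]))  # False

# Splitting the request per role passes the check.
print(contains(offer, [Resource("cpus", 2.0, "*"),
                       Resource("cpus", 1.0, "role1")]))  # True
```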

Maybe we should split reserved and unreserved resources into two offers?

Now, to your second concern about whether we should disallow tasks using
both '*' and 'role' resources. I see your point: if a framework is entitled
to use reserved and unreserved resources, why not combine them and launch a
bigger task? I think that's fine, and you should actually be able to do it by
explicitly specifying two different resource objects in the task launch
message, one for '*' resources and one for your role. Why can't you just
use your framework's role for both? Different roles may have different
guarantees (quota, MESOS-1791), and while reserved resources may still be
available to your framework, '*' resources may become unavailable to you (in
future Mesos releases or with custom allocators), leading to termination of
the whole task. By requiring two different objects in the task launch
message, we motivate the framework (i.e. the framework writer) to be aware
of the different policies that may be attached to different roles. Does that
make sense?
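
As a rough sketch of what I mean by two resource objects (plain Python, no
Mesos bindings; split_by_role() is a hypothetical helper, and consuming the
reserved part first is just an arbitrary choice for the example), the
framework could split each task's request like this and then add one resource
entry per returned pair, setting cpus.role accordingly:

```python
def split_by_role(amount, offered, role):
    """Split `amount` of one resource across the role's reservation and '*'.

    `offered` maps role name -> amount offered under that role, for example
    {"*": 2.0, "role1": 2.0}. Returns a list of (role, amount) pairs and
    raises ValueError if the offer cannot cover the request.
    """
    # Avoid counting '*' twice when the framework has no specific role.
    roles = [role] if role == "*" else [role, "*"]
    parts = []
    remaining = amount
    for r in roles:
        take = min(remaining, offered.get(r, 0.0))
        if take > 0:
            parts.append((r, take))
            remaining -= take
    if remaining > 1e-9:
        raise ValueError("offer does not cover the requested amount")
    return parts


print(split_by_role(3.0, {"*": 2.0, "role1": 2.0}, "role1"))
# [('role1', 2.0), ('*', 1.0)]
```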

—Alex

On Thu, Aug 13, 2015 at 2:23 PM, Gidon Gershinsky <[email protected]> wrote:

> I have a simple setup where a framework runs with a role, and some
> resources are reserved in the cluster for that role.
> The resource offers arrive at the framework as a list of two resource
> sets: one general (cpus(*), etc.) and one specific to the role
> (cpus("role1"), etc.).
>
> So far so good. If two tasks are launched, each with one of the two
> resources, things work.
>
> But problems start when I need to launch multiple smaller tasks (with a
> total resource consumption equal to the offered). I run this by creating
> resource objects, and attaching them to tasks, using calls from the
> standard Mesos samples (python):
>                     task = mesos_pb2.TaskInfo()
>                     cpus = task.resources.add()
>                     cpus.name = "cpus"
>                     cpus.scalar.value = TASK_CPUS
>
> checking that the total doesn't surpass the offered resources. This starts
> fine, but soon I get TASK_ERROR messages, due to the master validator finding
> that more resources are requested by tasks than are available in the offer.
> This obviously happens because all task resources, as defined above, come
> with the "*" role, while the offer resources are split between "*" and
> "role1"! OK, then I assign a role to the task resources, by adding
>                     cpus.role = "role1"
>
> But this fails again, and for the same reason.
>
> Shouldn't this work differently? When a resource offer is received by a
> framework with role "role1", why should it care which part is 'unreserved'
> and which part is reserved to "role1"? When a task launch request is
> received by the master, from a framework with a role, why can't it check
> only the total resource amount, instead of treating unreserved and reserved
> resources separately? They are reserved for this role anyway. Or am I
> missing something?
>
>
> Regards,
> Gidon
>
>
>
>
