Brian, these are very good and nicely written questions; let me try to answer them.
Mesos' built-in allocator does its bookkeeping based on the declared resource consumption, not the actual (you call it "measured") consumption. This means the example in the Apache docs and the white paper is correct.

However, there is ongoing work on adding oversubscription to Mesos. I'm not an expert in this area, and maybe Niklas Nielsen will chime in and correct me later, but IMO the plan is to measure the actual resource consumption on each Mesos agent node and notify the Mesos master (including the allocator) about extra free but revocable resources. Here <https://docs.google.com/document/d/1pUnElxHy1uWfHY_FOvvRC73QaOGgdXE0OXN-gbxdXA0/edit#heading=h.yvd9qbi4swb4> is the design doc for this feature; some code has already landed in the Mesos master branch. Check, for example, include/mesos/slave/resource_estimator.hpp.

As far as I know, we do not have execution priorities for tasks. We plan to tackle this problem from a different direction: introduce quota (i.e. cluster-wide resource reservations) for production frameworks, which guarantees that a certain amount of resources can be used by the framework at any time, together with oversubscription of quota resources that are currently unused by the framework.

Another effort that aims to increase cluster utilization is optimistic offers, which means offering the same resources to multiple frameworks at the same time. Please be advised that both quota and optimistic offers are in the early design phase right now and will definitely not land in Mesos 0.23.

And yes, to increase CPU utilization, you may also lie to the Mesos master about how many CPUs your agent nodes have : ).

Hope this sheds some light on the topic.

On Wed, Jun 17, 2015 at 11:47 AM, Brian Candler <b.cand...@pobox.com> wrote:

> On 17/06/2015 10:33, Brian Candler wrote:
>
>> It's made more complicated by the fact that the jobs use mmap() on large
>> shared databases, so running multiple instances of the same task doesn't
>> use N times as much memory as one task.
>
> Aside: combined with cgroups this gets hairy.
>
> As I understand it, mmap() memory is charged to the first process which
> touches it, and not to subsequent users of the same page. When a process
> terminates, the charge gets passed to the parent cgroup.
>
> Also: 3.2-vintage kernels have issues: even if only using cgroups for
> accounting (no hard limits), it seems the OOM killer kicks in if there are
> too many dirty pages waiting to be written to disk.
>
> https://www-auth.cs.wisc.edu/lists/htcondor-users/2015-February/msg00087.shtml
> https://www-auth.cs.wisc.edu/lists/htcondor-users/2015-February/msg00135.shtml
>
> Just thought that might be of interest.