Hi Niklas,

I want to do this from a custom executor. I think I can accomplish
everything I need as things exist today; however, it would be nice if I
didn't have to make an API call to grab the container id.
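
For reference, the lookup I mean is roughly the following. This is only a
sketch: it assumes the slave state JSON lists executors under
frameworks[].executors[] with "id" and "container" keys, which may differ
between Mesos versions, and the function name is hypothetical.

```python
from typing import Optional


def find_container_id(state: dict, executor_id: str) -> Optional[str]:
    """Return the container id recorded for executor_id, or None if absent.

    Assumes (not verified against every Mesos version) that the parsed
    slave state looks like:
    {"frameworks": [{"executors": [{"id": ..., "container": ...}]}]}
    """
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            if executor.get("id") == executor_id:
                return executor.get("container")
    return None
```

The state dict would come from parsing the slave's state endpoint with
json.load; the exact URL and port depend on how the slave is configured.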

However, regarding the general issue, the root cause is sort of discussed
here:
http://mail-archives.apache.org/mod_mbox/mesos-user/201406.mbox/%3ccajrb3tej+x4vryicjm7aj7avcjr6qexr8bmsuehrc6_tv62...@mail.gmail.com%3E

The issue is that downloading and untarring a large file can fill up a
large amount of page cache, which is counted against the cgroup's memory
limit.
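
One way to see this is to compare the "cache" and "rss" counters in the
cgroup's memory.stat file. A small sketch, assuming the cgroup v1 memory
controller format of one "name value" pair per line; the helper name is
hypothetical:

```python
def page_cache_share(memory_stat_text: str) -> float:
    """Return cache / (cache + rss) from the text of a memory.stat file.

    Assumes the cgroup v1 format: one "name value" pair per line, with
    "cache" and "rss" reported in bytes. Returns 0.0 if both are zero
    or missing.
    """
    stats = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    cache = stats.get("cache", 0)
    rss = stats.get("rss", 0)
    total = cache + rss
    return cache / total if total else 0.0
```

The input would be the contents of memory.stat under the executor's cgroup
directory. Where that directory lives depends on the slave's cgroups
hierarchy and root settings, so I have left the path out.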

-Whitney




On Tue, Aug 12, 2014 at 5:16 PM, Niklas Nielsen <[email protected]> wrote:

> Hi Whitney,
>
> Are you thinking an API to do that from within any executor or the
> command-executor in particular? The executor won't start before the fetcher
> has pulled all artifacts, so wouldn't it be too late to change the cgroups
> limits from within the executor?
> If not, you should be able to experiment with a custom executor run as
> root to change the limits?
>
> A path to change the cgroups hierarchy from the executor in this case
> seems a bit like local troubleshooting of a bigger problem. I may be
> missing something - but we should indeed get to the root cause of your
> OOMs.
> Are you running into the same problems if you use other
> compression/packaging formats - zip for example?
>
> Niklas
>
>
> On Tue, Aug 12, 2014 at 3:18 AM, Whitney Sorenson <[email protected]>
> wrote:
>
>> We're still seeing sporadic cgroup OOMs due to page cache usage (even
>> with the 3.4.98 kernel) in the download and untar process of our executor.
>>
>> One thing I'd like to experiment with is possibly dynamically changing
>> cgroup memory limits from the executor process itself (since it knows when
>> it will temporarily require a higher memory limit - setting it back down
>> afterwards - preceded by an echo 1 > /proc/sys/vm/drop_caches as per
>> https://www.kernel.org/doc/Documentation/cgroups/memory.txt.) I welcome
>> any feedback about this approach. The other alternatives are to use the
>> mesos fetcher (which lacks a few key features) or to implement our own
>> fetcher as a separate service on the box.
>>
>> One wrinkle of this is finding the cgroup container for a given executor.
>> It would make sense to me if that information was conveyed to the executor
>> process itself via the register call (perhaps in ExecutorInfo?) Right now I
>> am forced to make an HTTP call and parse the entire mesos slave state in
>> order to find this id.
>>
>> -Whitney
>>
>
>
>
> --
> Niklas
>
