Hi Niklas,

I want to do this from a custom executor. I think I can accomplish everything I need as things exist today; however, it would be nice if I didn't have to make an API call to grab the container ID.
However, regarding the general issue, the root cause is sort of discussed here:
http://mail-archives.apache.org/mod_mbox/mesos-user/201406.mbox/%3ccajrb3tej+x4vryicjm7aj7avcjr6qexr8bmsuehrc6_tv62...@mail.gmail.com%3E

The issue is that downloading and untarring a large file can fill up a large amount of page cache, which is counted against the cgroup memory limit.

-Whitney

On Tue, Aug 12, 2014 at 5:16 PM, Niklas Nielsen <[email protected]> wrote:

> Hi Whitney,
>
> Are you thinking of an API to do that from within any executor or the
> command-executor in particular? The executor won't start before the fetcher
> has pulled all artifacts, so wouldn't it be too late to change the cgroups
> limits from within the executor?
> If not, you should be able to experiment with a custom executor run as
> root to change the limits?
>
> A path to change the cgroups hierarchy from the executor in this case
> seems a bit like local troubleshooting of a bigger problem. I may be
> missing something - but we should indeed get to the root cause of your
> OOMs.
> Are you running into the same problems if you use other
> compression/packaging formats - zip for example?
>
> Niklas
>
>
> On Tue, Aug 12, 2014 at 3:18 AM, Whitney Sorenson <[email protected]>
> wrote:
>
>> We're still seeing sporadic cgroup OOMs due to page cache usage (even
>> with the 3.4.98 kernel) in the download and untar process of our executor.
>>
>> One thing I'd like to experiment with is possibly dynamically changing
>> cgroup memory limits from the executor process itself (since it knows when
>> it will temporarily require a higher memory limit - setting it back down
>> afterwards - preceded by an echo 1 > /proc/sys/vm/drop_caches as per
>> https://www.kernel.org/doc/Documentation/cgroups/memory.txt.) I welcome
>> any feedback about this approach. The other alternatives are to use the
>> mesos fetcher (which lacks a few key features) or to implement our own
>> fetcher as a separate service on the box.
>>
>> One wrinkle of this is finding the cgroup container for a given executor.
>> It would make sense to me if that information was conveyed to the executor
>> process itself via the register call (perhaps in ExecutorInfo?) Right now I
>> am forced to make an HTTP call and parse the entire mesos slave state in
>> order to find this id.
>>
>> -Whitney
>
> --
> Niklas

