Hi Whitney,

Are you thinking of an API to do that from within any executor, or the command executor in particular? The executor won't start before the fetcher has pulled all artifacts, so wouldn't it be too late to change the cgroups limits from within the executor? If not, you should be able to experiment with a custom executor run as root to change the limits?
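For that kind of experiment, here is a rough sketch in Python of raising the limit around the download/untar step and restoring it afterwards. It assumes the cgroup v1 memory controller (the `memory.limit_in_bytes` file described in the kernel's memory.txt) and an executor running as root. `cgroup_dir` is a placeholder for the container's cgroup directory, which you would still have to look up yourself, and the helper names are made up for illustration, not part of any Mesos API.

```python
import contextlib
import os

LIMIT_FILE = "memory.limit_in_bytes"  # cgroup v1 memory controller file


def read_limit(cgroup_dir):
    """Return the current memory limit (bytes) of the given cgroup."""
    with open(os.path.join(cgroup_dir, LIMIT_FILE)) as f:
        return int(f.read().strip())


def write_limit(cgroup_dir, limit_bytes):
    """Write a new memory limit (bytes); needs root on a real cgroup."""
    with open(os.path.join(cgroup_dir, LIMIT_FILE), "w") as f:
        f.write(str(limit_bytes))


@contextlib.contextmanager
def raised_memory_limit(cgroup_dir, temporary_limit, drop_caches_path=None):
    """Temporarily raise the limit, then restore the original value.

    cgroup_dir is hypothetical: the caller must supply the real path.
    If drop_caches_path is given (e.g. /proc/sys/vm/drop_caches), "1" is
    written to it before the limit is lowered again.
    """
    original = read_limit(cgroup_dir)
    # Never lower the limit here; only raise it if the request is larger.
    write_limit(cgroup_dir, max(original, temporary_limit))
    try:
        yield original
    finally:
        if drop_caches_path is not None:
            with open(drop_caches_path, "w") as f:
                f.write("1")
        write_limit(cgroup_dir, original)
```

Usage would be something like `with raised_memory_limit(cgroup_dir, 4 * 1024**3, "/proc/sys/vm/drop_caches"): download_and_untar()`, where `download_and_untar` stands in for whatever your executor does today. Note that writing the lower limit back can fail if the kernel cannot reclaim enough memory, so a real version would need to handle that error.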
A path to change the cgroups hierarchy from the executor in this case seems a bit like local troubleshooting of a bigger problem. I may be missing something, but we should indeed get to the root cause of your OOMs. Do you run into the same problems if you use other compression/packaging formats, zip for example?

Niklas

On Tue, Aug 12, 2014 at 3:18 AM, Whitney Sorenson <[email protected]> wrote:

> We're still seeing sporadic cgroup OOMs due to page cache usage (even with
> the 3.4.98 kernel) in the download and untar process of our executor.
>
> One thing I'd like to experiment with is possibly dynamically changing
> cgroup memory limits from the executor process itself (since it knows when
> it will temporarily require a higher memory limit - setting it back down
> afterwards - preceded by an echo 1 > /proc/sys/vm/drop_caches as per
> https://www.kernel.org/doc/Documentation/cgroups/memory.txt.) I welcome
> any feedback about this approach. The other alternatives are to use the
> mesos fetcher (which lacks a few key features) or to implement our own
> fetcher as a separate service on the box.
>
> One wrinkle of this is finding the cgroup container for a given executor.
> It would make sense to me if that information was conveyed to the executor
> process itself via the register call (perhaps in ExecutorInfo?) Right now I
> am forced to make an HTTP call and parse the entire mesos slave state in
> order to find this id.
>
> -Whitney
>

--
Niklas

