We're still seeing sporadic cgroup OOMs due to page cache usage (even with
the 3.4.98 kernel) in the download and untar process of our executor.

One thing I'd like to experiment with is dynamically changing the cgroup
memory limit from the executor process itself, since it knows when it will
temporarily require a higher limit. It would set the limit back down
afterwards, preceded by an echo 1 > /proc/sys/vm/drop_caches as per
https://www.kernel.org/doc/Documentation/cgroups/memory.txt. I welcome any
feedback about this approach. The other alternatives are to use the Mesos
fetcher (which lacks a few key features) or to implement our own fetcher as
a separate service on the box.
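
To make the idea concrete, here is a rough sketch of what the executor
could do around the download and untar step. It assumes the cgroup v1
memory controller, that the executor can write to its own cgroup's
memory.limit_in_bytes, and that it runs with enough privilege to write to
/proc/sys/vm/drop_caches. The function names and the cgroup_dir argument
are made up for illustration; none of this is an existing Mesos API.

```python
import contextlib
import os


def read_limit(cgroup_dir):
    """Return the current memory limit (bytes) of the given cgroup directory."""
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes")) as f:
        return int(f.read().strip())


def write_limit(cgroup_dir, limit_bytes):
    """Write a new memory limit (bytes) for the given cgroup directory."""
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))


@contextlib.contextmanager
def raised_memory_limit(cgroup_dir, temporary_limit_bytes,
                        drop_caches_path="/proc/sys/vm/drop_caches"):
    """Raise the limit for the duration of the block, then restore it.

    Hypothetical helper: cgroup_dir is assumed to be the executor's own
    memory cgroup directory.
    """
    original = read_limit(cgroup_dir)
    write_limit(cgroup_dir, temporary_limit_bytes)
    try:
        yield original
    finally:
        # Drop clean page cache first so usage can fit under the old limit.
        with open(drop_caches_path, "w") as f:
            f.write("1")
        write_limit(cgroup_dir, original)
```

Usage would look like "with raised_memory_limit(cgroup_dir, new_limit):"
wrapped around the fetch and untar. Note that drop_caches is system-wide,
so it affects every container on the box, and that the write that lowers
the limit can still fail if the kernel cannot reclaim enough memory.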

One wrinkle with this is finding the cgroup container for a given executor.
It would make sense to me if that information were conveyed to the executor
process itself via the register call (perhaps in ExecutorInfo?). Right now
I am forced to make an HTTP call and parse the entire Mesos slave state in
order to find this id.
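
For reference, the lookup I am doing today is roughly the following. This
is a sketch, not exact code: it assumes the slave state JSON has a
top-level "frameworks" list, each with an "executors" list whose entries
carry "id" and "container" fields, and the function names are my own. The
URL passed to fetch_slave_state is whatever the slave's state endpoint is
on your version.

```python
import json
import urllib.request


def fetch_slave_state(url):
    """Fetch and parse the slave state JSON from the given URL."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_container_id(state, framework_id, executor_id):
    """Return the container id for an executor, or None if not found.

    Assumes the layout state["frameworks"][i]["executors"][j]["container"].
    """
    for framework in state.get("frameworks", []):
        if framework.get("id") != framework_id:
            continue
        for executor in framework.get("executors", []):
            if executor.get("id") == executor_id:
                return executor.get("container")
    return None
```

The memory cgroup directory is then derived from that id. On our boxes I
would expect it under the slave's cgroups hierarchy and root (the
--cgroups_hierarchy and --cgroups_root flags), but that mapping is an
assumption about the setup rather than something the state output
guarantees.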

-Whitney
