We're still seeing sporadic cgroup OOMs due to page cache usage (even with the 3.4.98 kernel) in the download and untar process of our executor.
One thing I'd like to experiment with is dynamically changing the cgroup memory limit from the executor process itself, since it knows when it will temporarily require a higher limit. The executor would raise the limit for the download and untar, then set it back down afterwards, with the reset preceded by an echo 1 > /proc/sys/vm/drop_caches as per https://www.kernel.org/doc/Documentation/cgroups/memory.txt. I welcome any feedback about this approach.

The other alternatives are to use the Mesos fetcher (which lacks a few key features) or to implement our own fetcher as a separate service on the box.

One wrinkle is finding the cgroup container for a given executor. It would make sense to me if that information were conveyed to the executor process itself via the register call (perhaps in ExecutorInfo?). Right now I am forced to make an HTTP call and parse the entire Mesos slave state in order to find this id.

-Whitney
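
For concreteness, here is a rough sketch of the raise-then-restore idea. It is untested against a real slave and rests on assumptions: the cgroup v1 memory controller, an executor with write access to its own memory.limit_in_bytes, and a cgroup directory that has already been located by some other means. The names temporary_memory_limit, cgroup_dir, extra_bytes and drop_caches_path are made up for illustration and are not part of Mesos.

```python
import contextlib
import os


def _read_int(path):
    """Read a single integer value from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def _write(path, value):
    """Write a value to a control file."""
    with open(path, "w") as f:
        f.write(str(value))


@contextlib.contextmanager
def temporary_memory_limit(cgroup_dir, extra_bytes, drop_caches_path=None):
    """Raise memory.limit_in_bytes by extra_bytes for the duration of the block.

    cgroup_dir is assumed to be this executor's memory cgroup directory.
    drop_caches_path is optional; on a real box it would be
    /proc/sys/vm/drop_caches, which normally requires root to write.
    memory.memsw.limit_in_bytes is not handled here.
    """
    limit_file = os.path.join(cgroup_dir, "memory.limit_in_bytes")
    original = _read_int(limit_file)
    raised = original + extra_bytes
    _write(limit_file, raised)
    try:
        yield raised
    finally:
        try:
            # Drop clean page cache before lowering the limit again.
            if drop_caches_path is not None:
                _write(drop_caches_path, 1)
        finally:
            # Restore the original limit even if dropping caches failed.
            _write(limit_file, original)
```

The executor would wrap only the download and untar step in "with temporary_memory_limit(...)", so the higher limit applies just for that window. Writing the lower limit back can still fail on a real cgroup if usage cannot be reclaimed below it, so that error would need handling.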

