Hello Thomas,

Your impression is mostly correct: the kernel will *try* to reclaim
memory by writing out dirty pages before killing processes in a cgroup,
but if it cannot reclaim enough pages within some interval (I don't
recall the exact value off-hand), it will start killing things.
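
To check how much of a cgroup's usage is page cache rather than process
memory, something like the following rough sketch may help. It assumes
cgroup v1, where memory.stat has "cache" and "rss" lines in bytes; the
function name and the example path are only illustrative.

```shell
#!/bin/sh
# Report page cache vs. RSS for a cgroup v1 memory cgroup.
# $1 is the cgroup directory, e.g. /sys/fs/cgroup/memory/test
# (example path; yours will differ).
cgroup_mem_breakdown() {
    dir="$1"
    # memory.stat holds "name value" pairs; match the exact field names
    # so that total_cache / total_rss are not picked up by mistake.
    awk '$1 == "cache" { c = $2 }
         $1 == "rss"   { r = $2 }
         END { printf "cache=%d MiB rss=%d MiB\n", c / 1048576, r / 1048576 }' \
        "$dir/memory.stat"
}
```

Call it as "cgroup_mem_breakdown /sys/fs/cgroup/memory/test". The
memory.failcnt file in the same directory counts how many times the
cgroup has hit its limit.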

We observed this on a 3.4 kernel where we could overwhelm the disk
subsystem and trigger an oom. Just how quickly this happens depends on
how fast you're writing compared to how fast your disk subsystem can
write it out. A simple "dd if=/dev/zero of=lotsazeros bs=1M" when
contained in a memory cgroup will fill the cache quickly, reach its
limit and get oom'ed. We were not able to reproduce this under 3.10
and 3.11 kernels. Which kernel are you using?

Example under 3.4:

[idownes@hostname tmp]$ cat /proc/self/cgroup
6:perf_event:/
4:memory:/test
3:freezer:/
2:cpuacct:/
1:cpu:/
[idownes@hostname tmp]$ cat /sys/fs/cgroup/memory/test/memory.limit_in_bytes  # 128 MB
134217728
[idownes@hostname tmp]$ dd if=/dev/zero of=lotsazeros bs=1M
Killed
[idownes@hostname tmp]$ ls -lah lotsazeros
-rw-r--r-- 1 idownes idownes 131M Jun 17 21:55 lotsazeros
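
For anyone wanting to reproduce the session above, here is a minimal
sketch of how such a test cgroup could be set up. It assumes cgroup v1
with the memory controller mounted at /sys/fs/cgroup/memory and root
privileges; CGROUP_ROOT and setup_test_cgroup are names made up for this
example.

```shell
#!/bin/sh
# Assumed mount point of the cgroup v1 memory controller; override if
# your system mounts it elsewhere.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup/memory}"

# Create memory cgroup $1 with a limit of $2 MB and move the current
# shell into it. Prints the limit in bytes on success.
setup_test_cgroup() {
    name="$1"
    limit_mb="$2"
    limit_bytes=$((limit_mb * 1024 * 1024))   # 128 MB -> 134217728
    mkdir -p "$CGROUP_ROOT/$name" || return 1
    echo "$limit_bytes" > "$CGROUP_ROOT/$name/memory.limit_in_bytes" || return 1
    # Writing a PID to "tasks" moves that process into the cgroup;
    # children started afterwards (such as dd) inherit it.
    echo "$$" > "$CGROUP_ROOT/$name/tasks" || return 1
    echo "$limit_bytes"
}
```

After "setup_test_cgroup test 128", /proc/self/cgroup should show
memory:/test and the dd command can be run from the same shell.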


You can also look at nr_dirty in /proc/vmstat to see how many dirty
pages there are (system wide). If you write at a rate your disk
subsystem can sustain, you should see a sawtooth pattern _/|_/| ...
(use something like watch to observe it): the count climbs as the
cgroup approaches its limit, then drops when the kernel flushes dirty
pages to bring it down.
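
A small sketch for sampling those counters once per second follows. The
helper names are made up for this example, and the MiB conversion
assumes 4 KiB pages (check with "getconf PAGESIZE").

```shell
#!/bin/sh
# Print one counter from a vmstat-style file ("name value" per line).
# $1 is the counter name, $2 an optional file (default /proc/vmstat).
vmstat_field() {
    field="$1"
    file="${2:-/proc/vmstat}"
    # Exact match on the name, so nr_dirty_threshold is not returned.
    awk -v f="$field" '$1 == f { print $2 }' "$file"
}

# Convert a page count to MiB, assuming 4 KiB pages.
pages_to_mib() {
    echo $(( $1 * 4096 / 1024 / 1024 ))
}

# Sample nr_dirty and nr_writeback once per second; Ctrl-C to stop.
sample_dirty() {
    while :; do
        d=$(vmstat_field nr_dirty)
        w=$(vmstat_field nr_writeback)
        echo "$(date +%T) nr_dirty=$d (~$(pages_to_mib "$d") MiB) nr_writeback=$w"
        sleep 1
    done
}
```

Run sample_dirty in a second terminal while the write is in progress. A
one-liner with similar effect would be:
watch -n1 'grep -E "^nr_(dirty|writeback) " /proc/vmstat'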

This might be an interesting read:
http://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

Hope this helps! Please do let us know if you're seeing this on a
kernel >= 3.10; otherwise it's likely a kernel issue rather than
something with Mesos.

Thanks,
Ian


On Tue, Jun 17, 2014 at 2:23 PM, Thomas Petr <[email protected]> wrote:
> Hello,
>
> We're running Mesos 0.18.0 with cgroups isolation, and have run into
> situations where lots of file I/O causes tasks to be killed due to exceeding
> memory limits. Here's an example:
> https://gist.github.com/tpetr/ce5d80a0de9f713765f0
>
> We were under the impression that if cache was using a lot of memory it
> would be reclaimed *before* the OOM process decides to kill the task. Is
> this accurate? We also found MESOS-762 while trying to diagnose -- could
> this be a regression?
>
> Thanks,
> Tom
