Eric pointed out that I had a typo in the instance type -- it's a c3.8xlarge (containing SSDs, which could make a difference here).
On Wed, Jun 18, 2014 at 10:36 AM, Thomas Petr <[email protected]> wrote:

> Thanks for all the info, Ian. We're running CentOS 6 with the 2.6.32
> kernel.
>
> I ran `dd if=/dev/zero of=lotsazeros bs=1M` as a task in Mesos and got
> some weird results. I initially gave the task 256 MB, and it never
> exceeded the memory allocation (I killed the task manually after 5
> minutes when the file hit 50 GB). Then I noticed your example was 128
> MB, so I resized and tried again. It exceeded memory
> <https://gist.github.com/tpetr/d4ff2adda1b5b0a21f82> almost immediately.
> The next (replacement) task our framework started ran successfully and
> never exceeded memory. I watched nr_dirty and it fluctuated between
> 10000 and 14000 while the task was running. The slave host is a
> c3.xlarge in EC2, if it makes a difference.
>
> As Mesos users, we'd like an isolation strategy that isn't affected by
> cache this much -- it makes it harder for us to appropriately size
> things. Is it possible through Mesos or cgroups itself to make the page
> cache not count towards the total memory consumption? If the answer is
> no, do you think it'd be worth looking at using Docker for isolation
> instead?
>
> -Tom
>
> On Tue, Jun 17, 2014 at 6:18 PM, Ian Downes <[email protected]> wrote:
>
>> Hello Thomas,
>>
>> Your impression is mostly correct: the kernel will *try* to reclaim
>> memory by writing out dirty pages before killing processes in a cgroup,
>> but if it's unable to reclaim sufficient pages within some interval (I
>> don't recall the exact value off-hand) then it will start killing
>> things.
>>
>> We observed this on a 3.4 kernel where we could overwhelm the disk
>> subsystem and trigger an oom. Just how quickly this happens depends on
>> how fast you're writing compared to how fast your disk subsystem can
>> write it out. A simple "dd if=/dev/zero of=lotsazeros bs=1M" when
>> contained in a memory cgroup will fill the cache quickly, reach its
>> limit and get oom'ed. We were not able to reproduce this under 3.10
>> and 3.11 kernels. Which kernel are you using?
>>
>> Example: under 3.4:
>>
>> [idownes@hostname tmp]$ cat /proc/self/cgroup
>> 6:perf_event:/
>> 4:memory:/test
>> 3:freezer:/
>> 2:cpuacct:/
>> 1:cpu:/
>> [idownes@hostname tmp]$ cat /sys/fs/cgroup/memory/test/memory.limit_in_bytes  # 128 MB
>> 134217728
>> [idownes@hostname tmp]$ dd if=/dev/zero of=lotsazeros bs=1M
>> Killed
>> [idownes@hostname tmp]$ ls -lah lotsazeros
>> -rw-r--r-- 1 idownes idownes 131M Jun 17 21:55 lotsazeros
>>
>> You can also look in /proc/vmstat at nr_dirty to see how many dirty
>> pages there are (system wide). If you wrote at a rate sustainable by
>> your disk subsystem then you would see a sawtooth pattern _/|_/| ...
>> (use something like watch) as the cgroup approached its limit and the
>> kernel flushed dirty pages to bring it down.
>>
>> This might be an interesting read:
>> http://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
>>
>> Hope this helps! Please do let us know if you're seeing this on a
>> kernel >= 3.10; otherwise it's likely this is a kernel issue rather
>> than something with Mesos.
>>
>> Thanks,
>> Ian
>>
>> On Tue, Jun 17, 2014 at 2:23 PM, Thomas Petr <[email protected]> wrote:
>> > Hello,
>> >
>> > We're running Mesos 0.18.0 with cgroups isolation, and have run into
>> > situations where lots of file I/O causes tasks to be killed due to
>> > exceeding memory limits. Here's an example:
>> > https://gist.github.com/tpetr/ce5d80a0de9f713765f0
>> >
>> > We were under the impression that if cache was using a lot of memory
>> > it would be reclaimed *before* the OOM killer decides to kill the
>> > task. Is this accurate? We also found MESOS-762 while trying to
>> > diagnose -- could this be a regression?
>> >
>> > Thanks,
>> > Tom
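
For anyone reproducing this on their own hosts: Ian's "use something like watch" suggestion can be as simple as the one-liner below. This is just a sketch -- the 1-second interval and the extra nr_writeback counter are illustrative, the counters are system wide, and the values are in pages (typically 4 KB each). You should see the sawtooth he describes while dd runs inside the cgroup.

    # Poll the system-wide dirty/writeback page counters once a second.
    watch -n 1 'egrep "nr_dirty|nr_writeback" /proc/vmstat'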

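On Tom's question about page cache counting against the limit: with the cgroups memory controller, memory.usage_in_bytes does include page cache, but memory.stat breaks the charge down into cache vs. rss, so you can at least see how much of a task's footprint is reclaimable cache. A sketch, assuming the task landed in the same /test memory cgroup used in Ian's example:

    # Total charge against the limit (anonymous memory + page cache).
    cat /sys/fs/cgroup/memory/test/memory.usage_in_bytes
    # Breakdown of that charge; "cache" is the reclaimable page-cache part.
    grep -E '^(cache|rss) ' /sys/fs/cgroup/memory/test/memory.stat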

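For reference, the knobs discussed in the article Ian linked are the vm.dirty_* sysctls: lowering them makes the kernel start writeback sooner, so dirty pages can't pile up as far before a cgroup hits its limit. Note these are system-wide thresholds, not per-cgroup, and the values below are purely illustrative -- tune them against your own disks and workload.

    # Start background writeback at 5% of RAM and block writers at 10%
    # (many distros default to 10% / 20%).
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10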