Thanks Ian. Digging around in the cgroup, I see 3 processes in there:
* the mesos-executor
* the shell script marathon starts the app with
* the actual command to run the task (a Perl app in this case)

The line of code you mention is never run in our case, because it's
wrapped in the conditional I'm talking about! All I see is cpu.shares
being set and then mem.soft_limit_in_bytes.

On 28 April 2015 at 17:47, Ian Downes <[email protected]> wrote:
> The line of code you cite is so the hard limit is not decreased on a running
> container, because we can't (easily) reclaim anonymous memory from running
> processes. See the comment above the code.
>
> The info->pid.isNone() is for when the cgroup is being configured (see the
> update() call at the end of MemIsolatorProcess::prepare()), i.e., before any
> processes are added to the cgroup.
>
> The limit > currentLimit.get() ensures the limit is only increased.
>
> The memory limit defaults to the maximum for the data type, I guess that's
> the ridiculous 8 EB. It should be set to what the initial memory allocation
> was for the container, so this is not expected. Can you look in the slave
> logs for when the container was created, for the log line on:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393
>
> Ian
>
> On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies <[email protected]> wrote:
>>
>> Been banging my head against this for a while now.
>>
>> mesos 0.21.0, marathon 0.7.5, CentOS 6 servers.
>>
>> When I enable cgroups (flags are: --cgroups_limit_swap
>> --isolation=cgroups/cpu,groups/mem) the memory limits I'm setting
>> are reflected in memory.soft_limit_in_bytes but not in
>> memory.limit_in_bytes or memory.memsw.limit_in_bytes.
>>
>> Upshot is our runaway task eats all RAM and swap on the server
>> until the OOM killer steps in and starts firing into the crowd.
>>
>> This line of code seems to never lower a hard limit:
>>
>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L382
>>
>> which means both of those tests must be true, right?
>>
>> The current limit is insanely high (8192 PB if I'm reading it right) - how
>> would I make info->pid.isNone() be true?
>>
>> Have tried restarting the slave, scaling the marathon apps to 0 tasks
>> then back. Bit stumped.
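
For anyone following along, here's a minimal standalone C++ sketch of the
behaviour Ian describes above (soft limit always written; hard limits only
written when the cgroup is still empty or the limit is being raised). It is
NOT the actual Mesos code - see the linked mem.cpp for that - and the cgroup
path, helper and parameter names are placeholders for illustration only:

    // Sketch of the isolator's update logic discussed in this thread.
    #include <cstdint>
    #include <fstream>
    #include <string>

    // Hypothetical cgroup path for a single container.
    static const std::string kCgroup =
        "/sys/fs/cgroup/memory/mesos/<container-id>";

    // Write a byte value into one of the memory controller's control files.
    static void writeLimit(const std::string& file, uint64_t bytes) {
      std::ofstream(kCgroup + "/" + file) << bytes;
    }

    // cgroupIsEmpty corresponds to info->pid.isNone(): true only while the
    // cgroup is being configured in prepare(), before any process is added.
    static void update(bool cgroupIsEmpty, uint64_t currentLimit, uint64_t limit) {
      // The soft limit is always (re)written...
      writeLimit("memory.soft_limit_in_bytes", limit);

      // ...but the hard limits are only written for an empty cgroup or when
      // the limit grows, because shrinking them on a running container can't
      // reclaim anonymous memory and risks OOM-killing the tasks inside.
      if (cgroupIsEmpty || limit > currentLimit) {
        writeLimit("memory.limit_in_bytes", limit);
        writeLimit("memory.memsw.limit_in_bytes", limit);  // mem+swap cap
      }
    }

    int main() {
      const uint64_t GiB = 1ULL << 30;
      update(true,  UINT64_MAX, 1 * GiB);   // initial setup: hard limits are set
      update(false, 1 * GiB,    512 << 20); // later decrease: only soft limit moves
      return 0;
    }

In other words, if the hard limit ever ends up at the data type's maximum and
the cgroup already has processes in it, nothing in update() will bring it back
down - which is why the initial configuration path matters so much here.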

