That's what led me into reading the code - neither memory.limit_in_bytes nor memory.memsw.limit_in_bytes is ever lowered from the (insanely high) defaults. I know the second conditional is false, so the first must be too, right?
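For reference, here's a rough standalone paraphrase of the guard I mean (the two tests at mem.cpp#L382). It uses std::optional in place of Mesos's Option<T> and invented names, so treat it as a sketch of the logic rather than the real code:

    #include <cstdint>
    #include <iostream>
    #include <optional>
    #include <sys/types.h>

    // Sketch of the two tests guarding the hard-limit write in
    // MemIsolatorProcess::update() -- names and types are illustrative only.
    bool wouldSetHardLimit(const std::optional<pid_t>& executorPid,
                           const std::optional<uint64_t>& currentLimit,
                           uint64_t requestedLimit)
    {
      // Write the hard limit only during initial setup (no executor pid yet)
      // or when raising it; never lower it on a running container, since
      // anonymous memory can't easily be reclaimed.
      return !executorPid.has_value() ||
             (currentLimit.has_value() && requestedLimit > *currentLimit);
    }

    int main()
    {
      // Our situation, as far as I can tell: executor already running and
      // memory.limit_in_bytes still at the kernel's huge default, so a
      // request of e.g. 512 MiB (example figure) trips neither test and
      // nothing gets written.
      std::optional<pid_t> pid = 12345;
      std::optional<uint64_t> current = UINT64_MAX;
      uint64_t requested = 512ULL * 1024 * 1024;

      std::cout << std::boolalpha
                << wouldSetHardLimit(pid, current, requested)  // prints false
                << std::endl;
      return 0;
    }

If I've paraphrased that right, only the pid test can ever let the limit go down, which is why I'm fixated on info->pid.isNone().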
It's likely I'm reading the wrong branch; we're running the 0.21.0 release, but I don't see any commits that would change this ordering. Just to confirm: we are using the default containerizer (not Docker or anything else) - that shouldn't make any difference though, should it?

I'm offsite until morning now (UK time), but I'll post the full slave logs when I can get to them.

On 28 April 2015 at 18:18, Ian Downes <[email protected]> wrote:
> The control flow in the Mesos containerizer to launch a container is:
>
> 1. Call prepare() on each isolator
> 2. Then fork the executor
> 3. Then isolate(executor_pid) on each isolator
>
> The last part of (1) will also call Isolator::update() to set the initial
> memory limits (see line 288). This is done *before* the executor is in the
> cgroup, i.e., info->pid.isNone() will be true and that block of code should
> *always* be executed when a container starts. The LOG(INFO) line at 393
> should be present in your logs. Can you verify this? It should be shortly
> after the LOG(INFO) on line 358.
>
> Ian
>
>
> On Tue, Apr 28, 2015 at 9:54 AM, Dick Davies <[email protected]> wrote:
>>
>> Thanks Ian.
>>
>> Digging around the cgroup, there are 3 processes in there:
>>
>> * the mesos-executor
>> * the shell script Marathon starts the app with
>> * the actual command to run the task (a Perl app in this case)
>>
>> The line of code you mention is never run in our case, because it's
>> wrapped in the conditional I'm talking about!
>>
>> All I see is cpu.shares being set and then memory.soft_limit_in_bytes.
>>
>>
>> On 28 April 2015 at 17:47, Ian Downes <[email protected]> wrote:
>> > The line of code you cite is there so the hard limit is not decreased
>> > on a running container, because we can't (easily) reclaim anonymous
>> > memory from running processes. See the comment above the code.
>> >
>> > The info->pid.isNone() is for when the cgroup is being configured (see
>> > the update() call at the end of MemIsolatorProcess::prepare()), i.e.,
>> > before any processes are added to the cgroup.
>> >
>> > The limit > currentLimit.get() ensures the limit is only increased.
>> >
>> > The memory limit defaults to the maximum for the data type; I guess
>> > that's the ridiculous 8 EB. It should be set to what the initial memory
>> > allocation was for the container, so this is not expected. Can you look
>> > in the slave logs, from when the container was created, for the log
>> > line at:
>> >
>> > https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393
>> >
>> > Ian
>> >
>> > On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies <[email protected]>
>> > wrote:
>> >>
>> >> Been banging my head against this for a while now.
>> >>
>> >> Mesos 0.21.0, Marathon 0.7.5, CentOS 6 servers.
>> >>
>> >> When I enable cgroups (flags are: --cgroups_limit_swap
>> >> --isolation=cgroups/cpu,cgroups/mem), the memory limits I'm setting
>> >> are reflected in memory.soft_limit_in_bytes but not in
>> >> memory.limit_in_bytes or memory.memsw.limit_in_bytes.
>> >>
>> >> The upshot is our runaway task eats all RAM and swap on the server
>> >> until the OOM killer steps in and starts firing into the crowd.
>> >>
>> >> This line of code seems to never lower a hard limit:
>> >>
>> >> https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L382
>> >>
>> >> which means both of those tests must be true, right?
>> >>
>> >> The current limit is insanely high (8192 PB if I'm reading it right) -
>> >> how would I make info->pid.isNone() be true?
>> >>
>> >> Have tried restarting the slave, scaling the Marathon apps to 0 tasks
>> >> then back. Bit stumped.
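To check my understanding of the ordering Ian describes above (prepare(), then fork, then isolate()), here's a heavily simplified, made-up sketch - not Mesos code, everything in it is invented for illustration - of why the initial update() during prepare() should hit the info->pid.isNone() branch and set the hard limit before the executor ever joins the cgroup:

    #include <cstdint>
    #include <iostream>
    #include <optional>

    // Toy stand-in for the memory isolator; it only mimics the ordering,
    // not the real implementation.
    struct ToyMemIsolator
    {
      std::optional<int> pid;           // executor pid; empty until isolate()
      uint64_t hardLimit = UINT64_MAX;  // stands in for memory.limit_in_bytes

      void update(uint64_t requested)
      {
        // Roughly the same guard as the earlier sketch: set the hard limit
        // during initial setup or when raising it; never lower it afterwards.
        if (!pid.has_value() || requested > hardLimit) {
          hardLimit = requested;
          std::cout << "hard limit set to " << requested << "\n";
        }
        std::cout << "soft limit set to " << requested << "\n";
      }

      void prepare(uint64_t requested)
      {
        // Step 1: prepare() runs before the executor is forked, so pid is
        // still empty and this initial update() takes the hard-limit branch.
        update(requested);
      }

      void isolate(int executorPid)
      {
        // Step 3: the forked executor is placed in the cgroup.
        pid = executorPid;
      }
    };

    int main()
    {
      ToyMemIsolator isolator;
      isolator.prepare(512ULL * 1024 * 1024);  // 1. prepare() (calls update())
      int executorPid = 12345;                 // 2. fork the executor (faked)
      isolator.isolate(executorPid);           // 3. isolate(executor_pid)

      // A later update() that tries to lower the limit on the now-running
      // container leaves the hard limit alone and only touches the soft
      // limit, which matches what we're seeing on our slaves.
      isolator.update(256ULL * 1024 * 1024);
      return 0;
    }

If the real ordering is as above, the hard limit should have been written during prepare(), which is why I'll be hunting for the line-393 LOG(INFO) in the slave logs tomorrow.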

