Thanks Ian.

Digging around the cgroup, there are 3 processes in there:

* the mesos-executor
* the shell script marathon starts the app with
* the actual command to run the task (a perl app in this case)
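
For reference, I enumerated them by reading cgroup.procs; a quick sketch, with
the cgroup path a placeholder for our real one:

    // Print the PIDs attached to the container's memory cgroup.
    // The cgroup path is a placeholder; substitute the real one.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream procs(
            "/sys/fs/cgroup/memory/mesos/<container-id>/cgroup.procs");
        std::string pid;
        while (std::getline(procs, pid)) {
            std::cout << pid << "\n";  // executor, shell wrapper, perl app
        }
        return 0;
    }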

The line of code you mention is never run in our case, because it's
wrapped in the conditional I'm talking about!

All I see is cpu.shares being set and then memory.soft_limit_in_bytes.
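
For what it's worth, this is roughly how I'm checking those values; a minimal
sketch, assuming the memory hierarchy is mounted at /sys/fs/cgroup/memory and
with the container cgroup path as a placeholder:

    // Dump the soft, hard and mem+swap limits for a container cgroup.
    // The path below is a placeholder; substitute the real container path.
    #include <fstream>
    #include <iostream>
    #include <string>

    static void show(const std::string& file) {
        std::ifstream in(file);
        std::string value;
        if (in && std::getline(in, value)) {
            std::cout << file << " = " << value << "\n";
        } else {
            std::cout << file << " (unreadable)\n";
        }
    }

    int main() {
        const std::string dir = "/sys/fs/cgroup/memory/mesos/<container-id>";
        show(dir + "/memory.soft_limit_in_bytes");   // set as expected
        show(dir + "/memory.limit_in_bytes");        // stuck at the max
        show(dir + "/memory.memsw.limit_in_bytes");  // stuck at the max
        return 0;
    }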


On 28 April 2015 at 17:47, Ian Downes <[email protected]> wrote:
> The line of code you cite is there so the hard limit is not decreased on a
> running container, because we can't (easily) reclaim anonymous memory from
> running processes. See the comment above the code.
>
> The info->pid.isNone() check is for when the cgroup is being configured (see
> the update() call at the end of MemIsolatorProcess::prepare()), i.e., before
> any processes are added to the cgroup.
>
> The limit > currentLimit.get() check ensures the limit is only ever increased.
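>
> For illustration, the shape of that code is roughly this (a paraphrase of
> the update() logic in mem.cpp, not the verbatim source):
>
>     // Only write the hard limit when the cgroup is still being set up
>     // (no process attached yet), or when the limit is being increased;
>     // lowering it on a running container could fail because anonymous
>     // memory can't easily be reclaimed.
>     if (info->pid.isNone() || limit > currentLimit.get()) {
>         // ... write memory.limit_in_bytes (and memory.memsw.limit_in_bytes
>         // when --cgroups_limit_swap is set) ...
>     }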
>
> The memory limit defaults to the maximum for the data type; I guess that's
> the ridiculous 8 EB. It should be set to what the initial memory allocation
> was for the container, so this is not expected. Can you look in the slave
> logs, around when the container was created, for the log line at:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393
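>
> As a sanity check on that 8 EB figure, here's the arithmetic (my own worked
> example, not from the Mesos source):
>
>     // 2^63 bytes = 8 * 2^60 bytes = 8 EiB = 8192 PiB, which matches
>     // both readings of the uninitialized limit.
>     #include <cstdint>
>     #include <iostream>
>     #include <limits>
>
>     int main() {
>         const double max = std::numeric_limits<int64_t>::max();
>         std::cout << max / (1ULL << 50) << " PiB\n";  // ~8192 PiB
>         std::cout << max / (1ULL << 60) << " EiB\n";  // ~8 EiB
>         return 0;
>     }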
>
> Ian
>
> On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies <[email protected]> wrote:
>>
>> Been banging my head against this for a while now.
>>
>> mesos 0.21.0 , marathon 0.7.5, centos 6 servers.
>>
>> When I enable cgroups (flags are: --cgroups_limit_swap
>> --isolation=cgroups/cpu,groups/mem) the memory limits I'm setting
>> are reflected in memory.soft_limit_in_bytes but not in
>> memory.limit_in_bytes or memory.memsw.limit_in_bytes.
>>
>>
>> Upshot is our runaway task eats all RAM and swap on the server
>> until the OOM killer steps in and starts firing into the crowd.
>>
>> This line of code seems to never lower a hard limit:
>>
>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L382
>>
>> which means at least one of those tests must be true before the hard
>> limit is written, right?
>>
>> The current limit is insanely high (8192 PB if I'm reading it right) -
>> how would I make info->pid.isNone() be true?
>>
>> Have tried restarting the slave, and scaling the marathon apps to 0
>> tasks then back up. Bit stumped.
>
>
