Hi everyone,
I am having issues with the cgroups isolation of Mesos. It seems like
tasks are prevented from allocating more memory than their limit.
However, they are never killed.
* My scheduled task allocates memory in a tight loop. According to
'ps', once it exceeds its memory limit it is not killed, but ends
up in state D ("uninterruptible sleep (usually IO)").
* The task is still considered running by Mesos.
* There is no indication of an OOM in dmesg.
* There is neither an OOM notice nor any other output related to the
task in the slave log.
* According to htop, the system load is increased, with a significant
portion of CPU time spent within the kernel. Often the load is so
high that all ZooKeeper connections time out.
I am running Aurora and Mesos 0.20.1 using the cgroups isolation on
Debian 7 (kernel 3.2.60-1+deb7u3).
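For anyone trying to reproduce this, the following is a minimal sketch of how the memory counters of the task's cgroup could be inspected while the task hangs. It assumes the cgroup v1 memory controller and its standard files (memory.usage_in_bytes, memory.limit_in_bytes, memory.failcnt, memory.oom_control). The function names are hypothetical, and the cgroup directory is passed in as an argument because its location depends on how the slave is configured.

```python
import os
import sys


def parse_oom_control(text):
    """Parse the 'key value' lines of memory.oom_control into a dict of ints."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            result[parts[0]] = int(parts[1])
    return result


def read_memory_state(cgroup_dir):
    """Read usage, limit, failure count and OOM flags from a cgroup v1 directory.

    cgroup_dir is an assumption: it must point at the memory cgroup of the
    container in question.
    """
    def read(name):
        with open(os.path.join(cgroup_dir, name)) as f:
            return f.read().strip()

    state = {
        "usage": int(read("memory.usage_in_bytes")),
        "limit": int(read("memory.limit_in_bytes")),
        "failcnt": int(read("memory.failcnt")),
    }
    # oom_kill_disable = 1 means the kernel OOM killer is off for this cgroup;
    # under_oom = 1 means the cgroup is currently stuck at its limit.
    state.update(parse_oom_control(read("memory.oom_control")))
    return state


if __name__ == "__main__" and len(sys.argv) > 1:
    print(read_memory_state(sys.argv[1]))
```

If usage sits at the limit, failcnt keeps growing, and under_oom is 1 while no process is killed, that would be consistent with the symptoms described above, though it would not by itself explain the cause.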
Sorry for the somewhat unspecific error description. Still, does anyone
have an idea what might be wrong here?
Thanks and Best Regards,
Stephan