Following up on this issue.
After setting up a test cluster running many parallel VMs and cgroup OOMs,
we were able to isolate at least one current issue:
https://bugzilla.kernel.org/show_bug.cgi?id=80881
We've also noticed a separate deadlock issue occurring in 3.10 which does
not appear to be p
Thanks for clearing that up about those patches.
I can confirm:
cat /cgroup/memory/memory.oom_control
oom_kill_disable 0
under_oom 0
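
The same check can be scripted when many hosts need to be verified. The following is only a minimal sketch: the helper names `parse_oom_control` and `kernel_handles_oom` are hypothetical, and the default path is the one from the `cat` command above (it may be /sys/fs/cgroup/memory/... on other systems).

```python
from pathlib import Path


def parse_oom_control(text):
    """Parse 'key value' lines from memory.oom_control into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def kernel_handles_oom(path="/cgroup/memory/memory.oom_control"):
    """Return True if the kernel OOM killer is enabled for this cgroup.

    The default path is an assumption taken from the command above.
    """
    fields = parse_oom_control(Path(path).read_text())
    return fields.get("oom_kill_disable") == 0


if __name__ == "__main__":
    # Parse the output quoted above instead of touching the real filesystem.
    sample = "oom_kill_disable 0\nunder_oom 0\n"
    print(parse_oom_control(sample))
```

Calling `kernel_handles_oom()` on a slave reads the real file; a return value of True corresponds to "oom_kill_disable 0".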
We can try to reproduce outside of Mesos and see if we have similar issues.
Thankfully, we are not using EBS.
-Whitney
On Tue, Jul 1, 2014 at 1:36 PM, Ian Downes wr
Hi Whitney,
As Vinod said, 0.18.0 will ensure the kernel is set to handle OOM
conditions. The patches you linked are refactors that should not have
changed the behavior since 0.18.0. Could you please double check that
/sys/fs/cgroup/memory/memory.oom_control has "oom_kill_disable 0"?
Can you attempt
Hey Whitney,
I'll let Ian Downes comment on the specific patches you linked, but at a
high level the bug in MESOS-662 was due to Mesos trying to handle OOM
situations in user space instead of letting the kernel handle them. We have
since changed the behavior to let the kernel handle the OOM. You can co
We've been running a few clusters on Amazon EC2 with mesos 0.18.0 on the
new generation C3 machines (generally c3.8xl) and have been experiencing
frequent system reboots.
Due to this issue (
http://mail-archives.apache.org/mod_mbox/mesos-user/201406.mbox/%3CCAJRB3TEj%2Bx4VRYicJM7aj7avcjr6QeXR8BmSU