

-------- Original message --------
From: Whitney Sorenson <[email protected]> 
Date:11/12/2014 5:51 PM (GMT-05:00) 
To: [email protected] 
Cc: Ian Downes <[email protected]> 
Subject: Re: OOM not always detected by Mesos Slave 

I missed the call-to-action here regarding adding logs. I have some logs from 
a recent occurrence (this seems to happen quite frequently).

However, in this case, I can't find a corresponding message anywhere on the 
system that refers to a kernel OOM (is there a place to check besides 
/var/log/messages or /var/log/dmesg?)

One problem we have with sizing for JVM-based tasks is appropriately estimating 
max thread counts.
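
For example (rough numbers): on 64-bit Linux the default thread stack is 
typically 1 MB (-Xss1m), so a task that spikes to ~500 threads can need ~500 MB 
outside the heap; if -Xmx is sized close to the task's Mesos limit, that alone 
can push the cgroup over.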

https://gist.github.com/wsorenson/d2e49b96e84af86c9492


On Fri, Sep 12, 2014 at 9:12 PM, Benjamin Mahler <[email protected]> 
wrote:
+Ian

Sorry for the delay. When your cgroup OOMs, a few things will occur:

(1) The kernel will notify mesos-slave about the OOM event.
(2) The kernel's OOM killer will pick a process in your cgroup to kill.
(3) Once notified, mesos-slave will begin destroying the cgroup.
(4) Once the executor terminates, any tasks that were non-terminal on the 
executor will have status updates sent with the OOM message.

This does not all happen atomically, so it is possible that the kernel kills 
your task process and your executor sends a status update before the slave 
completes the destruction of the cgroup.
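
For reference, here's a minimal sketch (not the slave's actual code) of the 
cgroup v1 notification mechanism behind step (1): register an eventfd against 
the cgroup's memory.oom_control via cgroup.event_control and block until the 
kernel signals an OOM in that cgroup. The cgroup path is a placeholder.

/* Minimal sketch: block until the kernel reports an OOM in one memory cgroup
 * (cgroup v1 memory controller). Error handling is kept to a minimum. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder path; a real hierarchy uses the container's ID. */
    const char *cg = "/sys/fs/cgroup/memory/mesos/<container-id>";
    char path[4096], ctl[64];

    snprintf(path, sizeof(path), "%s/memory.oom_control", cg);
    int ofd = open(path, O_RDONLY);
    if (ofd < 0) { perror("open memory.oom_control"); return 1; }

    int efd = eventfd(0, 0);
    if (efd < 0) { perror("eventfd"); return 1; }

    /* Ask the kernel to signal efd whenever this cgroup hits its limit. */
    snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
    int cfd = open(path, O_WRONLY);
    if (cfd < 0) { perror("open cgroup.event_control"); return 1; }
    snprintf(ctl, sizeof(ctl), "%d %d", efd, ofd);
    if (write(cfd, ctl, strlen(ctl)) < 0) { perror("write event_control"); return 1; }

    /* read() blocks until an OOM event occurs in the cgroup. */
    uint64_t count;
    if (read(efd, &count, sizeof(count)) == sizeof(count))
        printf("cgroup OOM events: %llu\n", (unsigned long long)count);
    return 0;
}

The slave reacts to that event by starting (3), while the kernel's OOM killer 
in (2) proceeds independently, which is why the ordering you observe can vary.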

Userspace OOM handling is supported, and we tried using it in the past, but it 
is not reliable:

https://issues.apache.org/jira/browse/MESOS-662
http://lwn.net/Articles/317814/
http://lwn.net/Articles/552789/
http://lwn.net/Articles/590960/
http://lwn.net/Articles/591990/

Since you have the luxury of avoiding the OOM killer (JVM flags w/ padding), I 
would recommend leveraging that for now.
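
For example (illustrative numbers only): for a task registered with 1024 MB in 
Mesos, something like

  java -Xmx768m -Xss512k -XX:MaxDirectMemorySize=64m ...

leaves roughly 256 MB of padding for permgen/metaspace, thread stacks, direct 
buffers, and the executor itself; the right split depends on your workload.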

Do you have the logs for your issue? My guess is that it took time for us to 
destroy the cgroup (possibly due to freezer issues) and so there was plenty of 
time for your executor to send the status update to the slave.

On Sat, Sep 6, 2014 at 6:56 AM, Whitney Sorenson <[email protected]> wrote:
We already pad the JVM and make room for our executor, and we try to get users 
to give the correct allowances. 

However, to be fair, your answer to my question about how Mesos is handling 
OOMs is to suggest we avoid them. I think we're always going to experience some 
cgroup OOMs, and we'd be better off if we had a consistent way of handling 
them.


On Fri, Sep 5, 2014 at 3:20 PM, Tomas Barton <[email protected]> wrote:
There is some overhead for the JVM itself, which should be added to the total 
memory usage of the task. So you can't give the task the same amount of memory 
as you pass to Java's -Xmx parameter.


On 2 September 2014 20:43, Benjamin Mahler <[email protected]> wrote:
Looks like you're using the JVM; can you set all of your JVM flags to limit 
memory consumption? This would favor an OutOfMemoryError instead of OOMing the 
cgroup.


On Thu, Aug 28, 2014 at 5:51 AM, Whitney Sorenson <[email protected]> wrote:
Recently, I've seen at least one case where a process inside a task inside a 
cgroup exceeded its memory limit and the process was killed directly. The 
executor recognized that the process had been killed and sent a TASK_FAILED. 
However, it seems far more common for the executor process itself to be 
destroyed and for the mesos slave (I'm making some assumptions here about how 
it all works) to send a TASK_FAILED that includes information about the memory 
usage.

Is there something we can do to make this behavior more consistent?

Alternatively, can we provide some functionality to hook into so we don't need 
to duplicate the work of the mesos slave in order to provide the same 
information in the TASK_FAILED message? I think users would like to know 
definitively that the task OOM'd, whereas when the underlying process is killed 
directly it may take a lot of digging to find the cause if you aren't looking 
for it.
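
(For instance, confirming it today means something like

  dmesg | grep -iE 'oom-killer|killed as a result of limit'

or scanning /var/log/messages for the same strings, as in the lines below.)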

-Whitney

Here are relevant lines from messages in case something else is amiss:

Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067321] Task in /mesos/2dda5398-6aa6-49bb-8904-37548eae837e killed as a result of limit of /mesos/2dda5398-6aa6-49bb-8904-37548eae837e
Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067334] memory: usage 917420kB, limit 917504kB, failcnt 106672
Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.066947] java7 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0







