I found no such file in this case.

On Wed, Nov 12, 2014 at 8:53 PM, Benjamin Mahler <[email protected]>
wrote:

> I find the OOM logging from the kernel in /var/log/kern.log.
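>
> (For example, something like "grep -i oom /var/log/kern.log" or
> "dmesg | grep -i oom" usually surfaces the kernel's OOM-killer messages;
> depending on the distro's syslog configuration they may land in
> /var/log/syslog or /var/log/messages instead.)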
>
> On Wed, Nov 12, 2014 at 2:51 PM, Whitney Sorenson <[email protected]>
> wrote:
>
>> I missed the call-to-action here regarding adding logs. I have some logs
>> from a recent occurrence (this seems to happen quite frequently).
>>
>> However, in this case, I can't find a corresponding message anywhere on
>> the system that refers to a kernel OOM (is there a place to check besides
>> /var/log/messages or /var/log/dmesg?)
>>
>> One problem we have with sizing for JVM-based tasks is appropriately
>> estimating max thread counts.
>>
>> https://gist.github.com/wsorenson/d2e49b96e84af86c9492
>>
>>
>> On Fri, Sep 12, 2014 at 9:12 PM, Benjamin Mahler <
>> [email protected]> wrote:
>>
>>> +Ian
>>>
>>> Sorry for the delay. When your cgroup OOMs, a few things will occur:
>>>
>>> (1) The kernel will notify mesos-slave about the OOM event.
>>> (2) The kernel's OOM killer will pick a process in your cgroup to kill.
>>> (3) Once notified, mesos-slave will begin destroying the cgroup.
>>> (4) Once the executor terminates, any tasks that were non-terminal on
>>> the executor will have status updates sent with the OOM message.
>>>
>>> This does not all happen atomically, so it is possible that the kernel
>>> kills your task process and your executor sends a status update before the
>>> slave completes the destruction of the cgroup.
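>>>
>>> For illustration (a minimal sketch, not Mesos's actual code): in cgroups v1,
>>> the notification in step (1) comes via the memory.oom_control eventfd API.
>>> Assuming Linux, Python 3.10+ for os.eventfd(), and an illustrative cgroup
>>> mount point, a supervisor can subscribe to OOM events roughly like this:
>>>
>>>     import os
>>>
>>>     # Illustrative path to the container's memory cgroup (mount point assumed).
>>>     CGROUP = "/sys/fs/cgroup/memory/mesos/2dda5398-6aa6-49bb-8904-37548eae837e"
>>>
>>>     efd = os.eventfd(0)  # fd the kernel will signal when the cgroup OOMs
>>>     oom_fd = os.open(os.path.join(CGROUP, "memory.oom_control"), os.O_RDONLY)
>>>
>>>     # Register the pair "<eventfd> <oom_control fd>" with the cgroup.
>>>     with open(os.path.join(CGROUP, "cgroup.event_control"), "w") as f:
>>>         f.write(f"{efd} {oom_fd}")
>>>
>>>     os.read(efd, 8)  # blocks until an OOM event fires in this cgroup
>>>     print("OOM in cgroup; the slave would now start destroying it")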
>>>
>>> Userspace OOM handling is supported, and we tried using it in the past,
>>> but it is not reliable:
>>>
>>> https://issues.apache.org/jira/browse/MESOS-662
>>> http://lwn.net/Articles/317814/
>>> http://lwn.net/Articles/552789/
>>> http://lwn.net/Articles/590960/
>>> http://lwn.net/Articles/591990/
>>>
>>> Since you have the luxury of avoiding the OOM killer (JVM flags w/
>>> padding), I would recommend leveraging that for now.
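>>>
>>> For example (illustrative numbers only): a task given 1024 MB in Mesos might
>>> be launched with something like -Xmx768m, an explicit -Xss, and a cap on
>>> permgen/metaspace, leaving a couple hundred megabytes of headroom for thread
>>> stacks, GC structures, and other native allocations, so a runaway heap throws
>>> OutOfMemoryError well before the cgroup hits its limit.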
>>>
>>> Do you have the logs for your issue? My guess is that it took time for
>>> us to destroy the cgroup (possibly due to freezer issues) and so there was
>>> plenty of time for your executor to send the status update to the slave.
>>>
>>> On Sat, Sep 6, 2014 at 6:56 AM, Whitney Sorenson <[email protected]>
>>> wrote:
>>>
>>>> We already pad the JVM and make room for our executor, and we try to
>>>> get users to give the correct allowances.
>>>>
>>>> However, to be fair, your answer to my question about how Mesos is
>>>> handling OOMs is to suggest we avoid them. I think we're always going to
>>>> experience some cgroup OOMs, and we'd be better off if we had a
>>>> consistent way of handling them.
>>>>
>>>>
>>>> On Fri, Sep 5, 2014 at 3:20 PM, Tomas Barton <[email protected]>
>>>> wrote:
>>>>
>>>>> There is some overhead for the JVM itself, which should be added to
>>>>> the task's total memory usage. So you can't give the task the same
>>>>> amount of memory as you pass to java's -Xmx parameter.
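>>>>>
>>>>> As a rough rule of thumb, the cgroup limit needs to cover about
>>>>> -Xmx + permgen/metaspace + (thread count * -Xss stack size) + GC/JIT and
>>>>> other native overhead, plus whatever the executor process itself uses.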
>>>>>
>>>>>
>>>>> On 2 September 2014 20:43, Benjamin Mahler <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> It looks like you're using the JVM; can you set all of your JVM flags to
>>>>>> limit the memory consumption? This would favor an OutOfMemoryError
>>>>>> instead of OOMing the cgroup.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 28, 2014 at 5:51 AM, Whitney Sorenson <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Recently, I've seen at least one case where a process inside a task
>>>>>>> inside a cgroup exceeded memory limits and the process was killed
>>>>>>> directly. The executor recognized the process was killed and sent a
>>>>>>> TASK_FAILED. However, it seems far more common to see the executor
>>>>>>> process itself destroyed and the mesos slave (I'm making some assumptions
>>>>>>> here about how it all works) send a TASK_FAILED which includes information
>>>>>>> about the memory usage.
>>>>>>>
>>>>>>> Is there something we can do to make this behavior more consistent?
>>>>>>>
>>>>>>> Alternatively, can we provide some functionality to hook into so we
>>>>>>> don't need to duplicate the work of the mesos slave in order to provide
>>>>>>> the same information in the TASK_FAILED message? I think users would like to
>>>>>>> know definitively that the task OOM'd, whereas in the case where the
>>>>>>> underlying task is killed it may take a lot of digging to find the
>>>>>>> underlying cause if you aren't looking for it.
>>>>>>>
>>>>>>> -Whitney
>>>>>>>
>>>>>>> Here are the relevant lines from /var/log/messages in case something
>>>>>>> else is amiss:
>>>>>>>
>>>>>>> Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067321] Task in
>>>>>>> /mesos/2dda5398-6aa6-49bb-8904-37548eae837e killed as a result of limit 
>>>>>>> of
>>>>>>> /mesos/2dda5398-6aa6-49bb-8904-37548eae837e
>>>>>>> Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067334] memory:
>>>>>>> usage 917420kB, limit 917504kB, failcnt 106672
>>>>>>> Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.066947] java7
>>>>>>> invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
