Re: OOM not always detected by Mesos Slave

2014-11-13 Thread Whitney Sorenson
I found no such file in this case.


On Wed, Nov 12, 2014 at 8:53 PM, Benjamin Mahler benjamin.mah...@gmail.com
wrote:

 I find the OOM logging from the kernel in /var/log/kern.log.


Re: OOM not always detected by Mesos Slave

2014-11-13 Thread Ian Downes
In reply to your original issue:

It is possible to influence the kernel OOM killer in its decision on
which process to kill to free memory. An OOM score is computed for
each process that depends on age (it tends to kill the shortest-lived
processes) and usage (it tends to kill the largest memory users), i.e.,
it generally favors killing something other than the executor. This
score could be adjusted to more strongly prefer not killing the
executor by setting an OOM adjustment. See
https://issues.apache.org/jira/browse/MESOS-416 which discusses this
setting for the master and slave.
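
As a concrete illustration (my own sketch, not Mesos code), the
adjustment is a single write to /proc/<pid>/oom_score_adj; values range
from -1000 (never kill) to 1000 (kill first), and lowering the value
typically requires CAP_SYS_RESOURCE:

  // Hypothetical sketch: bias the kernel OOM killer away from a process.
  #include <sys/types.h>
  #include <unistd.h>
  #include <fstream>
  #include <string>

  bool setOomScoreAdj(pid_t pid, int adj) {
    std::ofstream f("/proc/" + std::to_string(pid) + "/oom_score_adj");
    if (!f.is_open()) {
      return false;  // no such pid, or insufficient privileges
    }
    f << adj;        // e.g. -500 to strongly prefer killing something else
    return f.good();
  }

  int main() {
    // An executor could protect itself so a task process is chosen instead.
    return setOomScoreAdj(getpid(), -500) ? 0 : 1;
  }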

We could then check for an OOM, even if the executor exits 0, and
report accordingly. Does that address your original question?

Ian


Re: OOM not always detected by Mesos Slave

2014-11-13 Thread Ian Downes
Created:

[MESOS-2105] Reliably report OOM even if the executor exits normally

https://issues.apache.org/jira/browse/MESOS-2105

On Thu, Nov 13, 2014 at 12:07 PM, Whitney Sorenson wsoren...@hubspot.com
wrote:

 Yeah, I think so. Ultimately, what my users and I are looking for is
 consistency in the reporting of TASK_FAILED when an OOM is involved. If any
 OOM happens, I'd rather the entire process tree always be taken out and that
 it be reliably reported as such.




Re: OOM not always detected by Mesos Slave

2014-11-12 Thread Whitney Sorenson
I missed the call to action here regarding adding logs. I have some logs
from a recent occurrence (this seems to happen quite frequently).

However, in this case, I can't find a corresponding message anywhere on the
system that refers to a kernel OOM. (Is there a place to check besides
/var/log/messages or /var/log/dmesg?)

One problem we have with sizing for JVM-based tasks is appropriately
estimating max thread counts.

https://gist.github.com/wsorenson/d2e49b96e84af86c9492
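
On the thread-count point above, the arithmetic that bites us is roughly
the following (all numbers are made up for illustration): each thread
reserves a native stack of -Xss bytes outside the heap, so the peak
thread count bounds memory that the cgroup sees but -Xmx does not:

  // Illustration only: hypothetical sizes, not measurements.
  constexpr long kXssBytes   = 1L << 20;  // -Xss1m (a common default)
  constexpr long kMaxThreads = 500;       // the hard-to-estimate part
  constexpr long kStackBytes =
      kMaxThreads * kXssBytes;            // ~500 MB on top of the heap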



Re: OOM not always detected by Mesos Slave

2014-11-12 Thread Benjamin Mahler
I find the OOM logging from the kernel in /var/log/kern.log.


Re: OOM not always detected by Mesos Slave

2014-09-12 Thread Benjamin Mahler
+Ian

Sorry for the delay. When your cgroup OOMs, a few things will occur:

(1) The kernel will notify mesos-slave about the OOM event.
(2) The kernel's OOM killer will pick a process in your cgroup to kill.
(3) Once notified, mesos-slave will begin destroying the cgroup.
(4) Once the executor terminates, any tasks that were non-terminal on the
executor will have status updates sent with the OOM message.

This does not all happen atomically, so it is possible that the kernel
kills your task process and your executor sends a status update before the
slave completes the destruction of the cgroup.
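
For reference, step (1) uses the cgroup memory controller's OOM
notification. A minimal standalone sketch of that mechanism (my own
illustration, not the mesos-slave source; the cgroup path is
hypothetical and error handling is omitted):

  // Wait for an OOM event in a cgroup via eventfd + memory.oom_control.
  #include <sys/eventfd.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  int main() {
    const char* cgroup = "/sys/fs/cgroup/memory/mesos/<container-id>";
    char path[256], line[64];

    snprintf(path, sizeof(path), "%s/memory.oom_control", cgroup);
    int oomfd = open(path, O_RDONLY);

    int efd = eventfd(0, 0);  // the kernel signals OOMs through this fd

    // Register interest by writing "<eventfd> <oom_control fd>" into
    // cgroup.event_control.
    snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup);
    int ctlfd = open(path, O_WRONLY);
    snprintf(line, sizeof(line), "%d %d", efd, oomfd);
    write(ctlfd, line, strlen(line));

    // Blocks until the kernel reports an OOM in the cgroup; this is the
    // point at which the slave starts destroying the cgroup (step 3).
    uint64_t count;
    read(efd, &count, sizeof(count));
    printf("OOM event (count=%llu)\n", (unsigned long long) count);
    return 0;
  }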

Userspace OOM handling is supported, and we tried using it in the past, but
it is not reliable:

https://issues.apache.org/jira/browse/MESOS-662
http://lwn.net/Articles/317814/
http://lwn.net/Articles/552789/
http://lwn.net/Articles/590960/
http://lwn.net/Articles/591990/

Since you have the luxury of avoiding the OOM killer (JVM flags w/
padding), I would recommend leveraging that for now.

Do you have the logs for your issue? My guess is that it took time for us
to destroy the cgroup (possibly due to freezer issues) and so there was
plenty of time for your executor to send the status update to the slave.

On Sat, Sep 6, 2014 at 6:56 AM, Whitney Sorenson wsoren...@hubspot.com
wrote:

 We already pad the JVM and make room for our executor, and we try to get
 users to give the correct allowances.

 However, to be fair, your answer to my question about how Mesos is
 handling OOMs is to suggest we avoid them. I think we're always going to
 experience some cgroup OOMs, and we'd be better off if we had a
 consistent way of handling them.



Re: OOM not always detected by Mesos Slave

2014-09-05 Thread Tomas Barton
There is some overhead for the JVM itself, which should be added to the
total memory usage of the task. So you can't give the task the same
amount of memory as you pass to java's -Xmx parameter.
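
To put rough numbers on it (every size below is an illustrative
assumption, not a measurement), the task's limit has to cover every
consumer, not just the heap:

  // Hypothetical sizing for a task running a JVM.
  constexpr long MB           = 1L << 20;
  constexpr long heap         = 768 * MB;  // what you pass as -Xmx768m
  constexpr long permGen      =  64 * MB;  // -XX:MaxPermSize (Java 7 era)
  constexpr long threadStacks = 200 * MB;  // max threads x -Xss
  constexpr long jvmAndNative = 128 * MB;  // GC, code cache, direct buffers
  constexpr long taskLimit =
      heap + permGen + threadStacks + jvmAndNative;
  // => ask Mesos for ~1160 MB to run a 768 MB heap, plus executor overhead.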



Re: OOM not always detected by Mesos Slave

2014-09-02 Thread Benjamin Mahler
Looks like you're using the JVM; can you set all of your JVM flags to
limit the memory consumption? This would favor an OutOfMemoryError
instead of OOMing the cgroup.
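
A sketch of what that could look like when an executor builds the java
command line (the flag choices and padding are assumptions on my part,
not a Mesos API):

  #include <iostream>
  #include <sstream>
  #include <string>

  // Derive JVM flags from the task's memory limit so the JVM throws
  // OutOfMemoryError before the cgroup limit is reached.
  std::string javaCommand(long taskLimitMb) {
    const long paddingMb = 256;            // executor + JVM native overhead
    const long heapMb = taskLimitMb - paddingMb;
    std::ostringstream cmd;
    cmd << "java"
        << " -Xmx" << heapMb << "m"        // bound the heap
        << " -Xss512k"                     // bound per-thread stacks
        << " -XX:MaxDirectMemorySize=64m"  // bound off-heap buffers
        << " -jar task.jar";               // hypothetical task jar
    return cmd.str();
  }

  int main() {
    std::cout << javaCommand(1024) << std::endl;  // java -Xmx768m ...
  }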



OOM not always detected by Mesos Slave

2014-08-28 Thread Whitney Sorenson
Recently, I've seen at least one case where a process inside a task
inside a cgroup exceeded memory limits and the process was killed
directly. The executor recognized the process was killed and sent a
TASK_FAILED. However, it seems far more common for the executor process
itself to be destroyed, after which the mesos slave (I'm making some
assumptions here about how it all works) sends a TASK_FAILED which
includes information about the memory usage.

Is there something we can do to make this behavior more consistent?

Alternatively, can we provide some functionality to hook into so we don't
need to duplicate the work of the mesos slave in order to provide the same
information in the TASK_FAILED message? I think users would like to know
definitively that the task OOM'd, whereas in the case where the underlying
task is killed it may take a lot of digging to find the underlying cause if
you aren't looking for it.
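
(For what it's worth, a crude approximation of such a hook, sketched
under the assumption that the cgroup still exists when checked, would be
to read the cgroup's counters after the task exits; memory.failcnt being
nonzero is a hint, though not proof, that the limit was hit:)

  // Hypothetical post-mortem OOM check; the cgroup path is made up.
  #include <fstream>
  #include <iostream>
  #include <string>

  long readCounter(const std::string& path) {
    std::ifstream f(path);
    long value = -1;
    f >> value;  // stays -1 if the file is missing or unreadable
    return value;
  }

  int main() {
    std::string cgroup = "/sys/fs/cgroup/memory/mesos/<container-id>";
    long failcnt = readCounter(cgroup + "/memory.failcnt");
    if (failcnt > 0) {
      std::cout << "memory limit was hit " << failcnt
                << " times; task likely OOM'd" << std::endl;
    }
    return 0;
  }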

-Whitney

Here are the relevant lines from /var/log/messages in case something else
is amiss:

Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067321] Task in
/mesos/2dda5398-6aa6-49bb-8904-37548eae837e killed as a result of limit of
/mesos/2dda5398-6aa6-49bb-8904-37548eae837e
Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067334] memory: usage
917420kB, limit 917504kB, failcnt 106672
Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.066947] java7 invoked
oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0