Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Kamil Chmielewski
SIGKILL can't be caught.

2016-02-12 11:29 GMT+01:00 haosdent :

> >Is there a specific reason why the slave does not first send a TERM
> signal, and if that does not help after a certain timeout, send a KILL
> signal?
> >That would give us a chance to cleanup consul registrations (and other
> cleanup).
> I think that flow might be more complex. How about registering a KILL
> signal listener to clean up the consul registration?


memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Harry Metske
Hi,

we have a Mesos (0.27) cluster running with (here relevant) slave options:
--cgroups_enable_cfs=true
--cgroups_limit_swap=true
--isolation=cgroups/cpu,cgroups/mem

What we see happening is that people run tasks (Java applications) with a
memory resource limit that is too low, which causes these tasks to be
terminated; see the logs below.
That's all fine; after all, you should specify reasonable memory limits.
It looks like the slave sends a KILL signal when the limit is reached, so the
application has no chance to do any cleanup on termination, which (in our
case) results in consul registrations not being cleaned up.
Is there a specific reason why the slave does not first send a TERM signal,
and if that does not help after a certain timeout, send a KILL signal?
That would give us a chance to clean up consul registrations (and do other
cleanup).
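
Roughly the escalation I have in mind, sketched at the JVM process level:
java.lang.Process.destroy() normally maps to SIGTERM on Unix and
destroyForcibly() to SIGKILL. This is purely an illustration of the
TERM-then-KILL pattern, not a claim about how the Mesos launcher works:

import java.util.concurrent.TimeUnit;

public class GracefulKill {
    /**
     * Ask the task to exit with TERM, give it a grace period to clean up
     * (e.g. deregister from consul), and only force a KILL if it is still
     * running after the timeout.
     */
    static void terminate(Process task, long graceSeconds) throws InterruptedException {
        task.destroy();                              // SIGTERM on Unix platforms
        if (!task.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            task.destroyForcibly();                  // SIGKILL, cannot be ignored
            task.waitFor();
        }
    }
}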

kind regards,
Harry


I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container
bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource
mem(*):160 and will be terminated

I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container
'bed2585a-c361-4c66-afd9-69e70e748ae2'

I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2

I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
104.21376ms

I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2

I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfullly thawed cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
2.123008ms

I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@10.239.204.142:43950
exited

I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for container
'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited

I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy request
for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2

I0212 09:27:49.389853 11062 slave.cpp:3816] Executor
'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework
7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Shuai Lin
I'm not familiar with why SIGKILL is sent directly without a SIGTERM first,
but would it be possible to have your consul registry cleaned up when a task
is killed by adding consul health checks?
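
For example, something along these lines. This is only a minimal sketch: it
assumes a local consul agent on :8500, a hypothetical /health endpoint in the
app, a made-up service name, and a Consul version that supports
DeregisterCriticalServiceAfter; the field names are the consul agent HTTP API
as I remember it, so double-check them:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ConsulRegistration {
    // Register the task together with an HTTP health check. If the task is
    // SIGKILLed, the check simply goes critical, so health-filtered queries
    // stop returning it; DeregisterCriticalServiceAfter (on consul versions
    // that support it) eventually removes the entry without any cleanup code
    // running inside the task.
    public static void register(String serviceId, int port) throws Exception {
        String body = "{"
                + "\"ID\":\"" + serviceId + "\","
                + "\"Name\":\"myapp\","                  // hypothetical service name
                + "\"Port\":" + port + ","
                + "\"Check\":{"
                +   "\"HTTP\":\"http://localhost:" + port + "/health\","
                +   "\"Interval\":\"10s\","
                +   "\"DeregisterCriticalServiceAfter\":\"5m\""
                + "}}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8500/v1/agent/service/register").openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        if (conn.getResponseCode() != 200) {
            throw new IllegalStateException("consul registration failed: HTTP "
                    + conn.getResponseCode());
        }
    }
}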


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread haosdent
>Is there a specific reason why the slave does not first send a TERM
signal, and if that does not help after a certain timeout, send a KILL
signal?
>That would give us a chance to cleanup consul registrations (and other
cleanup).
I think that flow might be more complex. How about registering a KILL signal
listener to clean up the consul registration?
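
For example, a minimal sketch of such a hook (it assumes a local consul agent
on :8500 and uses the agent's deregister endpoint). Note that a JVM shutdown
hook runs on SIGTERM or a normal exit, but nothing in user space runs on
SIGKILL, so this only helps if the task receives a catchable signal:

import java.net.HttpURLConnection;
import java.net.URL;

public class ConsulCleanupHook {
    // Best-effort consul deregistration on shutdown. The hook runs on SIGTERM
    // and on normal JVM exit, but never on SIGKILL: the kernel removes the
    // process without letting any user-space code run.
    public static void install(String serviceId) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                // assumes a local consul agent on :8500
                HttpURLConnection conn = (HttpURLConnection) new URL(
                        "http://localhost:8500/v1/agent/service/deregister/" + serviceId)
                        .openConnection();
                conn.setRequestMethod("PUT");
                conn.getResponseCode();   // send the request; best effort only
            } catch (Exception e) {
                System.err.println("consul deregistration failed: " + e);
            }
        }));
    }
}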



-- 
Best Regards,
Haosdent Huang


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread haosdent
>I'm not familiar with why SIGKILL is sent directly without SIGTERM
We send KILL in both posix_launcher and linux_launcher
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/launcher.cpp#L170
https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1566

>SIGKILL can't be caught.
It seems the consul registrations can't be cleaned up when the task is killed
under the MesosContainerizer. Have you tried the DockerContainerizer? I think
"docker stop" sends TERM first.



-- 
Best Regards,
Haosdent Huang


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Kamil Chmielewski
On Fri, Feb 12, 2016 at 6:12 PM, Harry Metske  wrote:

> Is there a specific reason why the slave does not first send a TERM
> signal, and if that does not help after a certain timeout, send a KILL
> signal?
> That would give us a chance to cleanup consul registrations (and other
> cleanup).


First of all, it's wrong to try to handle the memory limit inside your app.
Things like this are outside of its scope. Your app can be lost because of
many different system or hardware failures that you just can't catch. You
need to let it crash and design your architecture with that in mind.
Secondly, the Mesos SIGKILL is consistent with the Linux OOM killer, and it
does the right thing:
https://github.com/torvalds/linux/blob/4e5448a31d73d0e944b7adb9049438a09bc332cb/mm/oom_kill.c#L586

Best regards,
Kamil


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Harry Metske
We don't want to use Docker (yet) in this environment, so the
DockerContainerizer is not an option.
After thinking about it a bit longer, I tend to agree with Kamil and will let
the problem be handled differently.

Thanks for the amazing fast responses!

kind regards,
Harry




Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread David J. Palaitis
In larger deployments, with many applications, you may not always be able to
expect good memory practices from app developers. We've found that reporting
*why* a job was killed, with details of container utilization, is an
effective way of helping app developers get better at memory management.

The alternative, just having jobs die, incentivizes bad behavior. For
example, a hurried job owner may just double the executor's memory, trading
slack for stability.



Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Vinod Kone
+1 to what kamil said. That is exactly the reason why we designed it that way. 

Also, the *why* is included in the status update message.

@vinodkone



Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Harry Metske
David,

that's exactly the scenario I am afraid of: developers specifying way too
large memory requirements just to make sure their tasks don't get killed.
Any suggestions on how to report this *why* to the developers? As far as I
know, the only place where you find the reason is in the logfile of the
slave; the UI only tells you that the task failed, not why.

(We could put some logfile monitoring in place to pick up these messages, of
course, but if there are better ways, we are always interested.)

kind regards,
Harry




Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Erik Weathers
hey Harry,

As Vinod said, the mesos-slave/agent will issue a status update about the
OOM condition.  This will be received by the scheduler of the framework.
In the storm-mesos framework we just log the messages (see below), but you
might consider somehow exposing these messages directly to the app owners:

Received status update:
{"task_id":"TASK_ID","slave_id":"20150806-001422-1801655306-5050-22041-S65","state":"TASK_FAILED","message":"Memory
limit exceeded: Requested: 2200MB Maximum Used: 2200MB\n\nMEMORY
STATISTICS: \ncache 20480\nrss 1811943424\nmapped_file 0\npgpgin
8777434\npgpgout 8805691\nswap 96878592\ninactive_anon
644186112\nactive_anon 1357594624\ninactive_file 20480\nactive_file
0\nunevictable 0\nhierarchical_memory_limit
2306867200\nhierarchical_memsw_limit 9223372036854775807\ntotal_cache
20480\ntotal_rss 1811943424\ntotal_mapped_file 0\ntotal_pgpgin
8777434\ntotal_pgpgout 8805691\ntotal_swap 96878592\ntotal_inactive_anon
644186112\ntotal_active_anon 1355497472\ntotal_inactive_file
20480\ntotal_active_file 0\ntotal_unevictable 0"}
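
For illustration, a rough sketch of the scheduler side using the Java API.
Filtering on REASON_CONTAINER_LIMITATION_MEMORY is my assumption of the
reason the agent attaches to cgroup memory kills; check the TaskStatus.Reason
values shipped with your Mesos version:

import org.apache.mesos.Protos.TaskState;
import org.apache.mesos.Protos.TaskStatus;

// Called from the framework scheduler's statusUpdate() callback.
final class OomReporter {

    static boolean isMemoryLimitFailure(TaskStatus status) {
        return status.getState() == TaskState.TASK_FAILED
                && status.hasReason()
                && status.getReason() == TaskStatus.Reason.REASON_CONTAINER_LIMITATION_MEMORY;
    }

    // Surface the agent's explanation (requested vs. used memory plus the
    // cgroup statistics shown above) somewhere the task owners actually look,
    // instead of only the framework log.
    static void report(TaskStatus status) {
        if (isMemoryLimitFailure(status)) {
            System.err.printf("Task %s was killed for exceeding its memory limit:%n%s%n",
                    status.getTaskId().getValue(), status.getMessage());
        }
    }
}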

- Erik



Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Kamil Chmielewski
Marathon also presents this information. Developers will see it on the Debug
tab, in the Last Task Failure section.

Best Regards,
Kamil


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread David J. Palaitis
>> we could put some logfile monitoring in place picking up these messages
of course

that's about what we came up with.

>> the mesos-slave/agent will issue a status update about the OOM
condition.

ok, definitely missed that one - this will help a lot. thanks @vinod


On Fri, Feb 12, 2016 at 2:41 PM, Harry Metske 
wrote:

> Yup, I just noticed it's there :-)
>
> tx,
> Harry