Re: memory limit exceeded ==> KILL instead of TERM (first)

haosdent Fri, 12 Feb 2016 02:40:45 -0800

>I'm not familiar with why SIGKILL is sent directly without SIGTERM
We send KILL directly in both posix_launcher and linux_launcher:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/launcher.cpp#L170
https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1566
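
For comparison, the TERM-first flow asked about in this thread could look roughly like the Python sketch below. This is not Mesos code; the function name terminate_gracefully and the grace period are assumptions made up for illustration.

```python
import signal
import subprocess
import sys


def terminate_gracefully(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Hypothetical helper (POSIX only): returns the child's exit status as
    reported by subprocess (negative signal number if killed by a signal).
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # Give the child a chance to run its own cleanup and exit.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The child ignored or outlived SIGTERM, so force it down.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running task.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit status:", terminate_gracefully(child))
```

The trade-off is the extra timeout: a task that is already over its memory limit keeps running for up to grace_seconds before it is forced down.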


>SIGKILL can't be caught.
It seems the task cannot clean up its consul registrations when it is killed
under MesosContainerizer. Have you tried DockerContainerizer? I think
"docker stop" sends TERM first.
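
To illustrate the difference between the two signals, here is a minimal Python sketch (POSIX only). The function deregister_from_consul is a hypothetical placeholder, not a real Consul client call, and can_catch is a helper invented for this example. A handler can be installed for SIGTERM, while the kernel refuses any handler for SIGKILL.

```python
import signal

cleaned_up = []


def deregister_from_consul():
    # Hypothetical placeholder: a real service would call its Consul
    # client here to remove its registration.
    cleaned_up.append("consul")


def on_term(signum, frame):
    # Runs when SIGTERM is delivered: clean up, then exit normally.
    deregister_from_consul()
    raise SystemExit(0)


def can_catch(signum):
    """Return True if a handler can be installed for signum."""
    try:
        previous = signal.signal(signum, on_term)
    except (OSError, ValueError):
        return False
    # Put back whatever handler was installed before the probe.
    signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    print("SIGTERM catchable:", can_catch(signal.SIGTERM))
    print("SIGKILL catchable:", can_catch(signal.SIGKILL))
```

So cleanup code like on_term only helps when whatever stops the task sends TERM before KILL.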

On Fri, Feb 12, 2016 at 6:33 PM, Kamil Chmielewski <[email protected]>
wrote:

> SIGKILL can't be caught.
>
> 2016-02-12 11:29 GMT+01:00 haosdent <[email protected]>:
>
>> >Is there a specific reason why the slave does not first send a TERM
>> signal, and if that does not help after a certain timeout, send a KILL
>> signal?
>> >That would give us a chance to clean up consul registrations (and other
>> cleanup).
>> I think maybe this flow is more complex? How about registering a KILL
>> signal listener to clean up the consul registration?
>>
>>
>> On Fri, Feb 12, 2016 at 6:12 PM, Harry Metske <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> we have a Mesos (0.27) cluster running with (here relevant) slave
>>> options:
>>> --cgroups_enable_cfs=true
>>> --cgroups_limit_swap=true
>>> --isolation=cgroups/cpu,cgroups/mem
>>>
>>> What we see happening is that people are running Tasks (Java
>>> applications) and specify a memory resource limit that is too low, which
>>> causes these tasks to be terminated; see the logs below.
>>> That's all fine, after all you should specify reasonable memory limits.
>>> It looks like the slave sends a KILL signal when the limit is reached,
>>> so the application has no chance to do recovery termination, which (in our
>>> case) results in consul registrations not being cleaned up.
>>> Is there a specific reason why the slave does not first send a TERM
>>> signal, and if that does not help after a certain timeout, send a KILL
>>> signal?
>>> That would give us a chance to clean up consul registrations (and other
>>> cleanup).
>>>
>>> kind regards,
>>> Harry
>>>
>>>
>>> I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container
>>> bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource
>>> mem(*):160 and will be terminated
>>>
>>> I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container
>>> 'bed2585a-c361-4c66-afd9-69e70e748ae2'
>>>
>>> I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup
>>> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>>>
>>> I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup
>>> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
>>> 104.21376ms
>>>
>>> I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup
>>> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>>>
>>> I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfullly thawed
>>> cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>>> after 2.123008ms
>>>
>>> I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@
>>> 10.239.204.142:43950 exited
>>>
>>> I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for
>>> container 'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited
>>>
>>> I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy
>>> request for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2
>>>
>>> I0212 09:27:49.389853 11062 slave.cpp:3816] Executor
>>> 'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework
>>> 7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang
