> I'm not familiar with why SIGKILL is sent directly without SIGTERM.

We send KILL in both posix_launcher and linux_launcher:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/launcher.cpp#L170
https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1566
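Because the launchers send KILL directly, the task never gets a chance to react. On POSIX systems SIGKILL (like SIGSTOP) cannot be caught, blocked, or ignored. The minimal Python sketch below shows the kernel refusing a SIGKILL handler while accepting a SIGTERM one. The helper name `can_install_handler` is hypothetical and is not part of Mesos.

```python
import signal


def can_install_handler(sig):
    """Return True if a Python-level handler can be installed for `sig`.

    The original disposition is restored afterwards, so the check has
    no lasting side effect on the calling process.
    """
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:
        # sigaction() fails with EINVAL for uncatchable signals.
        return False
    if previous is not None:
        signal.signal(sig, previous)  # restore the original disposition
    return True


if __name__ == "__main__":
    print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```

On Linux, Python surfaces the kernel's EINVAL from `sigaction` as an `OSError`, which is why any "KILL signal listener" is ruled out.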
> SIGKILL can't be caught.

So it seems the Consul registrations cannot be cleaned up when the task is killed by the MesosContainerizer. Have you tried the DockerContainerizer? I think "docker stop" sends TERM first.

On Fri, Feb 12, 2016 at 6:33 PM, Kamil Chmielewski wrote:

> SIGKILL can't be caught.
>
> 2016-02-12 11:29 GMT+01:00 haosdent:
>
>> > Is there a specific reason why the slave does not first send a TERM
>> > signal, and if that does not help after a certain timeout, send a KILL
>> > signal?
>> > That would give us a chance to clean up Consul registrations (and do
>> > other cleanup).
>>
>> I think maybe this flow is more complex? How about registering a KILL
>> signal listener to clean up the Consul registration?
>>
>>
>> On Fri, Feb 12, 2016 at 6:12 PM, Harry Metske wrote:
>>
>>> Hi,
>>>
>>> we have a Mesos (0.27) cluster running with (here relevant) slave
>>> options:
>>> --cgroups_enable_cfs=true
>>> --cgroups_limit_swap=true
>>> --isolation=cgroups/cpu,cgroups/mem
>>>
>>> What we see happening is that people are running Tasks (Java
>>> applications) and specify a memory resource limit that is too low, which
>>> causes these tasks to be terminated (see logs below).
>>> That's all fine; after all, you should specify reasonable memory limits.
>>> It looks like the slave sends a KILL signal when the limit is reached,
>>> so the application has no chance to terminate gracefully, which (in our
>>> case) results in Consul registrations not being cleaned up.
>>> Is there a specific reason why the slave does not first send a TERM
>>> signal, and if that does not help after a certain timeout, send a KILL
>>> signal?
>>> That would give us a chance to clean up Consul registrations (and do
>>> other cleanup).
>>>
>>> kind regards,
>>> Harry
>>>
>>>
>>> I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource mem(*):160 and will be terminated
>>>
>>> I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container 'bed2585a-c361-4c66-afd9-69e70e748ae2'
>>>
>>> I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>>>
>>> I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after 104.21376ms
>>>
>>> I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>>>
>>> I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after 2.123008ms
>>>
>>> I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@10.239.204.142:43950 exited
>>>
>>> I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for container 'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited
>>>
>>> I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy request for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2
>>>
>>> I0212 09:27:49.389853 11062 slave.cpp:3816] Executor 'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework 7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>

-- 
Best Regards,
Haosdent Huang

