I'm not sure why SIGKILL is sent directly without a preceding SIGTERM, but could you have your Consul registrations cleaned up when a task is killed by adding Consul health checks?
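As a rough sketch of what I mean (not tested against your setup): register the service with a check, so that Consul itself notices when the process is gone, even if the process never got a chance to deregister. The function names, service ID, address, and port below are made up for illustration. I'm also assuming a Consul version that supports the `DeregisterCriticalServiceAfter` check field; without it, the failed check only marks the service as critical, which still hides it from queries that ask for passing instances only.

```python
import json
import urllib.request


def build_registration(service_id, name, address, port,
                       interval="10s", deregister_after="1m"):
    """Build a Consul agent service registration with a TCP health check.

    All argument values are supplied by the caller; nothing here is
    specific to Mesos or to the original poster's cluster.
    """
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            # Consul tries to open a TCP connection to the task every `interval`.
            "TCP": f"{address}:{port}",
            "Interval": interval,
            # Assumption: the Consul version in use supports this field.
            # Once the check has been critical for this long, Consul
            # removes the service registration on its own.
            "DeregisterCriticalServiceAfter": deregister_after,
        },
    }


def register(payload, agent_url="http://127.0.0.1:8500"):
    """PUT the registration to the local Consul agent (default address assumed)."""
    req = urllib.request.Request(
        f"{agent_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical values; print the payload instead of sending it.
    print(json.dumps(build_registration("fulltest02-1", "fulltest02",
                                        "10.0.0.5", 8080), indent=2))
```

With something like this in place, a task that is killed with SIGKILL stops answering the TCP check, goes critical, and is eventually dropped from the catalog without any cleanup code running in the task itself.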
On Fri, Feb 12, 2016 at 6:12 PM, Harry Metske <[email protected]> wrote:

> Hi,
>
> we have a Mesos (0.27) cluster running with (here relevant) slave options:
> --cgroups_enable_cfs=true
> --cgroups_limit_swap=true
> --isolation=cgroups/cpu,cgroups/mem
>
> What we see happening is that people are running Tasks (Java applications)
> and specify a memory resource limit that is too low, which cause these
> tasks to be terminated, see logs below.
> That's all fine, after all you should specify reasonable memory limits.
> It looks like the slave sends a KILL signal when the limit is reached, so
> the application has no chance to do recovery termination, which (in our
> case) results in consul registrations not being cleaned up.
> Is there a specific reason why the slave does not first send a TERM
> signal, and if that does not help after a certain timeout, send a KILL
> signal?
> That would give us a chance to cleanup consul registrations (and other
> cleanup).
>
> kind regards,
> Harry
>
>
> I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container
> bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource
> mem(*):160 and will be terminated
>
> I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container
> 'bed2585a-c361-4c66-afd9-69e70e748ae2'
>
> I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>
> I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
> 104.21376ms
>
> I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>
> I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfullly thawed cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
> 2.123008ms
>
> I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@
> 10.239.204.142:43950 exited
>
> I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for container
> 'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited
>
> I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy request
> for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2
>
> I0212 09:27:49.389853 11062 slave.cpp:3816] Executor
> 'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework
> 7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed

