Hi,

We have a Mesos (0.27) cluster running with the following relevant slave
options:
--cgroups_enable_cfs=true
--cgroups_limit_swap=true
--isolation=cgroups/cpu,cgroups/mem

What we see is that people run tasks (Java applications) with a memory
resource limit that is too low, which causes these tasks to be terminated
(see the logs below).
That's all fine; after all, you should specify reasonable memory limits.
However, it looks like the slave sends a KILL signal when the limit is
reached, so the application has no chance to shut down gracefully, which (in
our case) results in Consul registrations not being cleaned up.
Is there a specific reason why the slave does not first send a TERM signal
and, if the task has not exited after a certain timeout, then send a KILL
signal? That would give us a chance to clean up Consul registrations (and do
other cleanup).
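
To make the request concrete, here is a minimal Python sketch of the
TERM-then-KILL escalation I have in mind. It is only an illustration, not
Mesos code: the function name terminate_gracefully and the grace_seconds
parameter are made up for this example, and it acts on a single child
process rather than on a whole cgroup.

```python
import signal
import subprocess


def terminate_gracefully(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Hypothetical helper for illustration only (POSIX signals assumed).
    Returns the exit status of the process; on POSIX a negative value -N
    means the process was terminated by signal N.
    """
    # Ask the process to shut down so it can run its cleanup handlers.
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process did not exit within the grace period: force it.
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()
```

With something like this, a task that handles TERM (for example a JVM
shutdown hook that deregisters from Consul) would get grace_seconds to
finish, and a task that ignores TERM would still be killed.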

Kind regards,
Harry


I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container
bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource
mem(*):160 and will be terminated

I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container
'bed2585a-c361-4c66-afd9-69e70e748ae2'

I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2

I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
104.21376ms

I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2

I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfullly thawed cgroup
/sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
2.123008ms

I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@10.239.204.142:43950
exited

I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for container
'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited

I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy request
for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2

I0212 09:27:49.389853 11062 slave.cpp:3816] Executor
'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework
7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed
