Re: memory limit exceeded ==> KILL instead of TERM (first)
SIGKILL can't be caught.

2016-02-12 11:29 GMT+01:00 haosdent:
> I think maybe this flow more complex? How about you register a KILL signal
> listener to cleanup consul registration?
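The distinction matters here: a process can install a handler for SIGTERM and run cleanup (deregister from consul, flush state) before exiting, but the kernel refuses to let any process catch, block, or ignore SIGKILL. A minimal Python sketch illustrating the point:

```python
import signal

def cleanup(signum, frame):
    # A SIGTERM handler gets a chance to run cleanup (e.g. deregister
    # from consul) before the process exits.
    print("caught SIGTERM, cleaning up")

# Installing a SIGTERM handler is allowed.
signal.signal(signal.SIGTERM, cleanup)

# Installing a SIGKILL handler is not: the kernel rejects it outright,
# which is why no userspace cleanup can ever run on SIGKILL.
try:
    signal.signal(signal.SIGKILL, cleanup)
except OSError as err:
    print("cannot trap SIGKILL:", err)
```

So a "KILL signal listener" cannot work by design; application-level cleanup hooks only help if the supervisor sends a catchable signal such as SIGTERM first.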
memory limit exceeded ==> KILL instead of TERM (first)
Hi,

we have a Mesos (0.27) cluster running with (here relevant) slave options:
  --cgroups_enable_cfs=true
  --cgroups_limit_swap=true
  --isolation=cgroups/cpu,cgroups/mem

What we see happening is that people are running tasks (Java applications) and specify a memory resource limit that is too low, which causes these tasks to be terminated, see the logs below. That's all fine; after all, you should specify reasonable memory limits. It looks like the slave sends a KILL signal when the limit is reached, so the application has no chance to do recovery termination, which (in our case) results in consul registrations not being cleaned up.

Is there a specific reason why the slave does not first send a TERM signal, and if that does not help after a certain timeout, send a KILL signal? That would give us a chance to clean up consul registrations (and other cleanup).

kind regards,
Harry

I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource mem(*):160 and will be terminated
I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container 'bed2585a-c361-4c66-afd9-69e70e748ae2'
I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after 104.21376ms
I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after 2.123008ms
I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@10.239.204.142:43950 exited
I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for container 'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited
I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy request for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2
I0212 09:27:49.389853 11062 slave.cpp:3816] Executor 'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework 7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed
Re: memory limit exceeded ==> KILL instead of TERM (first)
I'm not familiar with why SIGKILL is sent directly without SIGTERM, but is it possible to have your consul registry cleaned up when a task is killed by adding consul health checks?
Re: memory limit exceeded ==> KILL instead of TERM (first)
> Is there a specific reason why the slave does not first send a TERM signal,
> and if that does not help after a certain timeout, send a KILL signal?
> That would give us a chance to cleanup consul registrations (and other cleanup).

I think maybe this flow is more complex? How about registering a KILL signal listener to clean up the consul registration?

--
Best Regards,
Haosdent Huang
Re: memory limit exceeded ==> KILL instead of TERM (first)
> I'm not familiar with why SIGKILL is sent directly without SIGTERM

We send KILL in both posix_launcher and linux_launcher:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/launcher.cpp#L170
https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1566

> SIGKILL can't be caught.

It seems you could not clean up consul registrations when the task is killed under the MesosContainerizer. Have you tried the DockerContainerizer? I think "docker stop" would send TERM first.

--
Best Regards,
Haosdent Huang
Re: memory limit exceeded ==> KILL instead of TERM (first)
> Is there a specific reason why the slave does not first send a TERM signal,
> and if that does not help after a certain timeout, send a KILL signal?
> That would give us a chance to cleanup consul registrations (and other cleanup).

First of all, it's wrong to want to handle the memory limit in your app. Things like this are outside of its scope. Your app can be lost because of many different system or hardware failures that you just can't catch. You need to let it crash and design your architecture with this in mind.

Secondly, the Mesos SIGKILL is consistent with the Linux OOM killer, and it does the right thing:
https://github.com/torvalds/linux/blob/4e5448a31d73d0e944b7adb9049438a09bc332cb/mm/oom_kill.c#L586

Best regards,
Kamil
Re: memory limit exceeded ==> KILL instead of TERM (first)
We don't want to use Docker (yet) in this environment, so the DockerContainerizer is not an option. After thinking a bit longer, I tend to agree with Kamil and will let the problem be handled differently.

Thanks for the amazingly fast responses!

kind regards,
Harry
Re: memory limit exceeded ==> KILL instead of TERM (first)
In larger deployments, with many applications, you may not always be able to expect good memory practices from app developers. We've found that reporting *why* a job was killed, with details of container utilization, is an effective way of helping app developers get better at memory management.

The alternative, just having jobs die, incentivizes bad behavior. For example, a hurried job owner may just double the memory of the executor, trading slack for stability.
Re: memory limit exceeded ==> KILL instead of TERM (first)
+1 to what Kamil said. That is exactly the reason why we designed it that way.

Also, the why is included in the status update message.

@vinodkone
Re: memory limit exceeded ==> KILL instead of TERM (first)
David,

that's exactly the scenario I am afraid of: developers specifying way too large memory requirements just to make sure their tasks don't get killed.

Any suggestions on how to report this *why* to the developers? As far as I know, the only place where you find the reason is in the logfile of the slave; the UI only tells you that the task failed, not the reason. (We could put some logfile monitoring in place to pick up these messages, of course, but if there are better ways, we are always interested.)

kind regards,
Harry
Re: memory limit exceeded ==> KILL instead of TERM (first)
hey Harry,

As Vinod said, the mesos-slave/agent will issue a status update about the OOM condition. This will be received by the scheduler of the framework. In the storm-mesos framework we just log the messages (see below), but you might consider somehow exposing these messages directly to the app owners:

Received status update: {"task_id":"TASK_ID","slave_id":"20150806-001422-1801655306-5050-22041-S65","state":"TASK_FAILED","message":"Memory limit exceeded: Requested: 2200MB Maximum Used: 2200MB\n\nMEMORY STATISTICS: \ncache 20480\nrss 1811943424\nmapped_file 0\npgpgin 8777434\npgpgout 8805691\nswap 96878592\ninactive_anon 644186112\nactive_anon 1357594624\ninactive_file 20480\nactive_file 0\nunevictable 0\nhierarchical_memory_limit 2306867200\nhierarchical_memsw_limit 9223372036854775807\ntotal_cache 20480\ntotal_rss 1811943424\ntotal_mapped_file 0\ntotal_pgpgin 8777434\ntotal_pgpgout 8805691\ntotal_swap 96878592\ntotal_inactive_anon 644186112\ntotal_active_anon 1355497472\ntotal_inactive_file 20480\ntotal_active_file 0\ntotal_unevictable 0"}

- Erik
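Since the "why" travels in the status update's message field, a scheduler (or a log pipeline) can parse the memory statistics back out and surface them to app owners. A rough sketch, assuming the message format shown above (a header line, then "MEMORY STATISTICS:" followed by "name value" pairs); the exact wording may differ across Mesos versions:

```python
def parse_oom_message(message):
    # Split the human-readable reason from the cgroup statistics block.
    header, _, stats_block = message.partition("MEMORY STATISTICS: \n")
    stats = {}
    for line in stats_block.splitlines():
        # Each statistics line is "name value", e.g. "rss 1811943424".
        name, _, value = line.rpartition(" ")
        if name:
            stats[name] = int(value)
    return header.strip(), stats

# A truncated version of the status update message quoted above.
msg = ("Memory limit exceeded: Requested: 2200MB Maximum Used: 2200MB\n\n"
       "MEMORY STATISTICS: \ncache 20480\nrss 1811943424\nswap 96878592")
reason, stats = parse_oom_message(msg)
print(reason)        # Memory limit exceeded: Requested: 2200MB Maximum Used: 2200MB
print(stats["rss"])  # 1811943424
```

A scheduler could attach this parsed summary to whatever alerting or dashboard the app owners already watch.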
Re: memory limit exceeded ==> KILL instead of TERM (first)
2016-02-12 19:41 GMT+01:00 Erik Weathers:
> As Vinod said, the mesos-slave/agent will issue a status update about the
> OOM condition. This will be received by the scheduler of the framework.

Marathon also presents this information. Developers will see it on the Debug tab, in the Last Task Failure section.

Best Regards,
Kamil
Re: memory limit exceeded ==> KILL instead of TERM (first)
>> we could put some logfile monitoring in place picking up these messages of course

that's about what we came up with.

>> the mesos-slave/agent will issue a status update about the OOM condition.

ok, definitely missed that one - this will help a lot. thanks @vinod

On Fri, Feb 12, 2016 at 2:41 PM, Harry Metske wrote:
> Yup, I just noticed it's there :-)
>
> tx,
> Harry
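For the logfile-monitoring route, the containerizer's OOM line is regular enough to match mechanically. A small sketch, assuming the log wording quoted earlier in the thread (it may vary between Mesos versions):

```python
import re

# Matches the containerizer's OOM log line, e.g.:
# "... Container <id> has reached its limit for resource mem(*):160
#  and will be terminated"
OOM_PATTERN = re.compile(
    r"Container (?P<container>\S+) has reached its limit for resource "
    r"(?P<limit>\S+) and will be terminated")

def find_oom_kills(log_lines):
    # Return (container id, exceeded limit) for every OOM kill found.
    hits = []
    for line in log_lines:
        m = OOM_PATTERN.search(line)
        if m:
            hits.append((m.group("container"), m.group("limit")))
    return hits

sample = [
    "I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container "
    "bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for "
    "resource mem(*):160 and will be terminated",
]
print(find_oom_kills(sample))
# [('bed2585a-c361-4c66-afd9-69e70e748ae2', 'mem(*):160')]
```

Feeding matches into an alerting channel gives app owners the "why" even without scheduler-side status update handling.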