Re: [Nagios-users] Change Procs Critical threshold
On Nov 19, 2010, at 5:48 PM, C. Bensend wrote:

>> I had another question regarding adjusting these thresholds, this time
>> on localhost. It regards the Current Load parameter, which is giving me
>> a Critical Load average of 2.47, 3.43, and 4.06.
>>
>> In /usr/local/nagios/etc/objects/localhost.cfg, I have this:
>>
>>   define service{
>>           use                   local-service   ; Name of service template to use
>>           host_name             localhost
>>           service_description   Current Load
>>           check_command         check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
>>           }
>>
>> which I actually went and adjusted to:
>>
>>   check_command check_local_load!7.0,4.0,3.0!10.0,6.0,4.0
>>
>> I restarted the Nagios service, but this didn't have any effect. The
>> status information still reads the same:
>>
>>   Critical Load Average - 2.47, 3.43, 4.06
>
> From the help for check_load (which I'm assuming you're using in the
> command definition):
>
>   Usage: check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15
>
> So, in your service definition, you're telling check_load that you want
> to trigger a critical condition if the 15 minute average is 4.0. Yours
> is 4.06. So, yes, it's critical. :)
>
>   WLOAD1  = 1 minute average warning threshold
>   WLOAD5  = 5 minute average warning threshold
>   WLOAD15 = 15 minute average warning threshold
>   CLOAD1  = 1 minute average critical threshold
>   CLOAD5  = 5 minute average critical threshold
>   CLOAD15 = 15 minute average critical threshold
>
> If you want your 15 minute average to *not* trigger a critical, you
> need to adjust that last value (4.0) to something higher.
>
> Benny

That sounds logical, and this is what I've adjusted:

  check_command check_local_load!8.0,5.0,4.0!12.0,7.0,6.0

but I've restarted the nagios process and the alert still persists. I
don't see anything in nagios.log or /var/log/messages related to this
either. What could I be missing?

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
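
To make it easier to see which of the numbers in that check_command end up as warning values and which as critical values, here is a minimal Python sketch of the "!" splitting. It assumes check_local_load is defined as in the stock sample commands.cfg ($USER1$/check_load -w $ARG1$ -c $ARG2$), which this thread never shows, and that $USER1$ is the libexec directory mentioned elsewhere in the thread. expand_check_command is a hypothetical helper, not something Nagios provides.

```python
# Hypothetical helper: mimics how a check_command is split on "!" and how the
# pieces are substituted for $ARG1$, $ARG2$, ... in the command definition.

# Assumption: command_line as found in the sample commands.cfg.
ASSUMED_COMMAND_LINE = "$USER1$/check_load -w $ARG1$ -c $ARG2$"
# Assumption: $USER1$ is the plugin directory used elsewhere in this thread.
ASSUMED_USER1 = "/usr/local/nagios/libexec"


def expand_check_command(check_command, command_line=ASSUMED_COMMAND_LINE,
                         user1=ASSUMED_USER1):
    """Return (command_name, expanded_command_line)."""
    # The first field is the command name, the rest are positional arguments.
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return name, expanded


name, line = expand_check_command("check_local_load!8.0,5.0,4.0!12.0,7.0,6.0")
print(name)
print(line)
```

Under those assumptions the first group (8.0,5.0,4.0) lands after -w and the second (12.0,7.0,6.0) after -c, so the 15 minute critical value is the last number of the second group.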
Re: [Nagios-users] Change Procs Critical threshold
>> From the help for check_load (which I'm assuming you're using in the
>> command definition):
>>
>>   Usage: check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15
>>
>> So, in your service definition, you're telling check_load that you want
>> to trigger a critical condition if the 15 minute average is 4.0. Yours
>> is 4.06. So, yes, it's critical. :)
>>
>>   WLOAD1  = 1 minute average warning threshold
>>   WLOAD5  = 5 minute average warning threshold
>>   WLOAD15 = 15 minute average warning threshold
>>   CLOAD1  = 1 minute average critical threshold
>>   CLOAD5  = 5 minute average critical threshold
>>   CLOAD15 = 15 minute average critical threshold
>>
>> If you want your 15 minute average to *not* trigger a critical, you
>> need to adjust that last value (4.0) to something higher.
>>
>> Benny
>
> That sounds logical, and this is what I've adjusted:
>
>   check_command check_local_load!8.0,5.0,4.0!12.0,7.0,6.0
>
> but I've restarted the nagios process and the alert still persists. I
> don't see anything in nagios.log or /var/log/messages related to this
> either. What could I be missing?

Kill your Nagios daemon. Now, kill the *other* Nagios daemon you have
running.

If you make changes to your config file and send Nagios a SIGHUP (or
restart it) and the changes don't seem to stick, you might have multiple
Nagios daemons running: one with an old config (that still thinks 4.00
is a critical threshold), while the new daemon is receiving the changes
you mean to make.

This is a common issue, and it's easy to fix. Shut down your daemon via
whatever method you have (service nagios stop, pkill, etc). Then, wait
30 seconds or so to allow outstanding service checks to wrap up, and see
if there are still Nagios processes hanging around. If there are, kill
them too. Wait another 30 seconds, rinse and repeat until there are no
more Nagios processes.

At that point, restart Nagios. Do your changes take effect now?

Benny

--
No matter how many shorts we have in the system, my guards will be
instructed to treat every surveillance camera malfunction as a
full-scale emergency.
-- Peter Anspach's Evil Overlord List, #67
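
The "see if there are still Nagios processes hanging around" step can be sketched in Python as below. This is only an illustration under assumptions: the daemon's process name is taken to be exactly "nagios" (some packages name it differently), the system has a POSIX ps, and matching_pids and running_nagios_pids are hypothetical helper names.

```python
import subprocess


def matching_pids(ps_output, name="nagios"):
    """Parse "pid command" lines and return the PIDs whose command equals name."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == name:
            pids.append(int(fields[0]))
    return pids


def running_nagios_pids():
    """Ask ps for every process and keep those named "nagios" (assumed name)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],  # "=" suppresses the headers
        capture_output=True, text=True, check=True,
    )
    return matching_pids(result.stdout)


# Made-up ps output for illustration: two processes named nagios are listed.
sample = "  812 sshd\n 4021 nagios\n 4377 nagios\n 5150 nrpe\n"
print(matching_pids(sample))
```

After stopping the service and waiting, calling running_nagios_pids() should return an empty list before Nagios is started again. A running daemon also forks short-lived children for checks, so more than one PID while it is up is not by itself proof of a stale daemon.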
Re: [Nagios-users] Change Procs Critical threshold
>> I have a couple of systems that are reporting critical notifications,
>> that when you drill into them, the Service : Total Processes has been
>> triggered. It's showing critical process level of 231, 453, for
>> example. Which on a production server is nothing really. My question
>> is, how do I change that threshold level to something like 750, or
>> 1500? Perhaps I'm not searching online for the correct term or using
>> the right parlance. Any ideas where this can be modified?
>
> If you have a look at the options for the plugin (hint: ./plugin
> --help), they will reveal their secrets to you. :) Then, using that
> knowledge, you can adjust your service check to use the appropriate
> values for your environment.
>
> Benny

Hey Benny. I had another question regarding adjusting these thresholds,
this time on localhost. It regards the Current Load parameter, which is
giving me a Critical Load average of 2.47, 3.43, and 4.06.

In /usr/local/nagios/etc/objects/localhost.cfg, I have this:

  define service{
          use                   local-service   ; Name of service template to use
          host_name             localhost
          service_description   Current Load
          check_command         check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
          }

which I actually went and adjusted to:

  check_command check_local_load!7.0,4.0,3.0!10.0,6.0,4.0

I restarted the Nagios service, but this didn't have any effect. The
status information still reads the same:

  Critical Load Average - 2.47, 3.43, 4.06

Is there something else I'm missing?
Re: [Nagios-users] Change Procs Critical threshold
> I had another question regarding adjusting these thresholds, this time
> on localhost. It regards the Current Load parameter, which is giving me
> a Critical Load average of 2.47, 3.43, and 4.06.
>
> In /usr/local/nagios/etc/objects/localhost.cfg, I have this:
>
>   define service{
>           use                   local-service   ; Name of service template to use
>           host_name             localhost
>           service_description   Current Load
>           check_command         check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
>           }
>
> which I actually went and adjusted to:
>
>   check_command check_local_load!7.0,4.0,3.0!10.0,6.0,4.0
>
> I restarted the Nagios service, but this didn't have any effect. The
> status information still reads the same:
>
>   Critical Load Average - 2.47, 3.43, 4.06

From the help for check_load (which I'm assuming you're using in the
command definition):

  Usage: check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

So, in your service definition, you're telling check_load that you want
to trigger a critical condition if the 15 minute average is 4.0. Yours
is 4.06. So, yes, it's critical. :)

  WLOAD1  = 1 minute average warning threshold
  WLOAD5  = 5 minute average warning threshold
  WLOAD15 = 15 minute average warning threshold
  CLOAD1  = 1 minute average critical threshold
  CLOAD5  = 5 minute average critical threshold
  CLOAD15 = 15 minute average critical threshold

If you want your 15 minute average to *not* trigger a critical, you need
to adjust that last value (4.0) to something higher.

Benny
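
The comparison Benny describes can be written out as a small Python sketch. It is not the plugin's code: load_state is a hypothetical function, and it assumes a threshold is tripped only when the measured value is strictly greater than it.

```python
def load_state(loads, warning, critical):
    """Classify (1, 5, 15) minute load averages against threshold triples.

    Assumption: a load trips a threshold only when it is strictly greater.
    """
    # Any single average over its critical threshold makes the result critical.
    if any(load > limit for load, limit in zip(loads, critical)):
        return "CRITICAL"
    if any(load > limit for load, limit in zip(loads, warning)):
        return "WARNING"
    return "OK"


loads = (2.47, 3.43, 4.06)  # the values reported in the thread

# Thresholds from the original service definition.
print(load_state(loads, (5.0, 4.0, 3.0), (10.0, 6.0, 4.0)))
# Same warning values, 15 minute critical raised from 4.0 to 6.0 (example value).
print(load_state(loads, (5.0, 4.0, 3.0), (10.0, 6.0, 6.0)))
```

With the original thresholds this prints CRITICAL, because 4.06 is above the 15 minute critical value of 4.0. Raising only that value gives WARNING rather than OK, since 4.06 is still above the 15 minute warning value of 3.0.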
Re: [Nagios-users] Change Procs Critical threshold
On Nov 5, 2010, at 6:38 PM, C. Bensend wrote:

>> I have a couple of systems that are reporting critical notifications,
>> that when you drill into them, the Service : Total Processes has been
>> triggered. It's showing critical process level of 231, 453, for
>> example. Which on a production server is nothing really. My question
>> is, how do I change that threshold level to something like 750, or
>> 1500? Perhaps I'm not searching online for the correct term or using
>> the right parlance. Any ideas where this can be modified?
>
> If you have a look at the options for the plugin (hint: ./plugin
> --help), they will reveal their secrets to you. :) Then, using that
> knowledge, you can adjust your service check to use the appropriate
> values for your environment.
>
> Benny

Benny - thanks so much for replying. I was able to find the necessary
parameters to adjust the check_procs plugin, and from there I see that I
can modify this in the monitored system's nrpe.cfg file.

So I edited the relevant line for check_procs in nrpe.cfg to read:

  command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 1000 -c 1500

then I restart munin-node:

  /etc/rc.d/init.d/munin-node restart

Now I've just got to figure out my other Nagios issues. Maybe another
thread later.

Thanks again Benny, and of course to the entire Nagios User list. Such a
great resource.
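
As a rough check of what the new nrpe.cfg line means for the process counts mentioned at the start of the thread, here is a minimal Python sketch. It assumes plain integer -w and -c values that are tripped only when the count is strictly greater; parse_nrpe_command and procs_state are hypothetical helpers, and the counts 1200 and 1600 are made-up examples.

```python
import re
import shlex

# The line quoted in the message above.
NRPE_LINE = "command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 1000 -c 1500"


def parse_nrpe_command(line):
    """Split an nrpe.cfg "command[name]=..." line into (name, argv)."""
    match = re.fullmatch(r"command\[([^\]]+)\]=(.+)", line.strip())
    if match is None:
        raise ValueError(f"not an NRPE command line: {line!r}")
    return match.group(1), shlex.split(match.group(2))


def procs_state(count, argv):
    """Classify a process count using the -w and -c values found in argv.

    Assumption: simple integer thresholds, tripped when count is greater.
    """
    warning = int(argv[argv.index("-w") + 1])
    critical = int(argv[argv.index("-c") + 1])
    if count > critical:
        return "CRITICAL"
    if count > warning:
        return "WARNING"
    return "OK"


name, argv = parse_nrpe_command(NRPE_LINE)
for count in (231, 453, 1200, 1600):
    print(name, count, procs_state(count, argv))
```

Under these assumptions the counts 231 and 453 from the original question both come out as OK with the 1000/1500 thresholds.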
Re: [Nagios-users] Change Procs Critical threshold
> I have a couple of systems that are reporting critical notifications,
> that when you drill into them, the Service : Total Processes has been
> triggered. It's showing critical process level of 231, 453, for
> example. Which on a production server is nothing really. My question
> is, how do I change that threshold level to something like 750, or
> 1500? Perhaps I'm not searching online for the correct term or using
> the right parlance. Any ideas where this can be modified?

If you have a look at the options for the plugin (hint: ./plugin
--help), they will reveal their secrets to you. :) Then, using that
knowledge, you can adjust your service check to use the appropriate
values for your environment.

Benny