Re: [Nagios-users] Alerting
On Thu, Aug 22, 2013 at 1:26 AM, Charles Rice cr...@akassociates911.com wrote:
> you need to put in the config files of the nodes connected to the switch
> that the switch is a parent device. I do not have the syntax in front of
> me, but I think it is just parentdevice name

It's parents, just for the sake of completeness.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
[Nagios-users] Alerting
Now, I know this can be done, but here is the question. Say our core switch goes down; I obviously don't want to be alerted about every single device that has subsequently gone down as well. I know that since the core is down, everything is down. How do I set up these types of relationships so that alerting is dependent on another object being up? I would still get alerts because we are using an SMS gateway, and it would be able to send SMS messages to our cellphones.

Thanks

--
Jeremy L. Gibbs
Systems Administrator / Network Engineer
Utica College IITS
Re: [Nagios-users] Alerting
You need to put in the config files of the nodes connected to the switch that the switch is a parent device. I do not have the syntax in front of me, but I think it is just parentdevice name

Yours,
Charles Rice
911 Specialist

From: Jeremy Gibbs [jlgi...@utica.edu]
Sent: Wednesday, August 21, 2013 6:19 PM
To: nagios-users
Subject: [Nagios-users] Alerting

> Now, I know this can be done but here is the question. Say our core switch
> goes down, I obviously don't want to be alerted about every single device
> that has subsequently gone down as well. I know since the core is down,
> everything is down. How do I setup these types of relationships so alerting
> is dependent other another object being up. I would get alerts because we
> are using a SMS gateway and it would be able to send SMS messages to our
> cellphones.
>
> Thanks
> --
> Jeremy L. Gibbs
> Systems Administrator / Network Engineer
> Utica College IITS
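For reference, a minimal sketch of the parent/child relationship discussed in this thread. The directive in a Nagios host definition is `parents`. The host names, addresses, and the `generic-host` template below are hypothetical placeholders, not taken from the poster's configuration.

```cfg
# Hypothetical core switch; every name and address here is an assumption.
define host{
        use                     generic-host    ; assumed template name
        host_name               core-switch
        address                 192.0.2.1
        }

# A device that can only be reached through the core switch.
define host{
        use                     generic-host
        host_name               web01
        address                 192.0.2.10
        parents                 core-switch     ; reachability depends on this host
        notification_options    d,r             ; no "u", so UNREACHABLE does not notify
        }
```

With a layout like this, when core-switch goes DOWN Nagios should mark web01 as UNREACHABLE rather than DOWN. Leaving `u` out of the child's notification_options is what keeps the flood of per-device alerts from going out.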
Re: [Nagios-users] Alerting based on past-to-current trends?
On 6 December 2010 19:02, Ian Ehrenwald iehrenw...@tripadvisor.com wrote:
> Hello
>
> I was wondering if there was a straight-forward way to alert based on an
> average of past data plus a current perfdata entry. I understand I'm not
> explaining it very well that way, so here is the real-world example I am
> working with - I am polling a set of machines via SNMP for CPU load every
> 1 minute (looking at hrProcessorLoad). If the return value is at or above
> 95%, send out a WARNING. If the return value is 98% or above, send out a
> CRITICAL. The problem here is that it's OK for a process to take up 100%
> CPU for multiple seconds, and sometimes that high CPU usage coincides with
> the SNMP %CPU query, so I get a lot of false alerts.
>
> Is there a way to use past perfdata in conjunction with the current
> returned data to generate an average and send a WARNING or CRITICAL based
> on that new number? I only care to get alerted from Nagios if, for example,
> the %CPU has been at 100% for 5 minutes. Or am I just way over-thinking
> this and should be monitoring 1m, 5m, 15m UNIX load averages (which doesn't
> seem that accurate anyway)? What are other people doing to monitor CPU
> usage and alert on abnormal long periods of utilization?

Nagios will register a problem as soon as the plugin returns a non-OK status. You can of course configure max_check_attempts and/or first_notification_delay so that Nagios won't send a notification until after a given time, but this won't stop it from appearing on the web page for problem services straight away.

It would be great if you could get Nagios to display only hard status alerts - I don't think you can though, not with ordinary Nagios Core anyway. Some of the third-party Nagios front ends will do it; for example, you can configure the icons in NagVis to display only hard alerts.
Cheers,
Jim
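A minimal sketch of the two directives mentioned in this reply, assuming Nagios 3.x directive names (Nagios 2.x spells the intervals normal_check_interval and retry_check_interval, and does not have first_notification_delay as far as I know). The host name, template, and check command are hypothetical.

```cfg
define service{
        use                             generic-service         ; assumed template name
        host_name                       appserver01             ; hypothetical host
        service_description             CPU Load
        check_command                   check_snmp_cpu!95!98    ; hypothetical command and thresholds
        check_interval                  1       ; poll every minute while OK
        retry_interval                  1       ; re-check every minute while in a soft state
        max_check_attempts              5       ; 5 consecutive non-OK results before a hard state
        first_notification_delay        0       ; raise this to hold back the first notification further
        }
```

Under these assumed values a single busy sample only produces a soft state. A notification goes out only after five non-OK results in a row, which is roughly the "at 100% for 5 minutes" behaviour the question asks for, although the web interface will still show the soft problem in the meantime.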
Re: [Nagios-users] Alerting based on past-to-current trends?
On 10 December 2010 18:43, Rick Carter rick.car...@umich.edu wrote:
> Hi Jim,
>
> I'm wondering if load average would get you where you want to be, as in a
> lot of cases, a CPU busy might not be a big deal unless the run queue is
> growing. My nagios-fu isn't good enough to tell you how to get that, but
> when I saw your message, I thought right away of the linux/unix:
>
> $ uptime
> 13:41 up 2 days, 18:11, 2 users, load averages: 0.31 0.25 0.24
>
> Where the 2nd load average is the 5-minute one.
>
> - Rick

Good point Rick, there is a check_load plugin, and you could indeed set appropriate thresholds to make it concentrate on the 15-minute value rather than the 5-minute or 1-minute values.

As to what 'load' actually means, I'm not 100% sure. I've read http://www.teamquest.com/resources/gunther/display/5/index.htm a few times, and think it helps a bit! I even bought Gunther's book Guerilla Capacity Planning but confess I haven't read anywhere near all of it.

I seem to recall reading somewhere that, as a general rule of thumb, if load is 2 * the number of CPUs, it's probably affecting performance. Certainly on my own Nagios server with 4 CPUs I find it's struggling whenever load is consistently 10.

Cheers,
Jim
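A sketch of how check_load thresholds could be set so that only the 15-minute value matters. check_load takes its warning and critical thresholds as 1-minute,5-minute,15-minute triples via -w and -c. The command name, host name, template, and the numbers themselves are assumptions for illustration (8 is the 2 * CPUs rule of thumb for a 4-CPU box, 10 is the level described above as struggling).

```cfg
define command{
        command_name    check_local_load        ; hypothetical command name
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }

define service{
        use                     generic-service         ; assumed template name
        host_name               nagios01                ; hypothetical 4-CPU host
        service_description     Load (15 min)
        ; 1- and 5-minute thresholds are set unreachably high so that
        ; in practice only the 15-minute average can trigger an alert.
        check_command           check_local_load!100,100,8!100,100,10
        }
```

For a remote host the same check_load command line would typically sit behind NRPE or a similar agent rather than run locally.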
[Nagios-users] Alerting behavior at beginning of timeperiod
I have several processes that are started each morning from cron, run until the early evening, and are then killed. For example, every weekday at 8:00am a daemon is started, and it runs until 6:30pm. A timeperiod for this particular process has been created so that between 08:00 and 18:30 nagios uses nrpe to check that the process is in the process list and, if not, sends out an alert.

For the most part it works exactly as expected, with the exception of the alert that is thrown in the morning. I have been getting an alert each day that is timestamped a couple of seconds after 8:00am (today's was sent out at 8:00:06).

My guess as to what happens is that at exactly 8am the first check is done and the process might not have been fully started, or cron started it a few seconds after the check is done. However, I have nagios set up so that normal checks are scheduled to be performed every 5 minutes. If a check fails, another check is scheduled for 1 minute after the first failed check, and then if that check fails, an alert is sent out. Nagios appears to be ignoring that. It looks as though, if the first check at the start of a timeperiod fails, it immediately sends out an alert.

The issue seems to have gone away after I changed the timeperiod to begin at 8:01am, but I wanted to pick the brain of the community to see if this is an expected behavior or something I need to look into more closely.
Many Thanks

--
Steven Kreuzer
http://www.exit2shell.com/~skreuzer
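The workaround described above (starting the window at 08:01 instead of 08:00) might look like the following. This is only a sketch of the poster's stated workaround, not an explanation of the underlying behaviour, which the message leaves open. The timeperiod name, host name, template, and NRPE command name are hypothetical.

```cfg
define timeperiod{
        timeperiod_name daemon-hours            ; hypothetical name
        alias           Weekday daemon run window
        monday          08:01-18:30             ; one minute after cron starts the daemon
        tuesday         08:01-18:30
        wednesday       08:01-18:30
        thursday        08:01-18:30
        friday          08:01-18:30
        }

define service{
        use                     generic-service         ; assumed template name
        host_name               apphost01               ; hypothetical host
        service_description     Daemon process
        check_command           check_nrpe!check_daemon ; hypothetical NRPE command
        check_period            daemon-hours
        notification_period     daemon-hours
        normal_check_interval   5       ; regular checks every 5 minutes
        retry_check_interval    1       ; re-check 1 minute after a failure
        max_check_attempts      2       ; alert only if the re-check also fails
        }
```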
[Nagios-users] Alerting on 100% cpu for a period of time
Hi.

Is it possible to alert when a Windows host has been running at 100% CPU for, say, 20 minutes?

--
Lars
Re: [Nagios-users] Alerting on 100% cpu for a period of time
Anything is _possible_. :) The smart folks on this list can probably suggest something better, but one option would be to have a sar process logging CPU usage to a file and then an NRPE check to look at the values in that file.

On Tue, Aug 5, 2008 at 4:19 AM, Lars Jørgensen [EMAIL PROTECTED] wrote:
> Hi. Is it possible to alert when a windows host has been running af 100%
> cpu for, say 20 minutes?
>
> --
> Lars

--
Richard Quintin, DBA
Database Application Administration
Virginia Tech
Re: [Nagios-users] Alerting on 100% cpu for a period of time
Lars,

I am not sure if this is what you are looking for. We use NSClient++ to monitor Windows hosts, and this is our setup for monitoring the CPU load on Windows hosts.

#Command definition for CPU load (where $USER7$ is a macro for the password
#to access the Windows host):
define command{
        command_name    check_nt_cpu
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v CPULOAD -l $ARG1$ -s $USER7$
        }

#This service definition will generate a critical alert if the 10-minute CPU
#load is 90% or more, or a warning alert if the 10-minute load is 80% or
#greater. Just change the 10,80,90 as you please to fit your monitoring.
#Service definition:
define service{
        use                     generic-service
        host_name               thehost001
        service_description     Cpu
        servicegroups           cpu-load
        check_command           check_nt_cpu!10,80,90
        }

Hope it helps.

Thanks,
Palle

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nagios-users- [EMAIL PROTECTED] On Behalf Of Richard Quintin
Sent: Tuesday, August 05, 2008 8:23 AM
To: Lars Jørgensen
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Alerting on 100% cpu for a period of time

> Anything is _possible_. :) The smart folks on this list can probably
> suggest something better, but one option would be to have a sar process
> logging cpu usage to a file and then an NRPE check to look at the values
> in that file.
>
> On Tue, Aug 5, 2008 at 4:19 AM, Lars Jørgensen [EMAIL PROTECTED] wrote:
>> Hi. Is it possible to alert when a windows host has been running af 100%
>> cpu for, say 20 minutes?
Re: [Nagios-users] Alerting on 100% cpu for a period of time
We alert on our bandwidth this way. We set the alerts to re-check (retry interval) every minute for 10 minutes. If the pipe is still full, then we alert. Hope this helps.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Richard Quintin
Sent: Tuesday, August 05, 2008 7:23 AM
To: Lars Jørgensen
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Alerting on 100% cpu for a period of time

> Anything is _possible_. :) The smart folks on this list can probably
> suggest something better, but one option would be to have a sar process
> logging cpu usage to a file and then an NRPE check to look at the values
> in that file.
>
> On Tue, Aug 5, 2008 at 4:19 AM, Lars Jørgensen [EMAIL PROTECTED] wrote:
>> Hi. Is it possible to alert when a windows host has been running af 100%
>> cpu for, say 20 minutes?
Re: [Nagios-users] Alerting on 100% cpu for a period of time
Good morning!

We do this by setting the retry_interval to 2 minutes and max_check_attempts (the maximum number of retries) to 10. This means that the service has to be in a non-OK state for roughly 20 minutes straight before it enters a hard state and starts alerting.

-Jake

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lars Jørgensen
Sent: Tuesday, August 05, 2008 4:20 AM
To: 'nagios-users@lists.sourceforge.net'
Subject: [Nagios-users] Alerting on 100% cpu for a period of time

> Hi. Is it possible to alert when a windows host has been running af 100%
> cpu for, say 20 minutes?
>
> --
> Lars
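The settings described in this reply could be written roughly as below. This is a sketch assuming Nagios 3.x directive names (2.x uses retry_check_interval and normal_check_interval); the host name, template, and check command are placeholders.

```cfg
define service{
        use                     generic-service ; assumed template name
        host_name               winhost01       ; hypothetical Windows host
        service_description     CPU
        check_command           check_cpu       ; placeholder for whatever CPU check is in use
        check_interval          5       ; normal polling while the service is OK
        retry_interval          2       ; re-check every 2 minutes once a check fails
        max_check_attempts      10      ; the 10th consecutive non-OK result makes the state hard
        }
```

Worked through: the first failing check counts as attempt 1, so nine retries at 2-minute spacing follow it, and the hard state (and first notification) arrives about 18 minutes after the first failure. Any OK result in between resets the count.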
Re: [Nagios-users] Alerting on 100% cpu for a period of time
> We alert on our bandwidth this way. We set the alerts to re-check(re-try
> interval) every min for 10 min. If the pipe is still full then we alert.
> Hope this helps.

It sure does, that is both simple and elegant. And I can still do it by SNMP, I think.

--
Lars
Re: [Nagios-users] Alerting on a Percentage of Threshold of a group being down
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nagios-users- [EMAIL PROTECTED] On Behalf Of Nick
Sent: Wednesday, June 20, 2007 4:37 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Alerting on a Percentage of Threshold of a groupbeing down

> Hi,
>
> Was wondering if there is a way with nagios to report on a percentage or
> below a threshold of group being unavailable? For example if i have a 100
> web servers but i only want to know if more than 30% of them are
> unreachable or if more than 30 of them are unreachable.

http://nagios.sourceforge.net/docs/2_0/clusters.html

--
Marc
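The linked page describes monitoring a group of hosts as a cluster. As a rough sketch of that idea, the fragment below assumes the check_cluster plugin from the Nagios Plugins distribution and on-demand $HOSTSTATEID:...$ macros; the plugin path, host names, service name, and thresholds are assumptions, and the plugin described in the 2.0 documentation may take different arguments. Only three hosts are listed to keep the example short.

```cfg
define command{
        command_name    check_host_cluster      ; hypothetical command name
        command_line    $USER1$/check_cluster --host -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
        }

define service{
        use                     generic-service ; assumed template name
        host_name               nagios-server   ; hypothetical host that carries the cluster check
        service_description     Web farm availability
        ; -w 0 warns when more than 0 hosts are not UP,
        ; -c 1 goes critical when more than 1 host is not UP.
        check_command           check_host_cluster!"Web farm"!0!1!$HOSTSTATEID:web01$,$HOSTSTATEID:web02$,$HOSTSTATEID:web03$
        }
```

For the 100-server case in the question, the critical threshold would become 30 and the -d list would name all 100 hosts, with notifications on the individual hosts turned down so that only the cluster service alerts.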
[Nagios-users] alerting flakey
Hello.

I've been using nagios for a couple of months now pretty successfully, but I've noticed that the alerting function is a bit flakey. I've been over the configuration many times, but everything seems fine. The amount of alerting it does seems to change after I restart the service with /etc/init.d/nagios restart. It was sending warnings and criticals. Then, after a restart, it wasn't sending service critical alerts. Then I restarted it again. It wasn't sending anything. Then I restarted it again, and it was sending warnings.

I'm using version 2.6, which I got from the CVS tree a couple of months ago. Can anybody give me a little help on this one? The alert just calls a script I wrote by hand, which is referenced in commands.cfg. I don't use the groups or anything. No alert attempt is showing up in the event log either.

Thanks
Re: [Nagios-users] alerting flakey
On 07/03/07, Ezra Radoff [EMAIL PROTECTED] wrote:
> hello. I've been using nagios for a couple of months now pretty
> successfully, but I've noticed that the alerting function is a bit flakey.
> I've been over the configuration many times, but everything seems fine.
> The amount of alerting it does seems to change after I restart the service
> with /etc/init.d/nagios restart. It was sending warning and criticals.
> Then, after a restart, it wasn't sending service critical alerts. Then I
> restarted it again. It wasn't sending anything. Then I restarted it again,
> and it was sending warnings.

It's not because the hosts or services are flapping, is it?
Re: [Nagios-users] alerting flakey
No, but I'm thinking now that it's always sending warnings and never sending criticals. It's not flapping. We had a server down for hours. It wasn't sending the warnings because it only does it after four. I think that part has been consistent. In the service def it looks like all four states are configured for sending alerts. I don't get it.

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Jim Avery
Sent: Wed 3/7/2007 2:48 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] alerting flakey

> It's not because the hosts or services are flapping is it?
Re: [Nagios-users] alerting flakey
On 07/03/07, Ezra Radoff [EMAIL PROTECTED] wrote:
> No, but I'm thinking now that it's always sending warnings and never
> sending criticals. It's not flapping. We had a server down for hours. It
> wasn't sending the warnings because it only does it after four. I think
> that part has been consistant. In the service def it looks like all four
> states are configured for sending alerts. I don't get it.

Whether a critical alert gets generated or not can depend on the notification_options in the service definition, the host definition and/or the contact definition.

Whether notifications are generated at all can depend on notifications_enabled in the host or service definition, on the timeperiod in the contact definition, and on the global setting in the nagios configuration; notifications can also be dynamically enabled/disabled for hosts, services and for nagios as a whole.

My guess is that it might be something quite simple in the notification_options somewhere. See http://nagios.sourceforge.net/docs/2_0/notifications.html

Another option worth trying is check_for_orphaned_services in your main nagios.cfg file. See: http://nagios.sourceforge.net/docs/2_0/configmain.html#check_for_orphaned_services

Cheers,
Jim
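To make the places listed in this reply concrete, here is a sketch of the contact-side and service-side settings that would all need to allow critical notifications. The contact name, notification command name, and host name are hypothetical; the directive names are the standard object-definition ones.

```cfg
define contact{
        contact_name                    oncall-admin            ; hypothetical contact
        alias                           On-call administrator
        service_notification_period     24x7                    ; contact-level timeperiod
        host_notification_period        24x7
        service_notification_options    w,u,c,r                 ; "c" must be present here too
        host_notification_options       d,u,r
        service_notification_commands   notify-by-script        ; hypothetical command name
        host_notification_commands      notify-by-script
        }

define service{
        use                     local-service
        host_name               router01                ; hypothetical host
        service_description     Cisco_load
        check_command           check_snmp_load_cisco!cisco!90,80,60!100,100,100
        notifications_enabled   1                       ; per-service switch
        notification_options    w,u,c,r                 ; service-level filter
        }
```

The contact also has to be a member of the contact group named in the service's contact_groups, and the global switch enable_notifications=1 in nagios.cfg has to be on. A `c` missing from any one of these filters would be enough to let warnings through while dropping criticals.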
Re: [Nagios-users] alerting flakey
OK. It's definitely none of those. Take a look.

define service{
        use                     local-service
        hostgroup_name          cisco_routers
        service_description     Cisco_load
        check_command           check_snmp_load_cisco!cisco!90,80,60!100,100,100
        }

##

define service{
        name                    local-service   ; The name of this service template
        use                     generic-service ; Inherit default values from the generic-service definition
        check_period            24x7            ; The service can be checked at any time of the day
        max_check_attempts      4               ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until a hard state can be determined
        contact_groups          admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options    w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval   60              ; Re-notify about service problems every hour
        notification_period     24x7            ; Notifications can be sent out at any time
        register                0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

##

> Whether a critical alert gets generated or not can depend on the
> notification_options in the service definition, the host definition and/or
> the contact definition. Whether notifications are generated at all can
> depend on notification_enabled in the host or service definition, on the
> timeperiod in the contact definition, globally in the nagios configuration
> and it can be dynamically enabled/disabled for hosts, services and for
> nagios as a whole. My guess is that it might be something quite simple in
> the notification_options somewhere. See
> http://nagios.sourceforge.net/docs/2_0/notifications.html
>
> Another option worth trying is check_for_orphaned_services in your main
> nagios.cfg file. See:
> http://nagios.sourceforge.net/docs/2_0/configmain.html#check_for_orphaned_services
>
> Cheers,
> Jim
Re: [Nagios-users] alerting flakey
Sounds like you've seen this before? I did as you advised. Stop, restart. I don't know what other processes to look for besides the one below. There weren't any running.

isk-nagios:/usr/local/nagios/etc # ps -ef | grep nagios
nagios   25494     1  0 14:48 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     25501 25032  0 14:50 pts/0    00:00:00 grep nagios

-----Original Message-----
From: Santhosh Kumar A [mailto:[EMAIL PROTECTED]
Sent: Wed 3/7/2007 6:43 AM
To: Ezra Radoff; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] alerting flakey

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ezra Radoff
Sent: Wednesday, March 07, 2007 1:28 PM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] alerting flakey

Hello. I've been using nagios for a couple of months now pretty successfully, but I've noticed that the alerting function is a bit flakey. I've been over the configuration many times, but everything seems fine. The amount of alerting it does seems to change after I restart the service with /etc/init.d/nagios restart. It was sending warnings and criticals. Then, after a restart, it wasn't sending service critical alerts. Then I restarted it again. It wasn't sending anything. Then I restarted it again, and it was sending warnings.

Check whether multiple nagios daemons are running or not. Stop nagios and ensure every nagios process is killed, then do a start (don't use restart).
Santhosh

I'm using version 2.6, which I got from the CVS tree a couple of months ago. Can anybody give me a little help on this one? The alert just calls a script I wrote by hand, which is referenced in commands.cfg. I don't use the groups or anything. No alert attempt is showing up in the event log either.

Thanks
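The "more than one daemon" check can be done by counting lines in the ps listing. A rough Python sketch follows, assuming the usual eight-column "ps -ef" layout (UID PID PPID C STIME TTY TIME CMD) and the binary path shown in the listing above. The function name nagios_daemons is made up for this example.

```python
def nagios_daemons(ps_output, binary="/usr/local/nagios/bin/nagios"):
    """Return the PIDs of processes whose command is the Nagios binary."""
    pids = []
    for line in ps_output.splitlines():
        # UID PID PPID C STIME TTY TIME CMD; CMD keeps its own spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        command = fields[7].split()
        if command and command[0] == binary:
            pids.append(int(fields[1]))
    return pids

# The listing from the message above, with column spacing restored.
PS_OUTPUT = """\
nagios   25494     1  0 14:48 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     25501 25032  0 14:50 pts/0    00:00:00 grep nagios
"""

pids = nagios_daemons(PS_OUTPUT)
print(len(pids), "daemon(s):", pids)
```

One PID is the healthy case. Two or more would fit the symptom described in the thread, where behaviour changes from one restart to the next. Matching on the binary path rather than the word "nagios" keeps the grep process itself out of the count.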
Re: [Nagios-users] alerting flakey
On 07/03/07, Ezra Radoff [EMAIL PROTECTED] wrote:

define service{
        use                     local-service
        hostgroup_name          cisco_routers
        service_description     Cisco_load
        check_command           check_snmp_load_cisco!cisco!90,80,60!100,100,100
        }

I can't see anything obviously wrong there. I'm not familiar with the check_snmp_load_cisco plugin though. It might be wise to run that manually while logged in as nagios. Make sure it returns with the right exit code and returns the expected output.
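For the manual run, the exit code is what Nagios acts on: by the plugin convention 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Here is a small Python sketch of such a check. The run_plugin helper is hypothetical, and because the real plugin's path and arguments are not given in this thread, a stand-in command that prints a status line and exits with 2 is used so the sketch runs anywhere.

```python
import subprocess
import sys

# Exit codes from the Nagios plugin convention.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_plugin(argv, timeout=30):
    """Run a plugin command line and return (state_name, first_output_line)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    state = STATES.get(
        proc.returncode,
        "UNKNOWN (unexpected exit code %d)" % proc.returncode,
    )
    lines = proc.stdout.strip().splitlines()
    return state, (lines[0] if lines else "")

# Stand-in for the real plugin: prints one status line and exits with code 2.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('CRITICAL - load 95%'); sys.exit(2)",
]

print(run_plugin(fake_plugin))
```

To test the real check, replace fake_plugin with the plugin's full command line as it expands from commands.cfg, and run the script as the nagios user so that permissions and environment match what the daemon sees.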