Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> I'm continuing to iron out the wrinkles with 3.5.1 and distributed
> monitoring. I'm using mod_gearman to submit and receive events from
> two distributed pollers. Every now and again, I'll get something
> similar to this in the log on the centralized collecting machine:
>
>   CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
>   you're trying to run actually exists. (worker: collector.domain.org)
>
> To me, that suggests that the collector system didn't get a result
> for a host or service in a timely manner from one of the polling
> systems, and so it attempted to run an active check itself. However,
> it doesn't seem to be able to, and I don't know why. The collector
> has the same value for $USER1$, and it has the same set of plugins
> installed on it.
>
> On the collector:
>
>   grep USER1 etc/resource.cfg
>   $USER1$=/usr/local/nagios/libexec
>
> On the two pollers:
>
>   $USER1$=/usr/local/nagios/libexec
>   $USER1$=/usr/local/nagios/libexec
>
> The plugins are installed in identical locations on all three
> systems; that's enforced via Puppet. The 'nagios' user can find and
> run them on the collector:
>
>   /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
>   NRPE v2.13
>
> Now, because this is a distributed setup, the collector system is
> not configured to run active checks:
>
>   grep ^execute etc/nagios.cfg
>   execute_service_checks=0
>   execute_host_checks=0
>
> ... but *obviously* it's trying to. Is it failing because it's
> configured to not run them? If that's the case, the error message is
> not accurate and should be corrected. If that's *not* the case, why
> can't my collector server run an active check when it believes it
> needs to?
>
> I use NConf to generate my configurations, if that matters. There
> are a *lot* of hosts/services and quite a few configuration files,
> so I'm not going to paste a slew of information here. If I'm missing
> pertinent information, please let me know exactly what you want to
> see and I'll get it.

No one has an idea about this? And no, Andreas, I can't move to 4.0
yet. ;)

Thanks!
Benny

-- 
No matter how tempted I am with the prospect of unlimited power, I will
not consume any energy field bigger than my head.
    -- #22 on Peter Anspach's Evil Overlord list

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
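For context on the 127 in that log line: in a POSIX shell, 127 is the exit status reported when the command to run cannot be found, and 126 when it is found but cannot be executed. Here is a minimal sketch that reproduces the 127 case. The path /nonexistent/check_dummy is made up, and run_check is a hypothetical helper for illustration, not something from Nagios or mod_gearman.

```shell
#!/bin/sh
# run_check: run a command line through "sh -c" and print its raw exit
# status. Hypothetical helper, only for illustration.
run_check() {
    rc=0
    sh -c "$1" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

# A plugin path that does not exist (made-up path): the shell reports 127.
missing_rc=$(run_check "/nonexistent/check_dummy -H 127.0.0.1")
echo "missing plugin -> exit status $missing_rc"

# A command that exists and succeeds: exit status 0.
ok_rc=$(run_check "true")
echo "existing command -> exit status $ok_rc"
```

Running the exact command line from the failing check through something like this, as the user and on the host that executed it, is one way to see whether the 127 comes from a path that is not there.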
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> Do you get many of those error messages in the logs at once, or just
> one at a time?
>
> Only one thought: what are the permissions on your $USER$ variables?
> Nagios on my systems setuid() to nonroot after startup, and if it
> gets SIGHUP to reload config, but can't read the file defining
> $USER*$, will act strangely.

Just one at a time, seemingly randomly. A host here, a service there,
several times a day. They always almost immediately recover, but I
don't understand why my centralized collector seems to have this issue.

Nagios runs as the nagios user, which can read the resource.cfg file
fine:

  ls -ld . ; ls -l nagios-hostname.cfg resource.cfg
  drwxrwx--- 6 root nagios  4096 Aug 27 16:02 .
  -rw-r--r-- 1 root root   47606 Jul  1 11:18 nagios-hostname.cfg
  -rw-r----- 1 root nagios  2400 Mar 19 11:25 resource.cfg

Thanks!
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> On 8/22/13 13:51, C. Bensend wrote:
>> CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
>> you're trying to run actually exists. (worker: collector.domain.org)
>
> Hi,
>
> if this is the collector host, why does it have a mod-gearman worker
> installed? If nagios would have run the check by itself, there would
> be no hint about the worker in the error. So it seems like there is a
> worker started on your collector host which then grabs some checks
> but isn't able to execute them.

Oh ho! I have multiple *gearman* processes running:

  ps axuww | grep gearman
  gearmand  5662  0.7  0.1 404672 2496 ?  Ssl  Aug17 118:29 /usr/sbin/gearmand -d -l /var/log/gearmand/gearmand.log
  nagios    5712  0.0  0.0  38024  640 ?  Ss   Aug17   1:03 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
  nagios   25919  0.0  0.1 137492 3016 ?  S    07:38   0:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
  .. etc ..

Are you saying I just need gearmand running on the collector? I'm
quite new to gearman, so I might have misunderstood which parts are
necessary where. I can easily shut down the mod_gearman_worker service,
I just need to understand the consequences.

I assumed that this was a Nagios error - perhaps I just have my gearman
setup configured wrong.

Benny
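Since the "(worker: ...)" suffix names the host whose worker ran the check, tallying that suffix across the log may show which machine produces the 127 results. A rough sketch follows, assuming log lines shaped like the one quoted in this thread. The sample lines and the hostname poller1.domain.org are invented for the illustration, and worker_of and count_workers are hypothetical helper names.

```shell
#!/bin/sh
# worker_of: print the hostname inside a "(worker: NAME)" marker, or
# nothing if the line carries no such marker.
worker_of() {
    printf '%s\n' "$1" | sed -n 's/.*(worker: \([^)]*\)).*/\1/p'
}

# count_workers: read log lines on stdin, print "COUNT NAME" per worker.
count_workers() {
    sed -n 's/.*(worker: \([^)]*\)).*/\1/p' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Invented sample input; real lines would come from the Nagios log.
sample='CRITICAL: Return code of 127 is out of bounds. (worker: collector.domain.org)
CRITICAL: Return code of 127 is out of bounds. (worker: collector.domain.org)
CRITICAL: Return code of 127 is out of bounds. (worker: poller1.domain.org)'

printf '%s\n' "$sample" | count_workers
```

If every failing line names the collector, that would fit the explanation above: a worker on the collector is picking up jobs it cannot execute.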
Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
> On 8/28/13 14:43, C. Bensend wrote:
>> Are you saying I just need gearmand running on the collector?
>
> Well, I assumed it. You are the only one who can really tell that.
> You will need a worker on each host which should run checks. If your
> collector should not run any checks, then no worker is necessary.
>
> See http://labs.consol.de/nagios/mod-gearman/#_common_scenarios for a
> list of common setups.

OK, yes, I grok that. I guess I would want the collector to be *able*
to run checks, if it doesn't get timely information from the pollers.
I'm assuming that's why it's even trying in the first place - it
doesn't see a result in a timely manner, so it thinks it should run
one.

Which circles back to my original question - why can't it run the
check? Why isn't it finding what it needs to find? The workers are
running as the nagios user, and I don't see anything that appears
pertinent in the mod_gearman_worker.conf file...

What am I missing? Neither the gearmand.log nor the
mod_gearman_worker.log files seem to have any complaints (but I haven't
bumped up the debug on them yet).

Thanks so much for your help!

Benny
[Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks
Hey folks,

I'm continuing to iron out the wrinkles with 3.5.1 and distributed
monitoring. I'm using mod_gearman to submit and receive events from two
distributed pollers. Every now and again, I'll get something similar to
this in the log on the centralized collecting machine:

  CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
  you're trying to run actually exists. (worker: collector.domain.org)

To me, that suggests that the collector system didn't get a result for
a host or service in a timely manner from one of the polling systems,
and so it attempted to run an active check itself. However, it doesn't
seem to be able to, and I don't know why. The collector has the same
value for $USER1$, and it has the same set of plugins installed on it.

On the collector:

  grep USER1 etc/resource.cfg
  $USER1$=/usr/local/nagios/libexec

On the two pollers:

  $USER1$=/usr/local/nagios/libexec
  $USER1$=/usr/local/nagios/libexec

The plugins are installed in identical locations on all three systems;
that's enforced via Puppet. The 'nagios' user can find and run them on
the collector:

  /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
  NRPE v2.13

Now, because this is a distributed setup, the collector system is not
configured to run active checks:

  grep ^execute etc/nagios.cfg
  execute_service_checks=0
  execute_host_checks=0

... but *obviously* it's trying to. Is it failing because it's
configured to not run them? If that's the case, the error message is
not accurate and should be corrected. If that's *not* the case, why
can't my collector server run an active check when it believes it needs
to?

I use NConf to generate my configurations, if that matters. There are a
*lot* of hosts/services and quite a few configuration files, so I'm not
going to paste a slew of information here. If I'm missing pertinent
information, please let me know exactly what you want to see and I'll
get it.

I'd really appreciate a clue-by-four. Thanks, folks! :)

Benny
Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
I can't seem to parse "It doesn't make sense to get a recovery
notification for something you never knew was a problem."

Are you saying that since Nagios doesn't consider an unknown a problem,
it won't send a recovery? Because it does... And in this case, I
certainly want to know when a service having a monitoring issue
(unknown) recovers.

Not sure what you meant there.

Thanks!

Benny

> This is by design, and it is only a warning message. The config is
> valid and should work as you intended. It doesn't make sense to get a
> recovery notification for something you never knew was a problem.
> Unknowns are not considered problems in Nagios logic.
>
> On Mon, Jun 10, 2013 at 1:25 PM, Chris Beattie cbeat...@geninfo.com wrote:
>> On 6/7/2013 9:28 AM, C. Bensend wrote:
>>> Not real sure why Nagios doesn't think that's a valid config - I
>>> want a contact that will receive only UNKNOWN alerts for services.
>>
>> Have you tried giving that contact the extra options Nagios wants,
>> and then defining a service escalation for that contact with the
>> escalation_options directive set to u?
>>
>> --
>> -Chris

-- 
The very existence of flamethrowers proves that sometime, somewhere,
someone said to themselves, 'You know, I want to set those people over
there on fire, but I'm just not close enough to get the job done.'
    -- George Carlin
Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
> see the original language here:
> http://nagios.sourceforge.net/docs/3_0/notifications.html
>
>   Note: Notifications about host or service recoveries are only sent
>   out if a notification was sent out for the original problem. It
>   doesn't make sense to get a recovery notification for something you
>   never knew was a problem.
>
> And:
> http://nagios.sourceforge.net/docs/3_0/escalations.html
>
>   If, after three problem notifications, a recovery notification is
>   sent out for the service, who gets notified? The recovery is
>   actually the fourth notification that gets sent out. However, the
>   escalation code is smart enough to realize that only those people
>   who were notified about the problem on the third notification
>   should be notified about the recovery. In this case, the nt-admins
>   and managers contact groups would be notified of the recovery.
>
> (Although, I believe I've either misunderstood the implications of
> that statement, or run into misbehaviours in that area myself...)

Ah. Well, yes. :) I believe those statements are referring to the
filters that Nagios uses to determine whether or not to send a
notification at all. *That's* not an issue here, the notification goes
out, just like it should.

*My* question is why the sanity check thinks that configuration doesn't
make sense. I think the answer is probably something to the effect of:
"I don't know why anyone would want that, so warn about it." I don't
want to put words in the mouth of any of the developers that may have
touched it, though, so I'm just guessing.

I just want to make sure this is a case of Nagios maybe not giving the
right advice in its sanity check, and *not* that there's something
behind the scenes that I'm not aware of that might actually cause a
problem. If it's the former, maybe we can get it adjusted for the next
release. If it's the latter, I hope someone will step forth with the
ClueBat 5000(tm) and give me a good thump. :)

Thanks, everyone!

Benny
Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
Yep, I've had that one enabled for quite some time. :)

> There is a workaround; this is how I fixed it in our environment:
>
>   use_large_installation_tweaks=1
>
> in nagios.cfg. See whether this removes the warning for you.
>
> Regards
> Sunil
>
> On Tue, Jun 11, 2013 at 9:52 PM, Justin T Pryzby just...@norchemlab.com wrote:
>> On Tue, Jun 11, 2013 at 11:12:23AM -0500, C. Bensend wrote:
>>> I can't seem to parse "It doesn't make sense to get a recovery
>>> notification for something you never knew was a problem."
>>
>> see the original language here:
>> http://nagios.sourceforge.net/docs/3_0/notifications.html
>>
>>   Note: Notifications about host or service recoveries are only sent
>>   out if a notification was sent out for the original problem. It
>>   doesn't make sense to get a recovery notification for something
>>   you never knew was a problem.
>>
>> And:
>> http://nagios.sourceforge.net/docs/3_0/escalations.html
>>
>>   If, after three problem notifications, a recovery notification is
>>   sent out for the service, who gets notified? The recovery is
>>   actually the fourth notification that gets sent out. However, the
>>   escalation code is smart enough to realize that only those people
>>   who were notified about the problem on the third notification
>>   should be notified about the recovery. In this case, the nt-admins
>>   and managers contact groups would be notified of the recovery.
>>
>> (Although, I believe I've either misunderstood the implications of
>> that statement, or run into misbehaviours in that area myself...)
>
> --
> Regards
> Sunil Sankar
Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
> Have you tried giving that contact the extra options Nagios wants,
> and then defining a service escalation for that contact with the
> escalation_options directive set to u?

No, I haven't. It *seems* to be working as I intend. My question is
more as to why Nagios seems to think it's a bad idea, when it's a
perfectly legitimate configuration.

Are there unforeseen consequences that I'm not aware of? Or was it just
not a configuration anyone thought would be useful/valid, so it is
warned about?
[Nagios-users] Misplaced advice in the Nagios preflight check?
Hey folks,

Still ironing out the wrinkles in my 3.5.0 distributed environment.
Yesterday, I added a new contact, and on preflight check it seemed to
think that what I did wasn't smart:

  Jun 6 15:11:02 hostname nagios: Warning: Service recovery
  notification option for contact 'cbensend-unknown-only' doesn't make
  any sense - specify critical and/or warning options as well

Here's the contact I added that it seems to think is a dumb idea:

  define contact {
      contact_name                   cbensend-unknown-only
      alias                          C. Bensend - unknown alerts only
      host_notification_options      n
      service_notification_options   u,r
      email                          m...@myjob.com
      host_notification_period       24x7
      service_notification_period    24x7
      host_notification_commands     notify-host-by-email
      service_notification_commands  notify-service-by-email
  }

Not real sure why Nagios doesn't think that's a valid config - I want a
contact that will receive only UNKNOWN alerts for services. Perfectly
valid idea to me; I have a number of services that I truly do not give
a crap about, they trip many times a day and are critical for some
developers, but I don't do anything about them. I *do*, however, want
to know if there's a problem monitoring them, hence the need to see
UNKNOWN alerts and recoveries.

Is there some reason Nagios would think that's not valid? Or should it
not complain about that?

Just curious... It loaded the config and the contact exists, just not
entirely convinced it's a valid complaint. :)

Thanks much!

Benny
Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
> Ran as nagios user and please find the details below. (iptables
> stopped)
>
> [nagios@server ~]$ /usr/bin/sudo /sbin/iptables -nvL | /bin/grep 'Chain' | /bin/awk '{ print $2 }'| /bin/grep Cid | /usr/bin/wc -l| echo $?
> 0

That 'echo $?' was supposed to be on the next line, not a continuation
of the command. Can you run that again, but as two separate commands,
one right after the other? I want to see the result of your first
command (the iptables one).

> [nagios@server ~]$ /usr/bin/sudo /sbin/iptables -nvL
> Chain INPUT (policy ACCEPT 9089 packets, 3303K bytes)
>  pkts bytes target prot opt in out source destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target prot opt in out source destination
>
> Chain OUTPUT (policy ACCEPT 7812 packets, 3436K bytes)
>  pkts bytes target prot opt in out source destination
> [nagios@server ~]$

I'm assuming server == zurich, right? I wonder if you can cut out the
first grep and awk, and just look for 'Cid' ?

> -----Original Message-----
> From: C. Bensend [mailto:be...@bennyvision.com]
> Sent: Thursday, 30 May 2013 8:44 PM
> To: nagios-users@lists.sourceforge.net
> Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
>
> I'm assuming that this check is running *on* the host 'zurich'?
> /var/log/secure should be listing an entry, if sudo is being run.
>
> Manually, *as the nagios user*, what happens when you do the
> following?
>
>   /usr/bin/sudo /sbin/iptables -nvL | /bin/grep 'Chain' | \
>     /bin/awk '{ print $2 }'| /bin/grep Cid | /usr/bin/wc -l
>   echo $?
>
> How about just (again, as the nagios user):
>
>   /usr/bin/sudo /sbin/iptables -nvL
>
> Please find the details
>
> Sudoers Definition:-
> nagios zurich= NOPASSWD: /sbin/iptables, /usr/local/nagios/libexec/check_iptables.sh, /usr/local/nagios/libexec/check_nrpe
>
> /var/log/secure:
> su: pam_unix(su:session): session opened for user nagios by root(uid=0)
> su: pam_unix(su:session): session closed for user nagios
>
> -----Original Message-----
> From: C. Bensend [mailto:be...@bennyvision.com]
> Sent: Wednesday, 29 May 2013 7:59 PM
> To: nagios-users@lists.sourceforge.net
> Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
>
> Where's your sudoers definition that allows the nagios user to run
> any commands via sudo? And what does /var/log/secure (or equivalent)
> think about the nagios user trying to run sudo?
>
> I have tested with nagios user as well.. still no luck with that.
> Could someone update if you have any solution on this case.
>
> Kind Regards,
> Thilak
>
> From: Deborah Martin [mailto:deborah.mar...@kognitio.com]
> Sent: Tuesday, 14 May 2013 7:30 PM
> To: Nagios Users List
> Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
>
> Ok - if I look at your output, manually, when the plugin is run as
> the root user it produces the correct result. But, you haven't said
> what the nrpe user is that is running on the remote node and whether
> the same manual run of the check produces the same output.
>
> For example, I run remote plugins through nrpe as the nagios user so
> if I want to manually test a plugin on the remote node, I would first
> login as the nagios user to ensure I've got the same environment that
> would be used when running via nrpe.
>
> It might be that the variables you have set in the script only work
> as the root user. It's never a good idea to test as the root user but
> only as the same user as that used by nagios or nrpe.
>
> Regards,
> Deborah
>
> From: Thilakraj.Shanmugam [mailto:thilakraj.shanmu...@canberra.edu.au]
> Sent: 14 May 2013 09:58
> To: Nagios Users List
> Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
>
> Hi Deborah,
>
> Thanks for the response.. please find the details below.
>
> [root@abc libexec]# pwd
> /usr/local/nagios/libexec
> [root@abc libexec]# ./check_iptables.sh      - Executing manually script
> + IPT=/sbin/iptables
> + GREP=/bin/grep
> + AWK=/bin/awk
> + EXPR=/usr/bin/expr
> + WC=/usr/bin/wc
> + A=/usr/bin/sudo
> + E_SUCCESS=0
> + E_CRITICAL=2
> + E_UNKNOWN=3
> ++ /usr/bin/sudo /sbin/iptables -nvL
> ++ /bin/grep Chain
> ++ /bin/awk '{ print $2 }'
> ++ /bin/grep Cid
> ++ /usr/bin/wc -l
> + CHAINS=5
> + '[' 5 -ne 0 ']'
> + echo 'Firewall is running!'
> Firewall is running!
> + exit 0      -- it shows firewall running ( correct output )
> [root@abc libexec]#
>
> Client - NRPE config file
>
> [root@abc libexec]# cat /usr/local/nagios/etc/nrpe.cfg |grep -i iptable
> command[check_iptables]=/usr/local/nagios/libexec/check_iptables.sh
> [root@abc libexec]#
> [root@abc libexec]# ./check_nrpe -H localhost -c check_iptables
> Firewall is not running      - executing via check_nrpe ( wrong output )
> [root@abc libexec]#
>
> NRPE Logs -
>
> May 14 18:52:28 abc nrpe[31158]: Added command[check_Partion_db]=/usr/local/nagios
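On the `| echo $?` mix-up at the top of this message: when `echo $?` sits at the end of a pipeline, `$?` is expanded before that pipeline has finished, so it shows the status of whatever ran before it. The sketch below uses plain text in place of real iptables output so it runs anywhere; demo_piped is a hypothetical name used only here.

```shell
#!/bin/sh
# demo_piped: "false" leaves $? at 1, then "echo $?" is placed at the end
# of a pipeline the way it was pasted in the thread.
demo_piped() {
    false
    printf 'Chain INPUT (policy ACCEPT)\n' | grep Cid | wc -l | echo "$?"
}
# "|| true" only keeps a strict (set -e) shell from stopping at "false".
piped=$(demo_piped || true)
echo "echo inside the pipeline printed: $piped"

# Run the pipeline by itself, then look at the count it produced. The
# status seen afterwards normally belongs to the last command in the
# pipeline (tr here), not to grep or to the first command.
status=0
count=$(printf 'Chain INPUT (policy ACCEPT)\n' | grep Cid | wc -l | tr -d ' ') || status=$?
echo "matching chains: $count (pipeline status: $status)"
```

In both forms the number printed after the pipeline says little about whether the first command (sudo or iptables in the thread) worked, which is why looking at its raw output on its own is the more telling test.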
Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
OK. So, what differs when you try that first command when iptables *is*
running?

> Please find the details..
>
> [nagios@server ~]$ /usr/bin/sudo /sbin/iptables -nvL | /bin/grep 'Chain' | /bin/awk '{ print $2 }'| /bin/grep Cid | /usr/bin/wc -l
> 0
> [nagios@server ~]$ /usr/bin/sudo /sbin/iptables -nvL | /bin/grep Cid | /usr/bin/wc -l
> 0
> [nagios@server ~]$
> [nagios@server ~]$ echo $?
> 0
> [nagios@server ~]$
>
> Yes, Server = zurich
Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
I'm assuming that this check is running *on* the host 'zurich'? /var/log/secure should be listing an entry, if sudo is being run.

Manually, *as the nagios user*, what happens when you do the following?

    /usr/bin/sudo /sbin/iptables -nvL | /bin/grep 'Chain' | \
        /bin/awk '{ print $2 }' | /bin/grep Cid | /usr/bin/wc -l
    echo $?

How about just (again, as the nagios user):

    /usr/bin/sudo /sbin/iptables -nvL

Please find the details

Sudoers Definition:-

    nagios zurich= NOPASSWD: /sbin/iptables, /usr/local/nagios/libexec/check_iptables.sh, /usr/local/nagios/libexec/check_nrpe

/var/log/secure:

    su: pam_unix(su:session): session opened for user nagios by root(uid=0)
    su: pam_unix(su:session): session closed for user nagios

-Original Message-
From: C. Bensend [mailto:be...@bennyvision.com]
Sent: Wednesday, 29 May 2013 7:59 PM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Where's your sudoers definition that allows the nagios user to run any commands via sudo? And what does /var/log/secure (or equivalent) think about the nagios user trying to run sudo?

I have tested with nagios user as well.. still no luck with that. Could you some one update if you have any solution on this case.

Kind Regards,
Thilak

From: Deborah Martin [mailto:deborah.mar...@kognitio.com]
Sent: Tuesday, 14 May 2013 7:30 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Ok - if I look at your output, manually, when the plugin is run as the root user it produces the correct result. But, you haven't said what the nrpe user is that is running on the remote node and whether the same manual run of the check produces the same output.

For example, I run remote plugins through nrpe as the nagios user so if I want to manually test a plugin on the remote node, I would first login as the nagios user to ensure I've got the same environment that would be used when running via nrpe.

It might be that the variables you have set in the script only work as the root user. It's never a good idea to test as the root user but only as the same user as that used by nagios or nrpe.

Regards,
Deborah

From: Thilakraj.Shanmugam [mailto:thilakraj.shanmu...@canberra.edu.au]
Sent: 14 May 2013 09:58
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Hi Deborah,

Thanks for the response.. please find the details below.

    [root@abc libexec]# pwd
    /usr/local/nagios/libexec
    [root@abc libexec]# ./check_iptables.sh     - Executing manually script
    + IPT=/sbin/iptables
    + GREP=/bin/grep
    + AWK=/bin/awk
    + EXPR=/usr/bin/expr
    + WC=/usr/bin/wc
    + A=/usr/bin/sudo
    + E_SUCCESS=0
    + E_CRITICAL=2
    + E_UNKNOWN=3
    ++ /usr/bin/sudo /sbin/iptables -nvL
    ++ /bin/grep Chain
    ++ /bin/awk '{ print $2 }'
    ++ /bin/grep Cid
    ++ /usr/bin/wc -l
    + CHAINS=5
    + '[' 5 -ne 0 ']'
    + echo 'Firewall is running!'
    Firewall is running!
    + exit 0     -- it shows firewall running ( correct output )
    [root@abc libexec]#

Client - NRPE config file

    [root@abc libexec]# cat /usr/local/nagios/etc/nrpe.cfg |grep -i iptable
    command[check_iptables]=/usr/local/nagios/libexec/check_iptables.sh
    [root@abc libexec]#
    [root@abc libexec]# ./check_nrpe -H localhost -c check_iptables
    Firewall is not running     - executing via check_nrpe ( wrong output )
    [root@abc libexec]#

NRPE Logs -

    May 14 18:52:28 abc nrpe[31158]: Added command[check_Partion_db]=/usr/local/nagios/libexec/check_disk -w 15% -c 5% -p /db
    May 14 18:52:28 abc nrpe[31158]: Added command[check_Partion_app]=/usr/local/nagios/libexec/check_disk -w 15% -c 5% -p /app
    May 14 18:52:28 abc nrpe[31158]: Added command[check_iptables]=/usr/local/nagios/libexec/check_iptables.sh
    May 14 18:52:28 abc nrpe[31158]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
    May 14 18:52:28 abc nrpe[31158]: Handling the connection...
    May 14 18:52:28 abc nrpe[31158]: Host is asking for command 'check_iptables' to be run...
    May 14 18:52:28 abc nrpe[31158]: Running command: /usr/local/nagios/libexec/check_iptables.sh
    May 14 18:52:28 abc nrpe[31158]: Command completed with return code 2 and output: Firewall is not running
    May 14 18:52:28 abc nrpe[31158]: Return Code: 2, Output: Firewall is not running

Kind Regards,
Thilak

From: Deborah Martin [mailto:deborah.mar...@kognitio.com]
Sent: Tuesday, 14 May 2013 6:44 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Hi,

What is the wrong output being returned ? This might give us all a clue as to the cause of the problem.

When you run the check manually, are you doing this as the same user that check_nrpe will use ?

Regards,
Deborah

From: Thilakraj.Shanmugam
Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring
Where's your sudoers definition that allows the nagios user to run any commands via sudo? And what does /var/log/secure (or equivalent) think about the nagios user trying to run sudo?

I have tested with nagios user as well.. still no luck with that. Could you some one update if you have any solution on this case.

Kind Regards,
Thilak

From: Deborah Martin [mailto:deborah.mar...@kognitio.com]
Sent: Tuesday, 14 May 2013 7:30 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Ok - if I look at your output, manually, when the plugin is run as the root user it produces the correct result. But, you haven't said what the nrpe user is that is running on the remote node and whether the same manual run of the check produces the same output.

For example, I run remote plugins through nrpe as the nagios user so if I want to manually test a plugin on the remote node, I would first login as the nagios user to ensure I've got the same environment that would be used when running via nrpe.

It might be that the variables you have set in the script only work as the root user. It's never a good idea to test as the root user but only as the same user as that used by nagios or nrpe.

Regards,
Deborah

From: Thilakraj.Shanmugam [mailto:thilakraj.shanmu...@canberra.edu.au]
Sent: 14 May 2013 09:58
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Hi Deborah,

Thanks for the response.. please find the details below.

    [root@abc libexec]# pwd
    /usr/local/nagios/libexec
    [root@abc libexec]# ./check_iptables.sh     - Executing manually script
    + IPT=/sbin/iptables
    + GREP=/bin/grep
    + AWK=/bin/awk
    + EXPR=/usr/bin/expr
    + WC=/usr/bin/wc
    + A=/usr/bin/sudo
    + E_SUCCESS=0
    + E_CRITICAL=2
    + E_UNKNOWN=3
    ++ /usr/bin/sudo /sbin/iptables -nvL
    ++ /bin/grep Chain
    ++ /bin/awk '{ print $2 }'
    ++ /bin/grep Cid
    ++ /usr/bin/wc -l
    + CHAINS=5
    + '[' 5 -ne 0 ']'
    + echo 'Firewall is running!'
    Firewall is running!
    + exit 0     -- it shows firewall running ( correct output )
    [root@abc libexec]#

Client - NRPE config file

    [root@abc libexec]# cat /usr/local/nagios/etc/nrpe.cfg |grep -i iptable
    command[check_iptables]=/usr/local/nagios/libexec/check_iptables.sh
    [root@abc libexec]#
    [root@abc libexec]# ./check_nrpe -H localhost -c check_iptables
    Firewall is not running     - executing via check_nrpe ( wrong output )
    [root@abc libexec]#

NRPE Logs -

    May 14 18:52:28 abc nrpe[31158]: Added command[check_Partion_db]=/usr/local/nagios/libexec/check_disk -w 15% -c 5% -p /db
    May 14 18:52:28 abc nrpe[31158]: Added command[check_Partion_app]=/usr/local/nagios/libexec/check_disk -w 15% -c 5% -p /app
    May 14 18:52:28 abc nrpe[31158]: Added command[check_iptables]=/usr/local/nagios/libexec/check_iptables.sh
    May 14 18:52:28 abc nrpe[31158]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
    May 14 18:52:28 abc nrpe[31158]: Handling the connection...
    May 14 18:52:28 abc nrpe[31158]: Host is asking for command 'check_iptables' to be run...
    May 14 18:52:28 abc nrpe[31158]: Running command: /usr/local/nagios/libexec/check_iptables.sh
    May 14 18:52:28 abc nrpe[31158]: Command completed with return code 2 and output: Firewall is not running
    May 14 18:52:28 abc nrpe[31158]: Return Code: 2, Output: Firewall is not running

Kind Regards,
Thilak

From: Deborah Martin [mailto:deborah.mar...@kognitio.com]
Sent: Tuesday, 14 May 2013 6:44 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Hi,

What is the wrong output being returned ? This might give us all a clue as to the cause of the problem.

When you run the check manually, are you doing this as the same user that check_nrpe will use ?

Regards,
Deborah

From: Thilakraj.Shanmugam [mailto:thilakraj.shanmu...@canberra.edu.au]
Sent: 14 May 2013 08:43
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Nagios Plugin for IPTABLES Monitoring

Greetings!

Could someone send me nagios plugin which is tested and works well for monitoring IPTABLES in Linux. I have tested below script but it is not returning correct output to nagios server. If I execute script manually, it shows correct output... But if I execute via ./check_nrpe -H localhost -c check_iptables, it shows wrong output.

Below is my plugin --

    #!/bin/bash
    set -x
    IPT='/sbin/iptables'
    GREP='/bin/grep'
    AWK='/bin/awk'
    EXPR='/usr/bin/expr'
    WC='/usr/bin/wc'
    A='/usr/bin/sudo'
    E_SUCCESS=0
    E_CRITICAL=2
    E_UNKNOWN=3
    CHAINS=`$A $IPT -nvL | $GREP 'Chain' | $AWK '{ print $2 }'| $GREP Cid | $WC -l`
    if [ $CHAINS -ne 0 ] ; then
        echo Firewall is
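
One weakness of the plugin quoted above is that a failing sudo call and a firewall with no matching chains look identical: both leave the pipeline printing 0, so the check reports "Firewall is not running". The following is only a rough sketch of how the two cases could be told apart, not the poster's script. The function names (count_chains, report, main) and the --run switch are made up for illustration; the paths and the 'Cid' pattern are taken from the thread, and `sudo -n` is used on the assumption that the check should fail rather than prompt when no password-less rule applies.

```shell
#!/bin/bash
# Hypothetical rework of check_iptables.sh; helper names are illustrative only.
IPT='/sbin/iptables'
SUDO='/usr/bin/sudo'
E_SUCCESS=0
E_CRITICAL=2
E_UNKNOWN=3

# Count "Chain <name>" lines on stdin whose chain name contains the pattern in $1.
count_chains() {
    awk -v pat="$1" '$1 == "Chain" && index($2, pat) { n++ } END { print n + 0 }'
}

# Turn a chain count into plugin output and a Nagios return code.
report() {
    if [ "$1" -gt 0 ]; then
        echo "OK - Firewall is running ($1 matching chains)"
        return "$E_SUCCESS"
    fi
    echo "CRITICAL - Firewall is not running"
    return "$E_CRITICAL"
}

main() {
    local rules
    # sudo -n never prompts; a sudo or iptables failure becomes UNKNOWN
    # instead of being silently counted as zero chains.
    if ! rules=$("$SUDO" -n "$IPT" -nvL 2>&1); then
        echo "UNKNOWN - could not run iptables via sudo: $rules"
        return "$E_UNKNOWN"
    fi
    report "$(printf '%s\n' "$rules" | count_chains Cid)"
}

# Only touch sudo/iptables when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    main
    exit $?
fi
```

With this layout the nrpe.cfg command line would need the extra --run argument, which is again just a convention of this sketch.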
[Nagios-users] Nagios-Users: please unsubscribe gch...@renegade.com
Could one of the list admins unsubscribe gch...@renegade.com? Their email has been bouncing for a while now:

    Delivery has failed to these recipients or groups:

    gch...@renegade.com
    The e-mail address you entered couldn't be found. Please check the recipient's e-mail address and try to resend the message. If the problem continues, please contact your helpdesk.

    Diagnostic information for administrators:

    Generating server: renegade.com

    gch...@renegade.com
    #550 5.1.1 RESOLVER.ADR.RecipNotFound; not found ##rfc822;gch...@renegade.com

--
The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.'
-- George Carlin

--
Introducing AppDynamics Lite, a free troubleshooting tool for Java/.NET
Get 100% visibility into your production application - at no cost.
Code-level diagnostics for performance bottlenecks with 2% overhead
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap1
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
On 2013-05-23 17:43, C. Bensend wrote:

Hey folks,

I recently made two major changes to my Nagios environment:

1) I upgraded to v3.5.0.
2) I moved from a single server to two pollers sending passive results to one central console server.

Now, this new distributed system was in place for several months while I tested, and it worked fine. HOWEVER, since this was running in parallel with my production system, notifications were disabled. Hence, I didn't see this problem until I cut over for real and enabled notifications.

(please excuse any cut-n-paste ugliness, had to send this info from my work account via Outlook and then try to cleanse and reformat via Squirrelmail)

As a test and to capture information, I reboot 'hostname'. This log is from the nagios-console host, which is the host that accepts the passive check results and sends notifications. Here is the console host receiving a service check failure when the host is restarting:

    May 22 15:57:10 nagios-console nagios: SERVICE ALERT: hostname;/var disk queue;CRITICAL;SOFT;1;Connection refused by host

So, the distributed poller system checks the host and sends its results to the console server:

    May 22 15:57:30 nagios-console nagios: HOST ALERT: hostname;DOWN;SOFT;1;CRITICAL - Host Unreachable (a.b.c.d)

And then the centralized server IMMEDIATELY goes into a hard state, which triggers a notification:

    May 22 15:57:30 nagios-console nagios: HOST ALERT: hostname;DOWN;HARD;1;CRITICAL - Host Unreachable (a.b.c.d)
    May 22 15:57:30 nagios-console nagios: HOST NOTIFICATION: cbensend;hostname;DOWN;host-notify-by-email-test;CRITICAL - Host Unreachable (a.b.c.d)

Um. Wat? Why would the console immediately trigger a hard state? The config files don't support this decision. And this IS a problem with the console server - the distributed monitors continue checking the host for 6 times like they should. But for some reason, the centralized console just immediately calls it a hard state.

*snip*

Set passive_host_checks_are_soft=1 in nagios.cfg on your master server and things should start working as intended.

--
Andreas Ericsson andreas.erics...@op5.se

Oh lord, THANK YOU. That appears to have fixed that problem, which was a pain in the ass.

In my defense, I *did* see that option, but the way I interpreted the comments didn't quite match up with the behavior I was seeing. I should have experimented with it, I guess.

A slight adjustment to the comments would have thrown a red flag for me - perhaps this is just a matter of personal interpretation, but maybe the comments could be a bit more specific:

    diff -uNp nagios-updated.cfg nagios.cfg
    --- nagios-updated.cfg Sat May 25 09:05:09 2013
    +++ nagios.cfg Sat May 25 09:02:37 2013
    @@ -981,9 +981,9 @@ translate_passive_host_checks=0
     # PASSIVE HOST CHECKS ARE SOFT OPTION
     # This determines whether or not Nagios will treat passive host
    -# checks as being HARD or SOFT. By default, a single passive host
    -# check result will put a host into an immediate HARD state type.
    -# This can be changed by enabling this option.
    +# checks as being HARD or SOFT. By default, a passive host check
    +# result will put a host into a HARD state type. This can be changed
    +# by enabling this option.
     # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
     passive_host_checks_are_soft=0

Does that make sense? If I had read something like that, it would have been immediately clear to me what was happening.

Thank you so much, Andreas! On to the next problem with the upgrade (something that can wait until next week)...

Benny
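
For anyone wanting to confirm what their central server is actually configured with, here is a small, hedged sketch. cfg_value is a hypothetical helper, not something shipped with Nagios; it simply prints the last `key=value` assignment it finds for a directive in a main config file and ignores commented lines. It says nothing about how the daemon itself treats duplicate directives.

```shell
# Print the last value assigned to directive $2 in config file $1.
# Comment lines never match because their first field starts with '#'.
cfg_value() {
    awk -F= -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}
```

Usage might look like `cfg_value /usr/local/nagios/etc/nagios.cfg passive_host_checks_are_soft`, where the path is an assumption based on the /usr/local/nagios prefix seen elsewhere on the list. A result of 0 (or no output at all, given the default shown in the diff above) would match the immediate-HARD behaviour described in this thread.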
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
    diff -uNp nagios-updated.cfg nagios.cfg
    --- nagios-updated.cfg Sat May 25 09:05:09 2013
    +++ nagios.cfg Sat May 25 09:02:37 2013
    @@ -981,9 +981,9 @@ translate_passive_host_checks=0
     # PASSIVE HOST CHECKS ARE SOFT OPTION
     # This determines whether or not Nagios will treat passive host
    -# checks as being HARD or SOFT. By default, a single passive host
    -# check result will put a host into an immediate HARD state type.
    -# This can be changed by enabling this option.
    +# checks as being HARD or SOFT. By default, a passive host check
    +# result will put a host into a HARD state type. This can be changed
    +# by enabling this option.
     # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
     passive_host_checks_are_soft=0

Does that make sense? If I had read something like that, it would have been immediately clear to me what was happening.

Thank you so much, Andreas! On to the next problem with the upgrade (something that can wait until next week)...

Sorry, too little caffeine too early, got the files reversed. Here's the right diff:

    diff -uNp nagios.cfg nagios-updated.cfg
    --- nagios.cfg Sat May 25 10:25:34 2013
    +++ nagios-updated.cfg Sat May 25 10:27:12 2013
    @@ -981,9 +981,9 @@ translate_passive_host_checks=0
     # PASSIVE HOST CHECKS ARE SOFT OPTION
     # This determines whether or not Nagios will treat passive host
    -# checks as being HARD or SOFT. By default, a passive host check
    -# result will put a host into a HARD state type. This can be changed
    -# by enabling this option.
    +# checks as being HARD or SOFT. By default, a single passive host
    +# check result will put a host into an immediate HARD state type.
    +# This can be changed by enabling this option.
     # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT
     passive_host_checks_are_soft=0
Re: [Nagios-users] Not getting notifications when a service is in an UNKNOWN state
I am not sure what I'm doing wrong, I get notified when it's warning or critical but not unknown... I can't figure out why. Any suggestions? Below is the service check.

    define service{
        hostgroup_name          hostgroup-win-2003,hostgroup-win-2008
        service_description     Windows CPU check
        check_command           check_snmp_load_v1!stand!55!95!!$USER2$
        use                     generic-service-pnp
        notification_options    u,w,c,r
        notification_period     workhours
        contacts                jeremy.p...@gilbarco.com
        check_interval          15
        retry_check_interval    10
    }

and the command definition:

    define command {
        command_name    check_snmp_load_v1
        command_line    $USER1$/check_snmp_load.pl -H $HOSTADDRESS$ -C $ARG5$ -T $ARG1$ -w $ARG2$ -c $ARG3$ $ARG4$ -f
    }

And your contact definition?
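
The reason for asking about the contact: a contact carries its own service_notification_options, and an UNKNOWN notification only goes out if the contact allows 'u' as well as the service. The sketch below is one rough way to check that from the shell. Both helper names are hypothetical, and the sample contact in the usage note is invented for illustration, since the poster's contact definition was not included in the message.

```shell
# Print the service_notification_options value from a contact definition on stdin.
contact_service_options() {
    awk '$1 == "service_notification_options" { print $2 }'
}

# Succeed if the comma-separated option list in $1 contains the flag in $2.
has_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, feeding a (made-up) contact block containing `service_notification_options w,c,r` through contact_service_options yields `w,c,r`, and `has_option "w,c,r" u` then fails, which would be consistent with WARNING and CRITICAL mails arriving while UNKNOWN ones do not.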
[Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
Hey folks,

I recently made two major changes to my Nagios environment:

1) I upgraded to v3.5.0.
2) I moved from a single server to two pollers sending passive results to one central console server.

Now, this new distributed system was in place for several months while I tested, and it worked fine. HOWEVER, since this was running in parallel with my production system, notifications were disabled. Hence, I didn't see this problem until I cut over for real and enabled notifications.

(please excuse any cut-n-paste ugliness, had to send this info from my work account via Outlook and then try to cleanse and reformat via Squirrelmail)

As a test and to capture information, I reboot 'hostname'. This log is from the nagios-console host, which is the host that accepts the passive check results and sends notifications. Here is the console host receiving a service check failure when the host is restarting:

    May 22 15:57:10 nagios-console nagios: SERVICE ALERT: hostname;/var disk queue;CRITICAL;SOFT;1;Connection refused by host

So, the distributed poller system checks the host and sends its results to the console server:

    May 22 15:57:30 nagios-console nagios: HOST ALERT: hostname;DOWN;SOFT;1;CRITICAL - Host Unreachable (a.b.c.d)

And then the centralized server IMMEDIATELY goes into a hard state, which triggers a notification:

    May 22 15:57:30 nagios-console nagios: HOST ALERT: hostname;DOWN;HARD;1;CRITICAL - Host Unreachable (a.b.c.d)
    May 22 15:57:30 nagios-console nagios: HOST NOTIFICATION: cbensend;hostname;DOWN;host-notify-by-email-test;CRITICAL - Host Unreachable (a.b.c.d)

Um. Wat? Why would the console immediately trigger a hard state? The config files don't support this decision. And this IS a problem with the console server - the distributed monitors continue checking the host for 6 times like they should. But for some reason, the centralized console just immediately calls it a hard state.

Definitions on the distributed monitoring host (the one running the actual host and service checks for this host 'hostname'):

    define host {
        host_name               hostname
        alias                   Old production Nagios server
        address                 a.b.c.d
        action_url              /pnp4nagios/graph?host=$HOSTNAME$
        icon_image_alt          Red Hat Linux
        icon_image              redhat.png
        statusmap_image         redhat.gd2
        check_command           check-host-alive
        check_period            24x7
        notification_period     24x7
        contact_groups          linux-infrastructure-admins
        use                     linux-host-template
    }

The linux-host-template on that same system:

    define host {
        name                    linux-host-template
        register                0
        max_check_attempts      6
        check_interval          5
        retry_interval          1
        notification_interval   360
        notification_options    d,r
        active_checks_enabled   1
        passive_checks_enabled  1
        notifications_enabled   1
        check_freshness         0
        check_period            24x7
        notification_period     24x7
        check_command           check-host-alive
        contact_groups          linux-infrastructure-admins
    }

And said command to determine up or down:

    define command {
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 5000.0,80% -c 1.0,100% -p 5
    }

Definitions on the centralized console host (the one that notifies):

    define host {
        host_name               hostname
        alias                   Old production Nagios server
        address                 a.b.c.d
        action_url              /pnp4nagios/graph?host=$HOSTNAME$
        icon_image_alt          Red Hat Linux
        icon_image              redhat.png
        statusmap_image         redhat.gd2
        check_command           check-host-alive
        check_period            24x7
        notification_period     24x7
        contact_groups          linux-infrastructure-admins
        use                     linux-host-template,Default_monitor_server
    }

The Default monitor server template on the centralized server:

    define host {
        name                    Default_monitor_server
        register                0
        active_checks_enabled   0
        passive_checks_enabled  1
        notifications_enabled   1
        check_freshness         0
        freshness_threshold     86400
    }

And the linux-host-template template on that same centralized host:

    define host {
        name                    linux-host-template
        register                0
        max_check_attempts      6
        check_interval          5
        retry_interval          1
        notification_interval   360
        notification_options    d,r
        active_checks_enabled   1
        passive_checks_enabled  1
        notifications_enabled   1
        check_freshness         0
        check_period            24x7
Re: [Nagios-users] Nagios v3.5.0 transitioning immediately to a HARD state upon host problem
I ran into a similar problem, because my template set the service to *is_volatile=1*.

http://nagios.sourceforge.net/docs/3_0/volatileservices.html

Hrmmm. Good point... However, is_volatile does not appear in any of my configuration files, for any of the Nagios servers. It isn't set by default, is it?

The Nagios config.cgi page doesn't even list it, and livestatus (what I use to query my running daemon) doesn't give it as a column it can query. I can't imagine it's on by default in v3.5.0, but I can't really tell if it is or not.

I can try explicitly *disabling* it in all hosts, but I can't really test that at the moment - out of here for a long weekend in a few minutes. If it gets annoying enough over the weekend, I might *have* to test that theory.

Thank you very much. I will still appreciate any input others can give on this question - it just doesn't seem to be behaving as it's configured!

Benny
Re: [Nagios-users] Help with CPU Check Thresholds
This is how I configured the service, my aim is to get an alert when the CPU load ( uptime ) reaches 10% and a critical when there is a 20%

        check_command               check_nrpe!check_load!10,4,3!20,15,10
        flap_detection_enabled      0
        notifications_enabled       1
        notification_options        w,u,r,c
        notification_period         24x7
        check_period                24x7
        check_interval              1
        max_check_attempts          2
        first_notification_delay    0
        notification_interval       1
    }

The problem is that I get WARN when the load is less than that:

    WARNING - load average: 1.77, 1.94, 3.04
    WARNING - load average: 2.11, 2.23, 3.45
    WARNING - load average: 1.90, 3.59, 4.34
    WARNING - load average: 5.65, 5.05, 4.86

You configured it to warn when the 15-minute average is above 3.00, and in your above four examples, the 15-minute averages are all above 3.00. It is working like you configured it to. The plugin's output is 1-minute, 5-minute, and 15-minute average.

Benny
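
To make the threshold logic concrete, here is a minimal sketch of how the three averages line up against the `10,4,3` and `20,15,10` triplets. load_state is a hypothetical helper, not part of check_load. It assumes each average is compared against the threshold in the same position and that a strictly greater value trips the state, which is my reading of the plugin's behaviour rather than something established in this thread.

```shell
# $1: "load1 load5 load15", $2: warning triplet "w1,w5,w15", $3: critical triplet "c1,c5,c15"
# Prints OK, WARNING or CRITICAL.
load_state() {
    awk -v loads="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(loads, l, " "); split(warn, w, ","); split(crit, c, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # "+ 0" forces a numeric comparison
            if (l[i] + 0 > c[i] + 0) { state = "CRITICAL"; break }
            if (l[i] + 0 > w[i] + 0) state = "WARNING"
        }
        print state
    }'
}
```

Under those assumptions, `load_state "1.77 1.94 3.04" "10,4,3" "20,15,10"` prints WARNING because only the third pair (3.04 against 3) is exceeded. Note also that these are absolute load averages, not percentages, so thresholds of "10%" and "20%" do not map directly onto them.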
Re: [Nagios-users] check_nt - MEMORY USAGE - incorrect results
Not entirely accurate. I just started troubleshooting a Win2008 R2 system yesterday - it has 16GB of physical RAM + 16 GB pagefile for a total of 32GB of virtual memory. The system is using 10.9GB of physical RAM, yet check_nt tells me it's using 2.69GB. Completely wrong, even if check_nt was only talking about physical, only talking about virtual, or talking about the sum.

Solution? Remove yet another checkcommand using that outdated program.

Benny

This is because on your Server 2008 box you will see that virtual memory is enabled. Go to Computer Properties and look under Performance; for example, for the R2 (x64) server box (has SQL installed on it) you will have 12GB RAM installed and 12GB of virtual memory. Finally, Nagios takes the sum of the memories (virtual memory + RAM).

2013/1/9 Andrew Thompson and...@fulgent.co.uk

Hi all,

Using the supplied check_nt plugin to check Memory Usage on Windows servers.

Some report correctly, others report a complete load of old tosh!!!

I have tried 3 different versions of Windows OS, the version seems to make no odds. Doesn't matter if 32 or 64 bit either.

Some examples

MY primary domain controller - Windows Server 2008 R2 (x64) - 8GB ram installed

Output from the check appears correct:

    Memory usage: total:8205.64 Mb - used: 2902.96 Mb (35%) - free: 5302.67 Mb (65%)

Another 2008 R2 (x64) server box (has SQL installed on it) - 12GB ram installed

Output thinks its got 24GB:

    Memory usage: total:24573.16 Mb - used: 1796.71 Mb (7%) - free: 22776.45 Mb (93%)

A Server 2003 Standard (x86) box (an internal test web server) - 512MB ram installed

Output thinks its got over 1GB:

    Memory usage: total:1257.50 Mb - used: 333.30 Mb (27%) - free: 924.20 Mb (73%)

A Server 2012 (x64) box (with HyperV installed) - 28GB ram installed

Output thinks its got 32GB:

    Memory usage: total:32500.80 Mb - used: 16709.37 Mb (51%) - free: 15791.43 Mb (49%)

Anybody any ideas as to why check_nt is returning incorrect info. I know its incorrect but Nagios doesn't so where exactly is it reading these values from?

Thanks in advance for anybodies input.

Regards

--
Regards,
Omar SADDIKI
Master's in Networks and Systems
Re: [Nagios-users] Inconsistency of Nagios
I think you'll have to write one... check_procs is not helpful in this case, as the daemon forks off many processes to run plugins. Just checking for number of Nagios processes won't help, as it won't be aware of parent/child relationships.

Now, mind you, I *do* run check_procs on my Nagios servers just to make sure I don't have runaways. But it won't tell me if I have more than one daemon running.

If you really want to work on this, you'll have to write a plugin that is able to follow the parent/child relationships (take a look at the ps man page) and is able to determine if there's more than one parent process. I *think* that is a decent direction to go.

Slightly off topic, but how best to write nagios check that checks for this specific behavior (multiple instances of nagios running) ?

- Original Message -
From: Mike Guthrie mguth...@nagios.com
To: Nagios Users List nagios-users@lists.sourceforge.net
Sent: Wednesday, January 2, 2013 3:23:58 PM
Subject: Re: [Nagios-users] Inconsistency of Nagios

Typically when I've seen behavior like this, it's because there are multiple parent processes of Nagios running, so both instances are launching checks, and reaping each others results. Try killing off all Nagios processes, and then starting it fresh again to see if that resolves the issue.

    /etc/init.d/nagios stop
    killall -9 nagios
    /etc/init.d/nagios start

On 1/2/2013 4:38 AM, Srikanth Gumma wrote:

Hi,

I need some help regarding nagios. We have around 500 Linux servers for which we are doing a ping and ssh monitoring only. The entire functionality is based on remote and no NRPE service is deployed.

However I see very inconsistency on nagios functionality. sometimes I don't see any updates on the nagios console for more than one week.

Our Nagios is installed on CentOS6.2 OS and it's the latest version Nagios Core 3.4.3. and I could only see some messages like below in /var/log/messages

    'SSH' on host 'xyz' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service...

any help is highly appreciated.

Regards
Srikanth

--
Mike Guthrie
Technical Team
___
Nagios Enterprises, LLC
Email: mguth...@nagios.com
Web: www.nagios.com
::: Messages without supporting info will risk being sent to /dev/null -- The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.' -- George Carlin -- Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft MVPs and experts. ON SALE this month only -- learn
Re: [Nagios-users] Inconsistency of Nagios
> If I understand correctly, I should create some plugin to kill all dependent processes at a periodic interval. In my observation, I did not see multiple parent processes.

My recommendation was to write a plugin to *detect* multiple parents, not kill them.

> Currently, whenever I see such sluggishness, I stop the nagios service, clean up the checkresult directory, and start nagios again.

However, with your further note above, I don't think you're getting multiple daemons running. Forgive me, I don't recall the full details of your installation - are you running any sort of NDO module? NDOUtils? Are you processing perfdata? If so, via what mechanism?
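For anyone who wants to try the detection plugin discussed in this thread, here is a minimal Python 3 sketch, not a tested production plugin. It assumes the daemon's process name is exactly "nagios" and that `ps -eo pid,ppid,comm` is available; the function names are made up for illustration. It treats every nagios process whose parent is *not* another nagios process as a daemon. That is an assumption: a short-lived check child that has been reparented to init could show up as a false positive, so the logic may need tuning on a busy server.

```python
import subprocess

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def find_daemons(ps_output, name="nagios"):
    """Return PIDs of `name` processes whose parent is not itself a `name` process."""
    procs = {}
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, ppid, comm = int(fields[0]), int(fields[1]), fields[2].strip()
        if comm == name:
            procs[pid] = ppid
    # a "daemon" here is a process with no same-named parent (hypothetical heuristic)
    return sorted(pid for pid, ppid in procs.items() if ppid not in procs)


def evaluate(daemons):
    """Map the list of daemon PIDs to a (exit code, message) pair."""
    if len(daemons) == 1:
        return OK, "OK - one nagios daemon (pid %d)" % daemons[0]
    if not daemons:
        return CRITICAL, "CRITICAL - no nagios daemon found"
    pids = ", ".join(str(pid) for pid in daemons)
    return CRITICAL, "CRITICAL - %d nagios daemons running: %s" % (len(daemons), pids)


def main():
    """Run ps, print one status line, and return the plugin exit code."""
    try:
        out = subprocess.check_output(["ps", "-eo", "pid,ppid,comm"],
                                      universal_newlines=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("UNKNOWN - could not run ps: %s" % exc)
        return UNKNOWN
    code, message = evaluate(find_daemons(out))
    print(message)
    return code
```

To use it as a plugin, the script would end with `sys.exit(main())` so that Nagios sees the exit code. Keeping the parsing separate from the `ps` call makes it easy to feed in captured `ps` output and see how the heuristic behaves on your own servers first.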
Re: [Nagios-users] Weird Nagios Problem
> I have been running Nagios for over a year with no issues. All of a sudden, all of my current loads on my Linux servers go into warning state at the same time, showing the exact same load, which then increments every hour to critical. After a while (3 or 4 hours) they all come back down to normal. Checking on the servers themselves using htop shows normal load levels throughout the time period.

Hmmm, yeah. Check that service and checkcommand definition. I bet you're actually testing the load on the *Nagios* server, and not the individual servers you think you're testing it on.

What's the Nagios server's load during that time? I bet it matches up...

--
Unless you're a lawyer, you don't understand Oracle licensing. That applies equally to Oracle employees as well as customers. -- Me, 2012-05-10
[Nagios-users] Distributed monitoring: v3.4.1 not translating host states like it should
Hey folks,

I am in the process of implementing a distributed monitoring architecture, and I'm having some problems with host state. Here are the specs:

Nagios v3.4.1
RHEL 6.3
Using NSCA to send results to passive collector

Yes, I have 'translate_passive_host_checks' set on the collector. :)

So, the system is up and running, and I do see host alerts in /var/log/messages on the collector. However, in the web interface, all hosts remain up. I can go into the host details for a host that's offline because of Sandy, and it reports a host status of UP, with the status information "PING CRITICAL - Packet loss 100%". Obviously, the host states coming from the passive monitors are not being translated.

Active host and service checks are disabled on the collector, and enabled on the monitors. Passive host and service checks are enabled everywhere, and the collector *is* receiving them.

I'd appreciate it if someone can help me out here... I'll provide whatever details are necessary...

Thanks much!

Benny
Re: [Nagios-users] Nagios Plugin Log Pattern Notification
> http://labs.consol.de/lang/en/nagios/check_logfiles/
> check_logfiles is one of the more powerful plugins.

Couldn't agree more. The consol.de guys are great!

Benny

--
Death rays, advanced technology or not, no creature wants to be stabbed in their hoo-hoo. -- Seen on zombiehunters.org
Re: [Nagios-users] Dynamic warning/critical thresholds
On 22/06/12 15:11, Jonathan Gazeley wrote:
> I've got a bunch of Nagios plugins that monitor things like DNS/HTTP/RADIUS hits per second. I've set what I believe to be sensible max/min warning thresholds but what I really want is dynamic thresholds. If some quantity suddenly doubles or halves, I'd like an alert. For example, if I usually serve 10 DNS lookups per second, and suddenly it is doing 20 per second, that isn't a fault but I would like to know about it, because it might mean there is a problem with the network in general. Is there a way of doing this? Any ideas?

You've already received two replies, both stating that you'll likely have to write some code to do it. I'm not aware of any common plugins out there that calculate rates of change and alert appropriately. Maybe they exist, but I don't recall seeing any of them. Have you tried any of the plugin sites?
Re: [Nagios-users] Dynamic warning/critical thresholds
>> You've already received two replies, both stating that you'll likely have to write some code to do it. I'm not aware of any common plugins out there that calculate rates of change and alert appropriately. Maybe they exist, but I don't recall seeing any of them. Have you tried any of the plugin sites?
>
> Oh, I didn't receive any replies. Presumably the mails got lost in the ether. I'm happy to write code - I just wondered if there was a built-in way of doing this.

Not to my knowledge, no - the standard Nagios plugins don't know about rate of change, and I haven't run across many (any?) third-party plugins that do. The difficult part is retaining state - yes, it's simple to use a statefile, but if you have a lot of services you could end up with thousands of state files. It can become pretty ugly to deal with them.

Your original message (and consequently, the replies you missed) can be found here:

http://marc.info/?l=nagios-users&m=134037453807273&w=2
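To make the state-keeping problem concrete, here is a rough Python 3 sketch of the "doubles or halves" comparison from the original question. It is not an existing plugin. It keeps all values in one JSON file keyed by a service name instead of one state file per service, which is only one possible way around the thousands-of-files problem. The function names, the key format and the file layout are all assumptions for illustration, and there is no locking, so checks that run concurrently against the same file could lose each other's updates.

```python
import json
import os
import tempfile

OK, WARNING = 0, 1  # standard Nagios plugin exit codes


def rate_change_status(previous, current, factor=2.0):
    """WARNING if current is at least factor * previous, or at most previous / factor."""
    if previous is None or previous <= 0 or current < 0:
        return OK  # nothing sensible to compare against yet
    if current >= previous * factor or current <= previous / factor:
        return WARNING
    return OK


def check_and_store(state_path, key, current, factor=2.0):
    """Compare `current` with the value saved under `key`, then save the new value."""
    try:
        with open(state_path) as handle:
            state = json.load(handle)
    except (IOError, ValueError):
        state = {}  # first run, or the state file is missing/unreadable
    status = rate_change_status(state.get(key), current, factor)
    state[key] = current
    # write to a temp file and rename, so a crash never leaves a half-written file
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(state_path) or ".")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, state_path)
    return status
```

Because the new value always overwrites the old one, a jump from 10 to 20 lookups per second alerts for a single polling cycle and then 20 becomes the new normal. Comparing against a longer history (an average of the last N samples, say) would need more state, which is exactly where it starts to get ugly.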
Re: [Nagios-users] check_logfiles
> Here's the run of the command I am trying
>
> [db:~] root% /opt/nagios/libexec/check_logfiles --logfile=/u01/app/oracle/admin/ecom/bdump/alert_ecom1.log --tag=oracle --rotation=linux --criticalpattern='ORA-00600' --warningpattern='ORA-*'
> OK - no errors or warnings|oracle_lines=0 oracle_warnings=0 oracle_criticals=0 oracle_unknowns=0
>
> This is what is in that logfile -
>
> [db07:~] root% grep 'ORA-00600' /u01/app/oracle/admin/ecom/bdump/alert_ecom1.log
> ORA-00600 - This is only a test.. please disregard

Try using the allyoucaneat option to test on the command line... IIRC, check_logfiles will only check a reasonable number of lines in the log file the first time, and from that point on only new ones. If that ORA-00600 is a long ways back, check_logfiles may not grok it. The allyoucaneat option should force the plugin to check *all* lines in the file.

Benny
Re: [Nagios-users] Dynamic warning/critical thresholds
> I've got a bunch of Nagios plugins that monitor things like DNS/HTTP/RADIUS hits per second. I've set what I believe to be sensible max/min warning thresholds but what I really want is dynamic thresholds. If some quantity suddenly doubles or halves, I'd like an alert. For example, if I usually serve 10 DNS lookups per second, and suddenly it is doing 20 per second, that isn't a fault but I would like to know about it, because it might mean there is a problem with the network in general. Is there a way of doing this?

There's always a way. :) However, in this case, you're probably going to have to write a plugin to do it. You're asking to alert on a rate of change, and I can't think of any of the stock plugins that do that. Keeping state between polling runs is something that can get a bit ugly.

Do some rooting around the plugin community (the Nagios Exchange and/or the Monitoring Exchange) to see if you can find some examples of rate-aware plugins. While it's not rate that it's tracking, I know the check_iptraf*.pl plugins will at least keep state between polling cycles, so that might be somewhere to start.

Benny
Re: [Nagios-users] check_procs returning wrong data
> I finally got it working but it was not that easy. As I am using CentOS 5, the requiretty setting in /etc/sudoers is enabled by default, so I had to edit it like this:
>
> #Defaults    requiretty
> nagios ALL=(ALL) NOPASSWD:/usr/local/nagios/libexec/check_procs
>
> And the command in the .cfg file would be like this:
>
> command[check_total_procs]=sudo /usr/local/nagios/libexec/check_procs -w 150 -c 200

It's a bit safer to use this right before the user and command definition:

Defaults:nagios !requiretty

That way, you're leaving the restriction in place for *other* users, you're just overriding it for the nagios user.

Benny
Re: [Nagios-users] High Service Check Latency
> I've some broker modules to handle sql logging and distributed setup.

I bet you're using NDOUtils. I wouldn't recommend that. I couldn't keep a Nagios server with under 6000 services limping along when NDOUtils was running. Eventually, the check latencies would go through the roof and the entire server would get farther and farther behind.

I went to Livestatus. It took me all of 20 minutes to adjust my reports to use the new interface, and I haven't restarted my Nagios daemon since (other than normal maintenance).

--
The problem with quotes on the internet is that it's very hard to verify their authenticity. -- Abraham Lincoln
Re: [Nagios-users] does monitoring stop while nagios is flushing queued items?
> My installation is Centreon + Nagios. Sometimes I need to do maintenance on MySQL, so I stop ndo2db and let Nagios cache the results first. When I start ndo2db again, Nagios starts flushing the queued items. I notice from the service perfdata file that the data stops coming (or Nagios is not polling new data) while Nagios is flushing queued items. I just want to confirm whether this is the expected behavior of Nagios. If yes, is there any workaround?

This is one of the big reasons I stopped using NDOUtils - the broker would regularly block the Nagios process. So yes, you're correct - your NDOUtils broker is blocking, and nothing is happening during these periods of maintenance.

That, and the check latency. With NDOUtils, I couldn't let my Nagios daemons run a full week without restarting them or the check latencies would shoot through the roof (that's a full restart, not a reload).
Re: [Nagios-users] How many hosts and services are you monitoring with Nagios?
> What kinds of numbers of hosts and services are you all monitoring? Which add-ons / distributed frameworks are you using?

At my ${CURRENT_JOB}, I'm monitoring around 600 hosts with just under 6000 services on a single VM running RHEL 5. I do process perfdata on the same node, and replicate all config data and state data to a warm standby (also a VM). Replication is done via MySQL replication (for the config data) and NSCA (for the state data). A custom perl program dumps the extended state data (disabled notifications, acknowledgements, etc) for import if needed.

Yes, I know, VM bad. :) Just not bad enough to spend real dollars on more physical hosts.

This year, I will be bringing up a second pair of monitoring hosts at a secondary data center, with much the same architecture.

Benny
Re: [Nagios-users] Querying nagios object information through command line
> Is there a script or a module that can be called from the command line and can retrieve Nagios object definitions (host, service)? I am thinking of calling it from a PHP program. I found that config.cgi is able to fetch object definitions, but it seems to return HTML. It would be helpful if someone could share their thoughts.

Livestatus can do this, and it's MUCH quicker/more lightweight/better (IMHO) than NDOUtils.

http://mathias-kettner.de/checkmk_livestatus.html

Benny
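As a rough illustration of what a Livestatus query looks like, here is a small Python 3 sketch (the original poster wanted PHP, but the same plain-text protocol applies). Livestatus listens on a Unix socket and takes a text query terminated by a blank line. The helper names are made up, and the socket path in the usage note is an assumption; use whatever path your broker_module line in nagios.cfg actually points to. Note that Livestatus returns the live object data Nagios holds in memory, not the raw text of the config files.

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query asking for `columns` of `table` as JSON."""
    lines = ["GET %s" % table, "Columns: %s" % " ".join(columns)]
    lines += ["Filter: %s" % condition for condition in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"  # a blank line ends the query


def run_query(query, socket_path):
    """Send `query` to the Livestatus Unix socket and return the decoded rows."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell Livestatus we are done sending
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return json.loads(b"".join(chunks).decode("utf-8"))
```

Something like `run_query(build_query("hosts", ["name", "address", "state"]), "/usr/local/nagios/var/rw/live")` would then return one list per host, assuming that socket path matches your installation.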
Re: [Nagios-users] nagios backup
> Backup done successfully. All hosts are imported and being monitored, but a few things are not working, like SMS messages and mail messages. Does it need to be backed up from some directory?

You need to examine your notification commands. You may have used some third party software to send SMS messages, and that may not be installed on the new system. Also, your email configuration may not be the same or may be incomplete on the new system.
Re: [Nagios-users] Performance data not being returned
> The plugin is being executed through NRPE. Executing the plugin by hand seems to return valid perfdata:
>
> [jg4461@dhcp1 ~]$ /usr/lib64/nagios/plugins/check_dhcpd_pools
> OK - all pools less than 80% full | 'resnet-wireless-652'=43.769%;80;90, 'resnet-wireless-653'=47.923%;80;90, 'resnet-wireless-654'=46.201%;80;90, 'resnet-wireless-655'=44.681%;80;90, 'resnet-wireless-656'=47.720%;80;90, 'resnet-wireless-657'=47.112%;80;90, 'resnet-wireless-658'=42.452%;80;90, 'resnet-wireless-659'=0.304%;80;90, 'resnet-wireless-ratelimited-660'=1.114%;80;90, 'resnet-wireless-onlinepayment-661'=0.405%;80;90, 'resnet-wireless-onlinepayment-662'=0.405%;80;90, 'resnet-wireless-onlinepayment-663'=0.304%;80;90, 'resnet-wireless-consoles-665'=1.114%;80;90, 'resnet-wireless-message-666'=0.000%;80;90, 'resnet-wireless-instructions-667'=8.056%;80;90

http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN201

I think you might try spaces, not commas. I have developed a number of plugins, and I've never used anything but spaces to delimit the performance data. If Nagios doesn't believe that's valid data, it's going to ignore it.
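For illustration, here is a small Python 3 sketch of building the perfdata section with spaces between the items, following the 'label'=value[UOM];warn;crit layout from the developer guidelines linked above. The helper names are hypothetical and this is not the check_dhcpd_pools code, which isn't shown in this thread.

```python
def format_perfdata(metrics):
    """Join (label, value, uom, warn, crit) tuples into space-separated perfdata."""
    items = []
    for label, value, uom, warn, crit in metrics:
        # single-quote the label, as the plugin output in this thread already does
        items.append("'%s'=%s%s;%s;%s" % (label, value, uom, warn, crit))
    return " ".join(items)  # a space, not a comma, between items


def plugin_line(text, metrics):
    """Build the full plugin output line: status text, a pipe, then the perfdata."""
    return "%s | %s" % (text, format_perfdata(metrics))
```

With the first two pools from the output above, `plugin_line("OK - all pools less than 80% full", [("resnet-wireless-652", 43.769, "%", 80, 90), ("resnet-wireless-653", 47.923, "%", 80, 90)])` produces the same text as before, minus the commas.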
Re: [Nagios-users] Performance data not being returned
> I've narrowed it down to a stage where running the plugin directly returns the right results, but running the plugin through check_nrpe on localhost returns this:
>
> [jg4461@dhcp1 log]$ /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_dhcpd_pools
> OK - all pools less than 80% full |
>
> What could cause NRPE to truncate the results in such a way?

Too much data? Are you using SSL? I don't know that I've seen this behavior before - it's always been *invalid* perfdata that have caused this issue for me.
Re: [Nagios-users] 2 Nagios boxes running together in different locations
> We have a bit of a temperamental firewall at the moment that keeps going down, thus resulting in everything appearing down to Nagios in Location A and it alerting like a lunatic for all hosts/services (88/156)

You could monitor the firewall, and configure it to be the parent of the hosts behind it. That way, when it goes down, you only get the alert for the firewall crapping out, and not all of the hosts that depend on it.
Re: [Nagios-users] 2 Nagios boxes running together in different locations
> Interesting - How does it work though - I mean if the firewall plays up at Site A, it thinks everything in Site B is down - so Nagios GUI marks everything as down - what happens then if say a server in Site B does actually go down - we will not get alerted to that?

That's correct. But, your proposed configuration wouldn't solve this problem - if the firewall fails, the Nagios servers can't contact each other anyway, so they could never agree on what's up and what's down.

> I made a slight error in my original description - when the firewall goes down it can't contact anything at both locations, not just Site A, due to the fact that the protected interface stays up but just denies all traffic. We are currently working on this with GTA but I'm losing the will to live with 300 texts virtually every night!!

I've dealt with this situation before, and I've ended up implementing two mostly standalone Nagios systems. They each check their own site, so if their external network goes away they are still able to monitor and alert for the things they're responsible for (you have to use out-of-band notifications of course). They also each check each other's *site*, a la the other site's firewall, so the Nagios server at site A can alert and let you know if site B goes away, but it *doesn't* try to alert you for all of the hosts and services at site B.
Re: [Nagios-users] notifications
> I am using Nagios 3.3.1. I have got notifications by SMS working now. Is there a way of defining which notifications go to email, which go to SMS and which go to both? I would like this to apply to escalations as well if possible.

I create two Nagios contacts for each person at my site, one for email alerts and one for SMS alerts. I then place the appropriate contacts in each contactgroup, according to which type of alert should be sent.

Then, for each host/service, I include the appropriate contactgroups. For example, my Exchange servers' CPU services get the exchange-admins-email contactgroup, which only sends email to their contacts. The Exchange servers' database services, however, get the exchange-admins-pagers group, so they get SMS'ed for database problems.

Benny
Re: [Nagios-users] use_large_installation_tweaks
> Has anyone configured it? Is it really a good way to reduce memory usage?

For me, it was a good way to reduce memory and CPU, and it helped with check latencies. Although, the absolute best way to reduce check latencies for me has been to dump NDOUtils. Good lord, that was awful, had to restart Nagios three times a week.

Benny
Re: [Nagios-users] Building a reliable uptime monitoring model
> So I was wondering how everyone is reliably checking for server reboots and notifying the intended audience, with a high rate of success.

I use check_logfiles from the Consol.de guys to watch for the actual event or log entry specifying a reboot. I don't count on the server being down long enough to trigger a host down/host up alert.

Benny
[Nagios-users] Can someone at Nagios Enterprises please take a look at old.nagios.org?
I've been trying to get to the external commands reference for several hours, and I keep getting "Error connecting to MySQL server..."

Thanks!

Benny
Re: [Nagios-users] Can someone at Nagios Enterprises please take a look at old.nagios.org?
Nope, the actual list of commands linked at the bottom of that page.

Benny

> This one? http://nagios.sourceforge.net/docs/3_0/extcommands.html
>
> On Fri, Mar 16, 2012 at 7:40 PM, C. Bensend be...@bennyvision.com wrote:
>> I've been trying to get to the external commands reference for several hours, keep getting Error connecting to MySQL server... Thanks!
[Nagios-users] Performance data not being written to file with 3.3.1
Hey folks,

So, I have the following setup after some re-architecting this past weekend:

* Primary Nagios server running 3.2.3
* Secondary Nagios server running 3.3.1, receiving all check results via NSCA

Everything should be identical between the primary and secondary servers, other than the secondary system not running active checks and having notifications disabled. Each system should process its own performance data.

However, I'm wrestling with the new secondary server... I have it configured to write host and service perfdata to a file, and then npcd processes that perfdata from there. Unfortunately, the host and service perfdata files are being written with no data in them (0 bytes).

I'm *getting* the perfdata from the primary host - I can view it in the secondary host's web interface. It's there. But for some reason, the secondary Nagios daemon isn't writing that data into the file, but it *is* creating the file.

I don't know of any reason this shouldn't work... Does anyone with more knowledge of the nuts-n-bolts know why a passive Nagios daemon (no active checks, all data received via NSCA) wouldn't write the perfdata it receives? It thinks it has data - the host and service perfdata files are created and removed as the Nagios daemon creates them, and the process_* commands process them.

I'll provide whatever details are necessary, I just want to verify the basic premise of my setup before flooding you with information. :)

Thanks much!

Benny
Re: [Nagios-users] Nagios 3.2.3 - 3.3.1 upgrade path
>> I just want to make sure my 3.2.3 system and my 3.3.1 system will be able to talk. :)
>
> They will, so no worries there.

Fantastic. Thanks, Andreas!

Benny
[Nagios-users] Nagios 3.2.3 - 3.3.1 upgrade path
Hey folks,

I'm planning a migration to 3.3.1, and I had a quick question for those of you who have done it.

I have a manual failover setup, with one monitoring node that sends all results to another warm standby system via NSCA. If I rebuild one system to 3.3.1 and the active monitoring node remains on 3.2.3 for a week or two, are there going to be any issues? I want to be sure they're compatible enough to run for a short time, so I'm not rebuilding my entire environment in an afternoon.

Normally, I'd just upgrade the software and go, but I'm taking this opportunity to make some other adjustments to my system, so I'll be doing bare-metal installs from the OS up. I just want to make sure my 3.2.3 system and my 3.3.1 system will be able to talk. :)

Thanks much!

Benny
Re: [Nagios-users] NRPE allowed_hosts directive
> tried putting the IP addresses of all the hosts in the network. However, when I assign this variable to all the IP addresses (which is very long), U...

Just how many Nagios servers do you HAVE? That configuration option is to list the Nagios servers that will be polling your NRPE daemon, not all the hosts in the network.

Just wanted to make sure you're understanding the option correctly...

Benny
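To illustrate the point above, here is a sketch of the relevant line in nrpe.cfg on a monitored host. The two 192.0.2.x addresses are placeholders standing in for the Nagios server(s); they are not taken from the thread.

```
# nrpe.cfg on the monitored host.
# List only the machines that will connect to this NRPE daemon,
# i.e. the Nagios server(s), as a comma-separated list.
# 192.0.2.10 and 192.0.2.11 are placeholder addresses.
allowed_hosts=127.0.0.1,192.0.2.10,192.0.2.11
```

With one or two polling servers the value stays short, no matter how many hosts are being monitored.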
Re: [Nagios-users] Monitoring log files of oracle DB on windows server using nagios
> I need the best solution to monitor the log files of a DB server, specifically Oracle DB log files on a Windows server. Please suggest the best way to achieve this.

check_logfiles from the Consol.de guys is your friend. It even groks the Oracle log file formats.

Benny
Re: [Nagios-users] Eventlog monitoring through NSClient++
> We actually use check_logfiles with NSClient so haven't seen this, and we have tons of rules. Might be worth looking at. Not that anything is wrong with NSClient :) just check_logfiles also has more regex and options.

+1

The consol.de guys are awesome, and check_logfiles is another example of their excellent contributions to the community.

Benny
Re: [Nagios-users] monitoring dhcp
> We've a Windows 2008 server with the DHCP role. There is an option to display the statistics of the scope:
>
> Total Addresses ..
> In Use ..
> % Available ..%
>
> Is it possible to get this information into Nagios?

I haven't had much luck with this. Microsoft exposes hardly any of this data via WMI or any other interface that I've found. I haven't looked at Powershell yet, mostly because many of my servers do not have it installed (2003 -vs- 2008).

The best I've been able to do is watch the event log for DHCP server complaints about a scope getting close to consumed. Even *that* has been problematic, as the DHCP server service seems to arbitrarily decide when it wants to complain. I ended up writing a custom plugin that watches the event log for those events, parses the output, and decides whether it's appropriate to alert.

Benny

--
Cats land on their feet. Toast lands peanut butter side down. A cat with toast strapped to its back will hover above the ground in a state of quantum indecision. -- Unknown
Re: [Nagios-users] monitoring dhcp
> I don't think you'll have much trouble getting this via SNMP. It is defined in the MIB on a per-scope basis, suggest you do an snmpwalk on the OID I gave earlier and see what you get.
>
> MIB excerpt:
>
> scopeTable OBJECT-TYPE
>     SYNTAX SEQUENCE OF ScopeTableEntry
>     ACCESS read-only
>     STATUS mandatory
>     DESCRIPTION "A list of subnets maintained by the server"
>     ::= { dhcpScope 1 }
>
> scopeTableEntry OBJECT-TYPE
>     SYNTAX ScopeTableEntry
>     ACCESS read-only
>     STATUS mandatory
>     DESCRIPTION "This is the row corresponding to a subnet"
>     INDEX { subnetAdd }
>     ::= { scopeTable 1 }
>
> ScopeTableEntry ::= SEQUENCE {
>     subnetAdd IpAddress,
>     noAddInUse Counter,
>     noAddFree Counter,
>     noPendingOffers Counter
> }
>
> subnetAdd OBJECT-TYPE
>     SYNTAX IpAddress
>     ACCESS read-only
>     STATUS mandatory
>     DESCRIPTION "This is the subnet address"
>     ::= { scopeTableEntry 1 }
>
> noAddInUse OBJECT-TYPE
>     SYNTAX Counter
>     ACCESS read-only
>     STATUS mandatory
>     DESCRIPTION "This is the no. of addresses in use"
>     ::= { scopeTableEntry 2 }
>
> noAddFree OBJECT-TYPE
>     SYNTAX Counter
>     ACCESS read-only
>     STATUS mandatory
>     DESCRIPTION "This is the no. of addresses that are free"
>     ::= { scopeTableEntry 3 }
>
> noPendingOffers OBJECT-TYPE
>     SYNTAX Counter
>     ACCESS read-only
>     STATUS mandatory
>     DESCRIPTION "This is the no. of addresses that are currently in the offer state"
>     ::= { scopeTableEntry 4 }
>
> END

Thank you, Giles! This doesn't help me (I don't have SNMP enabled on my hosts and don't plan on doing so), but it's good to know for the future... That's certainly better than what is exposed elsewhere...

Benny
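If the noAddInUse and noAddFree values for a scope can be obtained (via SNMP as in the excerpt above, or any other way), turning them into a Nagios result is simple arithmetic. The function below is a minimal sketch of that step only; the name check_scope_usage and the threshold arguments are invented for illustration, and fetching the two counters is deliberately left out.

```shell
#!/bin/sh
# Usage: check_scope_usage IN_USE FREE WARN_PCT CRIT_PCT
# Prints a one-line status and returns the usual Nagios plugin code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_scope_usage() {
    in_use=$1; free=$2; warn=$3; crit=$4
    total=$((in_use + free))
    if [ "$total" -eq 0 ]; then
        echo "UNKNOWN - scope has no addresses"
        return 3
    fi
    # Integer percentage of the scope that is currently leased.
    pct=$((in_use * 100 / total))
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL - ${pct}% of scope in use"
        return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING - ${pct}% of scope in use"
        return 1
    fi
    echo "OK - ${pct}% of scope in use"
    return 0
}

# Example: 120 addresses in use, 80 free, warn at 80%, critical at 95%.
check_scope_usage 120 80 80 95
# prints: OK - 60% of scope in use
```

A real plugin would call this once per scope and exit with the worst return code seen.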
Re: [Nagios-users] GSM gateway - virtualization ...
> I'm running Nagios on OpenSUSE. OpenSUSE is a virtual machine under Windows 2008 Hyper-V. I'd like to send SMS messages instead of e-mails, because mails can't be delivered in case the SMTP server is down. I don't know any specific hardware for a GSM gateway, and I have no idea how to use it in a virtual environment: how to connect it to the RS232(?) port in the host machine and forward it to the Nagios on the SUSE virtual machine. Would anyone be so kind as to give me some advice and share their opinions?

We run Nagios servers as virtual machines on mid-sized ESX clusters, so our VMs might end up on any given ESX host at any given time. As a result, I virtualized our serial modem using the IOLAN product from Perle. It is basically a network-to-serial adapter. Our serial USR modem plugs into it, and it plugs into the network.

Then, on the Nagios servers (RHEL), the Perle software is installed and creates a serial port that Nagios talks to. That virtual port simply talks to the IOLAN device over the network, and acts as a local serial modem. Now, our Nagios servers can end up anywhere they like, and the modem always stays local as far as Nagios is concerned.

It's not infallible, every now and again the two components lose communications, but I check for that via Nagios. There certainly exists the possibility that we could:

1) Lose communications with the virtual modem AND
2) Have a widespread network outage

at the same time, which would completely clobber notifications, but the chances are pretty minor, and the costs are very reasonable.

Benny
Re: [Nagios-users] Scheduled downtime for mass hosts?
> Tonight I will be performing maintenance on over 50 of my servers and will be taking firewall and routing links out. I have 86 hosts that this will affect. I'm going to put them into scheduled downtime in Nagios. I have my hosts divided into hostgroups. Is there a quick way to schedule all 86 hosts into Nagios downtime rather than having to click on each host in the web GUI and doing them individually?

This is precisely why I have a hostgroup that contains *all* my hosts. I just issue a single downtime command for the host*group*, and voila.

Benny
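Besides the web interface, a hostgroup downtime can also be submitted through the external command file with SCHEDULE_HOSTGROUP_HOST_DOWNTIME. The sketch below only builds the command line; the hostgroup name all-hosts, the author, and the comment are made-up examples, and the command file path in the trailing comment assumes a default source install under /usr/local/nagios with check_external_commands=1.

```shell
#!/bin/sh
# Build a SCHEDULE_HOSTGROUP_HOST_DOWNTIME external command line.
# Arguments: now start end hostgroup author comment
# (now, start and end are epoch seconds).
build_hostgroup_downtime() {
    now=$1; start=$2; end=$3; group=$4; author=$5; comment=$6
    duration=$((end - start))
    # fixed=1 means a fixed window; trigger_id=0 means the downtime
    # is not triggered by another downtime entry.
    printf '[%s] SCHEDULE_HOSTGROUP_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$now" "$group" "$start" "$end" "$duration" "$author" "$comment"
}

# Example: two hours of downtime starting now for the hypothetical
# hostgroup "all-hosts".
now=$(date +%s)
build_hostgroup_downtime "$now" "$now" "$((now + 7200))" \
    all-hosts benny "Network maintenance"

# To actually submit it, append the output to the command file, e.g.:
#   build_hostgroup_downtime ... >> /usr/local/nagios/var/rw/nagios.cmd
```

SCHEDULE_HOSTGROUP_SVC_DOWNTIME takes the same arguments and covers the services on those hosts.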
Re: [Nagios-users] Scheduled downtime for mass hosts?
> Ah Benny, I didn't know you could schedule maintenance for each individual hostgroup. I have my 86 hosts arranged into 4 hostgroups, so I will just do this. 4 clicks and job done; I thought I was in for the long haul, clicking all 86 hosts 1 by 1. Thanks, you have saved me a load of time!

You can issue commands to both hostgroups and servicegroups... It makes it much easier to deal with outages and scheduled maintenance. And because hosts and services can be in more than one group, you can arrange them however you like according to roles, network placement, or even physical site. :)

Benny
Re: [Nagios-users] Changing the version of nagios that appears in the e-mail notifications
> I would like to change the version that is displayed to reflect that of the release that is currently on the server. What file(s) do I need to modify in order to accomplish this?

Your email notifications are just another command, so you need to update the definition of that command. It may be in misccommands.cfg, but really, you're the only one that can answer that one as it could be in *any* of the config files. :)

'grep -r' would be your friend here.

Benny
Re: [Nagios-users] Check
> It works fine, but I would prefer to use another method, much lighter than check_by_ssh. Do you know another way to do that, via SNMP for example?

I run NRPE on my Linux systems... It is much lighter than using check_by_ssh.

Benny

--
Open your door, or I open your wall. -- Seen on an image on fukung.net
Re: [Nagios-users] Suggestions for checking DHCP Scopes
> Does anyone have any suggestions for checking DHCP scopes on Windows servers? I saw one util on Nagios Exchange that uses a vbs script but I have no idea how I would set that up.

Define "checking DHCP scopes"?

Do you need to make sure DHCP is running?
Do you need to make sure DHCP clients can get addresses?
Do you need to check how many IPs are free in the scopes?

Each of the above could be a different failure mode.

Personally, I gave up on checking for lease availability, we have way too many scopes, VLANs, etc. I now do the following:

1) Check to make sure the DHCP Server service is running
2) Via a custom setup between consol.de's excellent check_logfiles plugin and a perl wrapper I wrote, check for an event 1020 in the system event log and parse the output

#2 was a pain, as Windows apparently has no hard-and-fast way to check on IP availability in the scopes, and randomly logs a 1020 whenever it feels you just don't have enough addresses left. With the wrapper program, I parse the 1020 events and apply my own thresholds to determine good, bad, or ugly.

Microsoft: why oh why do you not expose the IP address utilization via performance counters or WMI?

Benny
Re: [Nagios-users] muti-sessions in Nagios
> Is there a possibility to create multiple sessions in Nagios? My aim is to create several administrator sessions: I want a central administrator to see all the maps, while the others see just their own maps. For example:
>
> central admin, at site 0: supervises all sites
> admin Number1, at site 1: supervises the equipment of site 1
> admin Number2, at site 2: supervises the equipment of site 2
> and so on

If I understand your question correctly, Nagios pretty much does that out-of-the-box. By default, if you use authentication, authenticated users will only be able to see/issue commands to hosts and services for which they are contacts.

So, if admin1 is a contact for hosts A, B, and C, while admin2 is a contact for hosts D and E, then admin1 will only see his/her three hosts, and admin2 will only see his/her two.

Benny
Re: [Nagios-users] Nagios force host/service check
> Thanks for the response. Below are entries that are made in ssl_access_log:
>
> When I click on "Re-schedule the next check of this service", it creates the following entry:
>
> 139.222.121.213 - xca10...@uea.ac.uk [12/Jul/2011:11:03:10 +0100] "GET /nagios/cgi-bin/cmd.cgi?cmd_typ=7&host=cn001&service=cpu HTTP/1.1" 200 3143
>
> When I click on the commit button after clicking the force check tick box, it then creates the following two entries:
>
> 139.222.121.213 - - [12/Jul/2011:11:03:14 +0100] "POST /nagios/cgi-bin/cmd.cgi HTTP/1.1" 401 490
> 139.222.121.213 - xca10...@uea.ac.uk [12/Jul/2011:11:03:14 +0100] "POST /nagios/cgi-bin/cmd.cgi HTTP/1.1" 200 1314
>
> It's not clear what the URL is.

OK, so, it's not a GET, it's a POST. I'll let someone more familiar with that comment... Doing this via the web interface seems cumbersome, but it might be possible to do it via the command file somehow. Not like that's much better...

http://old.nagios.org/developerinfo/externalcommands/commandlist.php
http://nagios.sourceforge.net/docs/nagioscore/3/en/extcommands.html

Good luck!

Benny

--
You were doing well until everyone died. -- God, Futurama
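For the command-file route mentioned above, the external commands SCHEDULE_FORCED_SVC_CHECK and SCHEDULE_FORCED_HOST_CHECK look like the right fit. The following is a rough sketch that only formats the two commands; cn001 and cpu are the host and service from the quoted log entry, the function names are invented, and the command file path in the closing comment assumes a default source install with check_external_commands=1.

```shell
#!/bin/sh
# Print a SCHEDULE_FORCED_SVC_CHECK external command for one service.
# Arguments: host_name service_description check_time (epoch seconds).
forced_svc_check() {
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$3" "$1" "$2" "$3"
}

# Print a SCHEDULE_FORCED_HOST_CHECK external command for one host.
# Arguments: host_name check_time (epoch seconds).
forced_host_check() {
    printf '[%s] SCHEDULE_FORCED_HOST_CHECK;%s;%s\n' "$2" "$1" "$2"
}

now=$(date +%s)
forced_svc_check cn001 cpu "$now"
forced_host_check cn001 "$now"

# On the Nagios server itself, the output would be appended to the
# command file, e.g.:
#   forced_svc_check cn001 cpu "$now" >> /usr/local/nagios/var/rw/nagios.cmd
```

Note that the command file is a local named pipe, so a host that is booting up would still need some way to reach the Nagios server (ssh, NSCA, or the web interface) to trigger this.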
Re: [Nagios-users] Nagios force host/service check
> Does anyone know the HTTP(S)-GET command to force check a host/service? I would like a host to execute the HTTP(S)-GET command to force Nagios to check the status as it is booting up. Any help will be greatly appreciated. Thanks in advance.

Watch your http access log, and execute the same command with a web browser. Take that URL and use wget with the --http-user and --http-password options. Voila!

Benny
Re: [Nagios-users] Nagios Issue of not detecting down/up status of server and delay of mails notifications
> Is this a valid issue with Nagios, or is there any way to scale it up? How can a network/server admin rely on it if it works like this?

This is a configuration issue... I monitor some 700 hosts and 6000+ services on a single host and my notifications go out instantly (once max_check_attempts has been hit).

Choose a host that has shown this problem, and grep your Nagios log for it... Copy-n-paste the entries here. That will show what Nagios thinks about the situation.

Benny
Re: [Nagios-users] How to monitor specific windows services using nsclient++
> Actually I want to monitor specific Windows services using Nagios and the NSClient++ agent installed on Windows servers...

OK.

$USER1$/check_nrpe -H $HOSTADDRESS$ -u -c CheckServiceState -a ShowAll $ARG1$=$ARG2$

Then, in your Nagios service definition, call that command with two arguments:

1) The service name from the Windows services snap-in
2) "started" or "stopped", according to what state you want the service to be in during normal operation

> Also I don't know which critical Windows services to monitor exactly, but my boss says it should be done... Can you people give me some help regarding this?

On our Windows systems, we monitor a basic set of standard services:

* Disk volume space
* CPU utilization
* Memory utilization
* My NSClient++ version
* My NSClient++ configuration version
* My plugins version
* Windows version (for informational purposes only)

Benny
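Put together, the command line above and the two arguments might be wired up like this. It is only a sketch: the command name check_win_service and the host winserver01 are invented for illustration, MSSQLSERVER is used as an example service name (it is what a default SQL Server instance is normally called in the services snap-in), and generic-service is the template from the Nagios sample configuration.

```
define command {
    command_name    check_win_service       ; hypothetical name
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -c CheckServiceState -a ShowAll $ARG1$=$ARG2$
}

define service {
    use                     generic-service
    host_name               winserver01             ; hypothetical host
    service_description     SQL Server service
    ; $ARG1$ = Windows service name, $ARG2$ = expected state
    check_command           check_win_service!MSSQLSERVER!started
}
```

Each additional Windows service to watch is then just another service definition with a different first argument.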
Re: [Nagios-users] How to monitor specific windows services using nsclient++
> Thanks Benny, but I still don't understand whether check_nrpe can be used for monitoring Windows servers, because as far as I know it's for monitoring remote Linux servers only. If yes, do I need to install check_nrpe on my Nagios server? Also, I am already monitoring these basic things, but I want to monitor specific services, e.g. whether mssql is running or down, and similarly other important Windows services...

NSClient++ listens for NRPE requests as well, on TCP port 5666. Hence, if you have NSClient++ installed on your Windows systems, you can use check_nrpe to talk to them. And yes, then you'd need to install the check_nrpe tool on your Nagios server. I prefer using check_nrpe, I only use check_nt for a very small number of services.

The command definition I gave you will check a service on a remote Windows server to see if it's running or not. So, open up your Windows services snap-in, and you can check any of the services listed the same way.
Re: [Nagios-users] Getting Started with Nagios
> I then restart the web server, expecting to see a new host inf1. But the host count has not increased and I can't see any reference to the host. I also cannot see the new host group I defined. So obviously I am missing something fundamental. Thanks for any insight you care to share :)

> I then restart the web server...

If you mean that literally, as in you restarted Apache, that won't change anything for Nagios. Apache only provides the web server for the UI; it has nothing to do with Nagios. The Nagios daemon is the one you need to restart (or more accurately, you can send it a SIGHUP signal) to pick up on your configuration file changes.

Now, if you *did* restart Nagios and your changes aren't appearing in the web interface, do the following:

1) Stop the Nagios daemon
2) Now, go stop the *other* Nagios daemon(s)

It is a *very* common problem, especially when people are just starting out, to accidentally start more than one Nagios daemon. Changes are made, *one* of the Nagios daemons is restarted, while the other continues to happily run the old configuration (and show up in the web interface).

Benny
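A minimal sketch for spotting leftover daemons, meant to be run after you have stopped Nagios. It assumes the process is named exactly "nagios" and that pgrep is available; both are assumptions about your system. A running daemon forks children for its checks, so several PIDs while Nagios is up do not prove a duplicate. The useful signal is anything still listed after the stop.

```shell
#!/bin/sh
# Count processes whose name is exactly "nagios" (the process name is an
# assumption; adjust it if your daemon binary is called something else).
count_nagios() {
    pgrep -x nagios | wc -l | tr -d ' '
}

n=$(count_nagios)
if [ "$n" -gt 0 ]; then
    # List PID and name of each process that is still around.
    echo "$n nagios process(es) still running:"
    pgrep -x -l nagios
else
    echo "no nagios processes running"
fi
```

Once the count reaches zero, start Nagios again and check whether the new host shows up.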
Re: [Nagios-users] Default Acknowledge Behavior
> Can the default behavior for acknowledging an event be changed? As in, can the default be changed from the Sticky Acknowledgement being always checked to always unchecked? I have read a post on the internet that this is hard coded and you would have to change the source and recompile in order to accomplish this.
> http://article.gmane.org/gmane.network.nagios.user/54147
> This post is very old... maybe this has changed in 3.2.3?

This behavior cannot be changed without hacking cgi/cmd.c. It's a simple change: on lines 951 and 977 (this is Nagios 3.2.3), remove the CHECKED from each line. That will make the checkboxes default to *not* checked. I believe this is a much more sane default than checked.

Benny
Re: [Nagios-users] Why is check_openmanage so slow on PowerEdge R510?
> At one time we had a battery that didn't finish charging for a week, called Dell and got a replacement battery. This was during a regular charge cycle. In your case I would give it a few more days.
> ...
> But, as we in fact did experience a case where the battery never finished charging, I would advise against this. We just ignore the battery charge warnings unless they persist for days. It can be annoying, but we decided that we can live with it :)

Trond,

Is there anything in OMSA that tells how *long* a battery has been charging? I simply got so tired of the charging warnings that I blacklisted the bat_charge totally, but I'd still like to detect that type of error, where the battery never finishes charging. If OMSA has it, it would be great to have the option within check_openmanage to specify a length-of-time threshold for battery charging. :)

Benny
Re: [Nagios-users] Why is check_openmanage so slow on PowerEdge R510?
> Unfortunately OMSA has no info on when the charge cycle is expected to be finished, or how long it has been in its current learn/charge state:
>
>     # omreport storage battery controller=1
>     Battery 0 on Controller PERC 6/E Adapter (Slot 1)
>
>     Controller PERC 6/E Adapter (Slot 1)
>     ID                        : 0
>     Status                    : Non-Critical
>     Name                      : Battery 0
>     State                     : Charging
>     Recharge Count            : Not Applicable
>     Max Recharge Count        : Not Applicable
>     Predicted Capacity Status : Ready
>     Learn State               : Requested
>     Next Learn Time           : 0 hours
>     Maximum Learn Delay       : 7 days 0 hours
>     Learn Mode                : Auto
>
> I could make the plugin record it, but then I would violate my principle that the plugin should be stateless... Introducing state in the plugin complicates things.

Hmmm, that's unfortunate that they don't track a duration or start time. :(

And no, I fully agree - plugins should be stateless. Keeping track of state is an ugly, error-prone business.

Benny
Re: [Nagios-users] pnp4nagios?
> Anyone here using pnp4nagios for graphing? I'm having some configuration issues and wanted to see if there was someone who could assist?

Sure, I know several of us use it. What issues are you having?

Benny
Re: [Nagios-users] pnp4nagios?
> I've configured pnp and it has been successfully graphing data since the actual installation. However, I have Nagios perfdata logs from the past year (host_perfdata.log, service_perfdata.log) that are not being parsed, and I want them to be included.
> ...
> Is there a certain method or configuration I need to follow if I want to include this historical data?

Hmmm... RRD databases expect to be updated in a sequential fashion, on regular-ish intervals. I'm not sure that you can go back and add the past data. I will defer to those on the list that are more familiar with RRD - is that even possible?

Benny
Re: [Nagios-users] NSClient++. Monitoring the devices behind the Firewall.
> The question I have is the same as the one already reported at http://nsclient.com/nscp/discussion/topic/466#-1. The diagram and scenario are the same as at http://nsclient.com/nscp/wiki/doc/usage/nagios/nrpe but with a second, remote firewall. Basically, I know how to configure a remote Windows computer with a fixed TCP/IP address, but I have no idea how to configure a remote Windows NSClient or an NRPE UNIX client installed behind a remote firewall. The remote subnet is behind NAT in this case, so how can the Nagios server reach a remote client in this scenario? Any idea?

Well, each of the clients behind the firewall needs to be individually addressable somehow. You can do this in several ways; here are two:

1) Assign ports on the firewall to NAT to the individual clients behind it. I.e., assign port 45000 to be NATed to client 1, port 5666. Assign port 45001 to be NATed to client 2, port 5666, etc. Then, on your Nagios server, use the IP of your firewall and the individual ports to communicate with the clients.

2) Assign multiple IPs to the firewall, and NAT each IP and port X (by default, 5666) to the clients behind it.

If you're looking to do this without cooperation from the client and their security folks, you're going to run into problems. If they want you to monitor their hosts, they have to provide some manner of accessing them.

In either of the examples above, I would strongly recommend that they assign firewall rules to allow connections to the clients' NSClient++ services *only* from your Nagios server. Don't leave those ports open to the unwashed masses.

A VPN between your sites is also an option.

Benny
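Option 1 can be sketched as follows. The firewall address, plugin directory, and the check_cpu command name are hypothetical values for illustration; only the port numbers come from the example above. The script prints the command lines rather than running them.

```shell
#!/bin/sh
# Dry-run sketch of option 1: one firewall address, one forwarded port per
# client. All values below are hypothetical.
PLUGIN_DIR=/usr/local/nagios/libexec
FIREWALL_IP=203.0.113.5

# $1 = port on the firewall that is NATed to port 5666 on one client
# $2 = NRPE command name configured on that client (hypothetical)
nat_check_cmd() {
    printf '%s/check_nrpe -H %s -p %s -c %s\n' \
        "$PLUGIN_DIR" "$FIREWALL_IP" "$1" "$2"
}

nat_check_cmd 45000 check_cpu   # reaches client 1
nat_check_cmd 45001 check_cpu   # reaches client 2
```

With option 2 the port stays at the default and only the address passed to -H changes per client.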
Re: [Nagios-users] NSClient++. Monitoring the devices behind the Firewall.
> If you're looking to do this without cooperation from the client and their security folks, you're going to run into problems. If they want you to monitor their hosts, they have to provide some manner of accessing them.

Just to be thorough, passive monitoring is also a possibility. In that case, each of the clients would be configured to send the service check results to the Nagios server, and would probably not require any changes to the firewall. However, I choose to use active monitoring, so I cannot help with that setup, nor would I necessarily recommend it.

Benny
Re: [Nagios-users] Verifying the email delivery completion
> I had been successfully using check_smtp to verify the SMTP service. A few days ago, one of our SMTP servers was still listening on port 25, but messages were all rejected with a 451 error (451 mail server temporarily rejected message (#4.3.0)). Is there any way to verify the email delivery completion?

Check out check_email_delivery:

http://exchange.nagios.org/directory/Plugins/Email-and-Groupware/check_email_delivery/details

It will check email *delivery*, not just listening on a port - from email submission via SMTP through the reception of the same email to a mailbox via POP3 or IMAP, and will alert upon problems at any phase in the process.

Benny
Re: [Nagios-users] Availability Report
> I need to generate an availability report for a service in a particular timeperiod. I have created one timeperiod in Nagios from 06:00 to 22:00 every day. While creating the availability report, I select that timeperiod as the report time period, but the report is always generated from 00:00 to 24:00. Can anyone please help me with this?

If you select a custom report period, you can select not only the days you want included, but the timeperiod.

Benny
Re: [Nagios-users] A question on nagios group nagcmd
> My question is, the docs say to create the group nagcmd and add nagios and wwwrun to the group in order to allow external commands to be submitted through the web interface. What external commands are we talking about here? Are we talking about the service commands from the check screen (disable checks, schedule downtime, etc.)? Is it safe to assume that if wwwrun was not in the nagcmd group, bad things will happen in the web console? I'm not anywhere near my system, so I can't try it to see what would happen. Any thoughts before I rewrite the central password system to put both a regular user and a daemon user in the same file?

What regular user? nagios? Because really, from your description above, both the nagios and wwwrun users should be daemon users, so you should be able to have them in the same file and avoid the problem. After all, the nagios user is the one that the Nagios daemon will be running as...

Benny
Re: [Nagios-users] Ho to Deploy massively Nagios on more than 200 Windows servers?
> Hi Gurus,
> I have to deploy Nagios plugins on more than 200 Windows servers. Configuration: Nagios Server runs Nagios3.06 on Linux CentOS 5.5. So I would like to know how to do it massively instead of server by server?
> Thanks for your help.

The same way as you deploy any other software to your more than 200 Windows servers... How do you apply your patches? How do you distribute other software? I use SCCM to do it on my network. You could also script it from one of the Windows machines, or any number of other methods.

Benny
Re: [Nagios-users] Error in performance-data-output
> Could it be that this is a Windows issue, or perhaps NSClient++? Any NSClient++ users here who can confirm if this is the case? I'm thinking that perhaps the underscore character '_' is throwing off Windows or NSClient++.

I use NSClient on hundreds of hosts, and I haven't noticed any issues with underscores yet...

Benny
Re: [Nagios-users] Which GUI to configure Nagios 3 ?
> I know that there are nice GUIs to configure Nagios... which one do you know/use?

I'm a big, big fan of NConf.

Benny
Re: [Nagios-users] Hostgroups: if not contact for one host, none are available
> This isn't a bug. By reviewing the source code, you can find that Nagios (more precisely, the CGIs) just does it this way. I have no clue why Nagios won't show partial hostgroups if one has no access to all host members. Maybe for performance reasons?

If this behavior is intentional, I'd love to know why... It seems utterly broken to me, and while I have no vote to cast, I'd love to see it changed. While this seems like just a logic thing, I'd have to dig into the code to see if I'm smart enough to come up with a diff. :)

Any developers have an opinion/comment?

Benny
[Nagios-users] Hostgroups: if not contact for one host, none are available
Hey folks,

This has bitten me a few times now, so I figured I'd better report it...

If I have hostgroup bob:

    host1
    host2
    host3
    host4
    host5

and contact frank is a contact for hosts 1, 2, 3, and 4 (but NOT 5), frank will not be able to view the *hostgroup*. It gives the usual "It appears you do not have permissions ..." error.

*Surely* this can't be intentional, can it? Why the heck would you _want_ that behavior? I would expect it to display the hosts in the group (viewing a host you're a contact for will show all services, even if you're not a contact for all), or at worst just the members of the group the user is a contact for, but not deny access to the entire hostgroup.

In my environment, I have accidentally added a host to the wrong hostgroup. When I do this and the users of the hostgroup aren't contacts for this new one that I misplaced, the users lose access to the entire hostgroup.

Am I being dense, or is this a bug? 3.2.3 on RHEL 5.5, BTW.

Thanks!

Benny
Re: [Nagios-users] Monitoring unmounted partition
> I'm having a problem with check_nrpe. I'm monitoring a partition, /mnt/2 for example. If I don't have this partition mounted, it just returns the value of / without sending any error.
>
> How can I get an alert when the partition isn't mounted?

Oooof, plain text, please.

I don't know that what you want is possible - if the partition isn't mounted, the OS can't read any information about it. Is this a local partition or a remote filesystem (a la NFS)? If it's remote, you might use the -X flag to check_disk to exclude any of the local filesystem types, so at least you'd get an error if it's not mounted instead of returning the information for /...

Benny
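A different approach from the -X suggestion is to test for the mount itself before looking at disk space. The sketch below is an illustration, not an existing plugin: it assumes a Linux client where /proc/mounts lists the mounted filesystems, and the function names are made up. It could be exposed through NRPE like any other command.

```shell
#!/bin/sh
# Succeeds if $1 appears as a mount point in the mounts table $2
# (defaults to /proc/mounts, which assumes Linux).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Print a Nagios-style status line and return the matching exit code
# (0 = OK, 2 = CRITICAL).
check_mount() {
    if is_mounted "$1" "$2"; then
        echo "OK - $1 is mounted"
        return 0
    fi
    echo "CRITICAL - $1 is not mounted"
    return 2
}

# Example call: check_mount /mnt/2
```

The second argument exists only so the function can be tried against a sample mounts file; in normal use it is left out.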
Re: [Nagios-users] monitoring Windows 2008 event log?
> Anybody know a good way to monitor Windows 2008 event logs? Steve Shipway's beta NagEventLog for win2k8 to run on my server
> http://www.steveshipway.org/software/nagevlog-setup-1.9.2.exe
> Any ideas would be most appreciated

I found NagEventLog to be unreliable, and Steve stopped answering my questions. NSClient++ is very reliable, and I haven't looked back.

Benny
Re: [Nagios-users] monitoring Windows 2008 event log?
> Forgive my ignorance, but nsclient can check the event log? I wouldn't blame Steve, I think he had a baby not so long ago.

NSClient++ can, yes.

*shrug* This was like a year ago or so... If he's busy, that's fine and understandable. Just *say* so, don't just ignore your users, especially when they're trying to point out problems. I'm not mad at the guy or anything, his software just wasn't usable for me.

I have to correct myself - I use Consol.de's check_logfiles.exe for my event log stuff. My bad - I found the built-in NSClient++ eventlog stuff a bit cumbersome.

http://labs.consol.de/lang/en/nagios/check_logfiles/

Lausser has been great in helping as well as adding features and fixing bugs. Sorry for the confusion with NSClient++... I use NSClient++ to execute check_logfiles.exe on the remote clients.

Benny
Re: [Nagios-users] Monitoring temperatures on Cisco equipment
> I think you misunderstand. Those two plugins return WARNING or CRITICAL if one of two things occurs:
>
> 1) If the ciscoEnvMonTemperatureState is not normal.
> 2) If the passed -w and -c values are less than ciscoEnvMonTemperatureStatusValue.
>
> What I'm asking is why #2 is _required_. I can understand it as an optional check if you want to override the device's defaults, but not as mandatory behavior. Cisco devices are smart and know when they're warm or hot. That's the purpose of the ciscoEnvMonTemperatureState. I'm just trying to find out why folks feel that overriding Cisco's defaults is necessary behavior.

While I don't have any insider knowledge into *why* it is the way it is, I'll take a guess - most third-party plugins come into existence because they satisfied someone's specific needs. Perhaps the original author needed to further narrow the range of good-vs-bad, who knows?

I'd say modify it to your needs. :)

Benny (yes, *that* Benny, hi Jeffrey)
Re: [Nagios-users] Problem with check_openmanage
> $ check_openmanage -H myserver -C public
> Power Supply 0 [AC] needs attention: Presence detected, Failure detected, AC lost

You have a power supply #0, it is plugged in, but it has no AC input. Someone tripped over a cable.

> Voltage sensor 14 [PS 2 Voltage 2] is INTERNAL ERROR: Use of uninitialized value $reading in sprintf at /usr/lib/nagios/plugins/check_openmanage line 3565.

Whoopsie, that looks like a bug in check_openmanage. Trond is excellent about fixing issues, I'd expect to hear from him shortly.

Benny
Re: [Nagios-users] Writing nrpe commands
> I'm in need of using nrpe to get information from log files, and I'm stumped on where to find guidance on doing so. Any pointers to the right information or the right place to ask (if this isn't it)?

I would take a look at the sample nrpe.cfg that comes with NRPE for examples of commands, and then check out consol.de's excellent check_logfiles plugin to do the work with the logs.

Benny
Re: [Nagios-users] qpage - OT
>> It would have been nice to see your qpage.cf file... ;)
>
> That seems obvious, see below.
>
>> Be sure you have 'parity=even' in your config. When you run a test with verbose and interactive flags set, do you fail five or six times before you get that message?
>
> I've never tried the interactive flag, I will do so. As far as the failures go, when I had the retry set to 20 it would fail 5 times in a row and then reset the modem or something (I can't fully interpret the logs), and then retry again, possibly 20 times? As in 20 sets of 5.
>
> The interactive (-i) option seems to require a page to be sent right now. As of yet I have been unable to get a failure when sending a page manually, but I think I've really only sent a small number, 10-20 pages, manually. The only times it has failed so far is when it's running in daemon mode.
>
> Do you guys use USB modems with qpage? These problems got much worse after switching to a USB modem.

I didn't notice anything glaringly incorrect with your config... The reason I asked about parity is because I got the exact same error with Verizon and Sprint, except that qpage would decide that the page was not sent (when it had been), so it would retry five times (thereby sending five identical pages). That issue went away when I had a palm-forehead moment and added the 'parity=even' to my config.

Benny
Re: [Nagios-users] qpage - OT
> qpage error: 502 MESSAGE REJECTED - STX OR EOT EXPECTED

It would have been nice to see your qpage.cf file... ;)

Be sure you have 'parity=even' in your config. When you run a test with verbose and interactive flags set, do you fail five or six times before you get that message?

Benny
Re: [Nagios-users] high latency
Yeah, for giggles I went back further through the archives last night and found stuff back to the 2.x series, and not much has seemed to help. I killed some of my misbehaving active checks, and the latency dropped to about 20 seconds, then went back up to about 35-50. So while that's better, I have A LOT more hosts and service checks to add, and am afraid it'll go nuts when I dump more on. I think I've tried about all the config options I could find; some helped, some didn't seem to, but there should be plenty of horsepower on the machine to run this much faster, so I'm not sure why it isn't.

Hey Dan, I too have been wrestling alligators with service and host check latencies averaging around 60s, and increasing to 100+ (sometimes to 300) after a few reloads during the day. This morning, I enabled the use_large_installation_tweaks option. As of a minute ago, my host check latency is now averaging 2.116s, and service check latency is averaging 0.748s. I didn't see if you had tried this yet; it might be something to consider.

Benny
--
No matter how many shorts we have in the system, my guards will be instructed to treat every surveillance camera malfunction as a full-scale emergency. -- Peter Anspach's Evil Overlord List, #67
Re: [Nagios-users] Change Procs Critical threshold
From the help for check_load (which I'm assuming you're using in the command definition):

Usage: check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

So, in your service definition, you're telling check_load that you want to trigger a critical condition if the 15 minute average is 4.0. Yours is 4.06. So, yes, it's critical. :)

WLOAD1 = 1 minute average warning threshold
WLOAD5 = 5 minute average warning threshold
WLOAD15 = 15 minute average warning threshold
CLOAD1 = 1 minute average critical threshold
CLOAD5 = 5 minute average critical threshold
CLOAD15 = 15 minute average critical threshold

If you want your 15 minute average to *not* trigger a critical, you need to adjust that last value (4.0) to something higher. Benny

That sounds logical, and this is what I've adjusted:

check_command check_local_load!8.0,5.0,4.0!12.0,7.0,6.0

but I've restarted the nagios process and the alert still persists. I don't see anything in nagios.log or /var/log/messages related to this either. What could I be missing?

Kill your Nagios daemon. Now, kill the *other* Nagios daemon you have running. If you make changes to your config file and send Nagios a SIGHUP (or restart it) and the changes don't seem to stick, you might have multiple Nagios daemons running: one with an old config (that still thinks 4.00 is a critical threshold), while the new daemon is receiving the changes you mean to make. This is a common issue, and it's easy to fix. Shut down your daemon via whatever method you have (service nagios stop, pkill, etc). Then, wait 30 seconds or so to allow outstanding service checks to wrap up, and see if there are still Nagios processes hanging around. If there are, kill them too. Wait another 30 seconds, rinse and repeat until there are no more Nagios processes. At that point, restart Nagios. Do your changes take effect now?

Benny
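For anyone who wants to script the stop / wait / kill / repeat routine described above, here is a rough sketch. It is only an illustration: the function name wait_until_gone is made up for this example, and it assumes pgrep and pkill are available and that the daemon's process name is exactly "nagios".

```shell
#!/bin/sh
# wait_until_gone NAME TRIES DELAY   (hypothetical helper, not part of Nagios)
# Polls for processes named exactly NAME. While any remain, it sends them
# SIGTERM, sleeps DELAY seconds and tries again, at most TRIES times.
# Returns 0 once none are left, 1 if some are still there after TRIES rounds.
wait_until_gone() {
    name=$1
    tries=$2
    delay=$3
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$tries" -gt 0 ] || return 1
        pkill -x "$name"
        sleep "$delay"
        tries=$((tries - 1))
    done
    return 0
}

# Usage on the Nagios host (commented out so nothing is stopped by accident):
#   service nagios stop
#   wait_until_gone nagios 10 30 && service nagios start
```

The 30 second delay matches the wait suggested above; the limit of 10 rounds is an arbitrary choice for the sketch.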
Re: [Nagios-users] Change Procs Critical threshold
I had another question regarding adjusting these thresholds, this time on localhost. It concerns the Current Load service, which is giving me a critical load average of 2.47, 3.43, 4.06. In /usr/local/nagios/etc/objects/localhost.cfg, I have this:

define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

which I actually went and adjusted to:

check_command check_local_load!7.0,4.0,3.0!10.0,6.0,4.0

I restarted the Nagios service, but this didn't have any effect. The status information still reads the same: Critical Load Average - 2.47, 3.43, 4.06

From the help for check_load (which I'm assuming you're using in the command definition):

Usage: check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

So, in your service definition, you're telling check_load that you want to trigger a critical condition if the 15 minute average is 4.0. Yours is 4.06. So, yes, it's critical. :)

WLOAD1 = 1 minute average warning threshold
WLOAD5 = 5 minute average warning threshold
WLOAD15 = 15 minute average warning threshold
CLOAD1 = 1 minute average critical threshold
CLOAD5 = 5 minute average critical threshold
CLOAD15 = 15 minute average critical threshold

If you want your 15 minute average to *not* trigger a critical, you need to adjust that last value (4.0) to something higher.

Benny
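To make the threshold logic above concrete, here is a small sketch that mimics the comparison for the numbers in this thread. It is not the check_load source: load_state is a made-up name, and the sketch assumes a value at or above a threshold trips it (the exact boundary behaviour may differ between plugin versions).

```shell
#!/bin/sh
# load_state "L1 L5 L15" "W1,W5,W15" "C1,C5,C15"   (hypothetical helper)
# Prints OK, WARNING or CRITICAL by comparing each load average with the
# warning and critical threshold for the same time window.
load_state() {
    awk -v loads="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(loads, l, " "); split(warn, w, ","); split(crit, c, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            if (l[i] + 0 >= c[i] + 0) { state = "CRITICAL"; break }
            if (l[i] + 0 >= w[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# Original thresholds: the 15 minute average 4.06 hits the critical value 4.0.
load_state "2.47 3.43 4.06" "5.0,4.0,3.0" "10.0,6.0,4.0"
# Raising only the 1 minute warning value changes nothing.
load_state "2.47 3.43 4.06" "7.0,4.0,3.0" "10.0,6.0,4.0"
# Raising the last critical value (6.0 is just an example) clears the
# critical, but 4.06 is still above the 15 minute warning value of 3.0.
load_state "2.47 3.43 4.06" "7.0,4.0,3.0" "10.0,6.0,6.0"
```

Under those assumptions the three calls print CRITICAL, CRITICAL and WARNING, so the 15 minute warning value needs raising too if the goal is an OK state.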
Re: [Nagios-users] check_openmanage -- question about battery check
The problem I'm having is that the check is reporting battery charging WARNINGS even though I'm blacklisting that check.

r...@nagios:/opt/plugins# perl ./check_openmanage-3.6.1 -H server1 -C public -e -s -i -b bat_charge

No, you're not... Not quite, anyway. :) Re-visit the documentation for blacklisting: you need to specify *which* battery you're blacklisting. This is the case for all blacklist directives. Hint: I use '-b bat_charge=ALL' in my service definition.

Benny
Re: [Nagios-users] Change Procs Critical threshold
I have a couple of systems that are reporting critical notifications, and when you drill into them, it is the Total Processes service that has been triggered. It's showing critical process levels of 231 or 453, for example, which on a production server is nothing really. My question is, how do I change that threshold to something like 750, or 1500? Perhaps I'm not searching online for the correct term or using the right parlance. Any ideas where this can be modified?

If you have a look at the options for the plugin (hint: ./plugin --help), they will reveal their secrets to you. :) Then, using that knowledge, you can adjust your service check to use the appropriate values for your environment.

Benny
Re: [Nagios-users] host_port objects - Enhancement Request
parent_service is actually a pretty good idea, I hadn't thought of that.

I very, very much wish Nagios had the concept of parent_services (notice the plural, no reason not to give a parent service the same multi-parent capabilities as a regular parent host scenario). That way, you could get finer-grained control (it's just the switch port) and deal with multi-homed hosts with that same level of granularity (host A depends on switch 1 port 6 AND switch 2 port 12). Oooo, that would be nice...

Benny
Re: [Nagios-users] Scheduled checks falling far behind
OK, well, I hope I'm not embarrassing myself with this. It's a Perl script and uses Ton Voon's nifty Nagios::Plugin module. I run checks against things I want to know about. Thinking about it, I guess it would be nice to have the failed hosts/services check alert on percentage of failures. Maybe someday.

Fantastic... Thanks, Mark! I'll take a look at this when I have a bit of time.

Benny
Re: [Nagios-users] Scheduled checks falling far behind
You can also run, if memory serves, the nagiostats command located in your Nagios bin directory to see this information as well. I actually use that nagiostats data in a custom check and graph a lot of those latencies and other Nagios performance related info.

Boy, would I *love* to see your method for that! I personally hacked the source of nagiostats to create a custom plugin, but it's a horrible, horrible hack and I'd like to see a cleaner, more scalable method. Can you share?

Benny
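One possible shape for such a check, sketched here without having seen the script being discussed: wrap a latency figure in a small plugin function that prints a status line with performance data and returns a plugin exit code. The name latency_status, the thresholds and the nagiostats path are all assumptions for illustration; the value is expected in whole milliseconds, which is the unit nagiostats reports for AVGACTSVCLAT in MRTG mode.

```shell
#!/bin/sh
# latency_status MS WARN_MS CRIT_MS   (hypothetical helper)
# Prints one line of plugin output plus perfdata and returns 0/1/2.
latency_status() {
    ms=$1
    warn=$2
    crit=$3
    perf="latency=${ms}ms;${warn};${crit}"
    if [ "$ms" -ge "$crit" ]; then
        echo "CRITICAL: average active service check latency ${ms}ms | $perf"
        return 2
    elif [ "$ms" -ge "$warn" ]; then
        echo "WARNING: average active service check latency ${ms}ms | $perf"
        return 1
    fi
    echo "OK: average active service check latency ${ms}ms | $perf"
    return 0
}

# Usage on the Nagios host (commented out; the path is an assumption):
#   ms=$(/usr/local/nagios/bin/nagiostats --mrtg --data=AVGACTSVCLAT)
#   latency_status "$ms" 30000 60000
#   exit $?
```

Keeping the comparison in a function that takes the number as an argument makes it easy to reuse for the other nagiostats variables and to graph the perfdata.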
Re: [Nagios-users] FTP Server is shown alive, but Nagios says DOWN!
The check_command in the *host* definition, not the service definition. Is this a Windows 2008 FTP server? If so, Win2008 disables pings out of the box, and a custom rule must be added to the Windows Firewall to allow ICMP ECHOREQ. Or use a check_command that would indeed show the correct status of the host. Do *not* blindly disable the Windows Firewall to fix this, if this is what's going on.

Benny

Check_command is set to check_ftp and that is shown as UP; I can't figure out why Nagios is showing the FTP server as DOWN! Please see the configuration file below.

# Define a service to check host alive the remote machine
define service{
        use                     local-service   ; Name of service template to use
        host_name               ftpsrv
        service_description     Check FTP Service
        check_command           check_ftp!
}

Which check can we change to, so that Nagios doesn't show an UP server as DOWN?

Regards
Anth.

-Original Message-
From: Pete Dewell [mailto:p...@stuff-done.co.uk]
Sent: 28 September 2010 14:49
To: Nagios Users List
Subject: Re: [Nagios-users] FTP Server is shown alive, but Nagios says DOWN!

The host up/down status will be defined by the check_command in the host definition. What is this set to? I would guess that it's been set to check something that the host doesn't respond to.

P

On 28/09/2010 09:16, Bram Gillemon wrote:

Does the server reply to ping checks? As far as I know, the server shows as up if it replies to pings.

Kr,
Bram Gillemon

On 28 Sep 2010, at 08:30, IT Toonz wrote:

2 queries. Please see the two reports! How can we rectify this error? The FTP server is not down; please see the times of checking. What is telling Nagios that the FTP server is down? Please advise.

# Define a service to check host alive the remote machine
define service{
        use                     local-service   ; Name of service template to use
        host_name               ftpsrv
        service_description     Check FTP Service
        check_command           check_ftp!
}

Please see the config below; we want to see the free disk space in the home directory, but it seems it's not working.

# Define a service to check the disk space of the home partition
define service{
        use                     local-service   ; Name of service template to use
        host_name               ftpsrv
        service_description     Home Partition
        check_command           check_local_disk!20%!10%!/home
}

We are using FAN 2.0, Nagios 3.0.6.

Thanks and regards
Anth.
Re: [Nagios-users] Acquiring data from a log
I need to acquire data from a log, parse it and, if a particular condition is met, notify the problem; otherwise just show the condition in Nagios. I.e. if the number of records (parsed from the log) exceeds 1000, trigger an email, otherwise just have "Processed 123 records" on the Nagios web page. I have written a script that extracts and parses the data, so inside it I can do

Wow. Check out ConSol Labs' check_logfiles plugin. Don't re-invent the wheel if you can avoid it.

    if [ "$RESULT" = 0 ]
    then
        echo "No records processed"
        exit 0
    elif [ "$RESULT" -gt "$THRESOLD" ]
    then
        echo "CRITICAL: $RESULT records processed"
        exit 2
    else
        echo "NORMAL: $RESULT records processed"
        exit 0
    fi

where RESULT is the result of the computation and $THRESOLD is passed on the command line. I have the following questions:

1) How can I see whether the command is run?
2) How can I see which results are passed to Nagios?
3) How can I pass both exit values and the string echoed?

#3:

    echo "This is output that will show up in the Nagios web UI"
    exit 3

Benny
--
Because you have arms like noodles, while I am vigorous and burly. -- Hodgins, Bones
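To spell out the answer to question 3: Nagios takes the text from the plugin's standard output and the state from its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so a plugin passes both simply by echoing and then exiting. A minimal sketch along the lines of the script above, with the function name check_records and the exact wording of the messages being illustrative choices rather than anything from the original script:

```shell
#!/bin/sh
# check_records RESULT THRESHOLD   (hypothetical helper)
# Echoes the status text and returns the matching plugin exit code.
check_records() {
    result=$1
    threshold=$2
    case $result in
        ''|*[!0-9]*)
            echo "UNKNOWN: record count '$result' is not a number"
            return 3
            ;;
    esac
    if [ "$result" -eq 0 ]; then
        echo "OK: No records processed"
        return 0
    elif [ "$result" -gt "$threshold" ]; then
        echo "CRITICAL: $result records processed"
        return 2
    fi
    echo "OK: Processed $result records"
    return 0
}

# In a real plugin the last two lines would be something like:
#   check_records "$RESULT" "$1"
#   exit $?
```

Running the finished plugin by hand as the nagios user and then checking `echo $?` is a quick way to see both what text and what state Nagios would receive.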