Re: [Nagios-users] Is it possible to receive a single global notification for all checks?
Alex Flex wrote:
> Hello. Thank you for this; it definitely looks like the solution for me, although mk-livestatus is largely unknown to me.
> Alex

Maybe Nagios BPI will also work for you.
http://exchange.nagios.org/directory/Addons/Components/Nagios-Business-Process-Intelligence-%28BPI%29/details

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
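Since mk-livestatus came up in this thread as a way to get one summary instead of one notification per check, here is a minimal Python sketch of that idea. It assumes the Livestatus broker module is loaded and listening on a UNIX socket; the socket path below is a guess that depends on your installation, and the summary format is purely illustrative.

```python
import json
import socket

# Assumption: set this to the socket path you gave the Livestatus broker module.
LIVESTATUS_SOCKET = "/usr/local/nagios/var/rw/live"

STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_problem_query():
    """LQL query for every service currently in a non-OK state."""
    return (
        "GET services\n"
        "Columns: host_name description state plugin_output\n"
        "Filter: state > 0\n"
        "OutputFormat: json\n"
        "\n"
    )


def query_livestatus(query, path=LIVESTATUS_SOCKET):
    """Send one query over the UNIX socket and return the raw response text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell Livestatus the query is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def summarize(raw_json):
    """Turn the JSON rows into one plain-text message covering all problems."""
    rows = json.loads(raw_json)
    if not rows:
        return "All services OK"
    lines = [f"{len(rows)} service problem(s):"]
    for host, description, state, output in rows:
        lines.append(f"{host}/{description}: {STATE_NAMES.get(state, state)} - {output}")
    return "\n".join(lines)
```

A cron job could then run `summarize(query_livestatus(build_problem_query()))` on a schedule and mail the result, giving a single periodic message rather than per-check notifications.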
Re: [Nagios-users] Running a plugin at specific times
Stu Watts wrote:
> On 24 August 2012 22:10, Tech Support <supp...@voipbusiness.us> wrote:
>> Hello;
>> I am fairly new to Nagios, and this is my first project using it. What I would like to do is run a plugin at specific times of the day. This particular plugin is pretty intrusive, so I would like to run it only at 7:00am and 7:00pm daily. Is there an easy way of doing this? I’m thinking that I can run the script out of cron, then passively send the data to Nagios via its command pipe, but I’m not sure if that’s the best way to go.
>
> Nagios does time periods itself, so no need for cron:
> http://nagios.sourceforge.net/docs/nagioscore/3/en/timeperiods.html
> The Nagios documentation is pretty good - have a check through. Chances are it can do what you want.. ;-)

I don't think setting time periods will ensure that a check is run at specific times. The best they can do is specify the time periods in which a check may run. Using cron may be the way to go.
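If you go the cron route, a minimal Python sketch of the wrapper might look like this. It runs the plugin once and writes a PROCESS_SERVICE_CHECK_RESULT external command to the Nagios command pipe. The command-file path, host name, service name, and plugin path are placeholders to replace with your own.

```python
import subprocess
import time

# Assumption: default command file for a source install; check command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one external command line in the format Nagios reads from its command pipe."""
    if timestamp is None:
        timestamp = int(time.time())
    # Keep only the first line of plugin output so the command stays on one line.
    lines = output.strip().splitlines()
    first_line = lines[0] if lines else ""
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{first_line}\n"
    )


def run_and_submit(plugin_argv, host, service, command_file=COMMAND_FILE):
    """Run the plugin once and hand its result to Nagios as a passive check."""
    result = subprocess.run(plugin_argv, capture_output=True, text=True)
    line = format_passive_result(host, service, result.returncode, result.stdout)
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

A crontab entry such as `0 7,19 * * * /usr/local/bin/submit_intrusive_check.py` (hypothetical script name) would run it at 7:00 and 19:00. The matching service would have active checks disabled and passive checks enabled; freshness checking can optionally flag a run that never reports.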
Re: [Nagios-users] Service and host notifications: best practise
Keith Edmunds wrote:
>> Escalations are your friend.
>
> Thanks for the quick and helpful response. Unless I've misunderstood, we would need to configure a service escalation for each service and each of the host groups - is that right? What we really need is notification rather than escalation, although I realise we can use escalations in a similar way to notifications. I'd like to be able to say, "If any service fails on any host in hostgroup A, notify these people."
> Thanks, Keith

The only way I can think of to do this is to use a template for services that belong to a particular hostgroup:

define service{
    name        hostgroup_A_service
    register    0
    contacts    managerA
    ...
}

Have all the services on hosts in hostgroup A use that template. Note that using the 'contacts' directive won't override any contacts that you specify with the contact_groups directive in other templates. I use 'contacts' to specify contacts in addition to the default ones that I specify with 'contact_groups'. So you can specify the IT_team as a contact_group for all services and hosts and use 'contacts' to specify the manager for particular services.

Hope this helps,
Paul Dubuc
Re: [Nagios-users] [Nagios-devel] timeperiod definition for election day?
Jochen Bern wrote:
> On 12/06/2011 12:44 AM, Andreas Ericsson wrote:
>> On 12/05/2011 09:31 PM, Paul M. Dubuc wrote:
>>> For example, election day in the U.S. is on the 1st Tuesday after the 1st Monday of November.
>> I couldn't even imagine what the syntax would look like to support it.
>
> Am I missing something, or would this be equivalent to the Tuesday between 02-Nov and 08-Nov, which, in turn, should (!) be equivalent to
>
> define timeperiod {
>     timeperiod_name  Election Day
>     alias            Shouldn't you be out there voting for someone
>     november 2 - 8   00:00-24:00
>     exclude          AllButTuesdays
> }
> define timeperiod {
>     timeperiod_name  AllButTuesdays
>     alias            Everyone can hate MONDAYS ...
>     sunday           00:00-24:00
>     monday           00:00-24:00
>     wednesday        00:00-24:00
>     thursday         00:00-24:00
>     friday           00:00-24:00
>     saturday         00:00-24:00
> }
>
> ?
> Kind regards,
> J. Bern

Amazing. Thanks! But until the problem with the 'exclude' directive is fixed (see the known issue under 3.2.0 - 08/12/2009 at http://www.nagios.org/projects/nagioscore/history/core-3x), we might want to do it this way:

define timeperiod {
    timeperiod_name  Election Day
    alias            Shouldn't you be out there voting for someone
    november 2 - 8   00:00-24:00
    use              AllButTuesdays
}
define timeperiod {
    name             AllButTuesdays    ; so 'use' will work above
    timeperiod_name  AllButTuesdays
    alias            Everyone can hate MONDAYS ...
    sunday           00:00-00:00
    monday           00:00-00:00
    wednesday        00:00-00:00
    thursday         00:00-00:00
    friday           00:00-00:00
    saturday         00:00-00:00
}

Do you think this will also work?
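Jochen's equivalence rests on a simple observation: the first Monday of November always falls on the 1st through the 7th, so the Tuesday after it falls on the 2nd through the 8th, and any seven consecutive days contain exactly one Tuesday. A short Python check, independent of Nagios and with illustrative function names, confirms that both rules pick the same date for every year from 1900 to 2100:

```python
import datetime


def election_day(year):
    """First Tuesday after the first Monday of November."""
    day = datetime.date(year, 11, 1)
    while day.weekday() != 0:  # Monday is 0
        day += datetime.timedelta(days=1)
    return day + datetime.timedelta(days=1)  # the day after a Monday is the next Tuesday


def tuesday_between_nov_2_and_8(year):
    """The single Tuesday falling on November 2..8 (seven consecutive days)."""
    for day_of_month in range(2, 9):
        day = datetime.date(year, 11, day_of_month)
        if day.weekday() == 1:  # Tuesday is 1
            return day
    raise AssertionError("unreachable: any 7 consecutive days contain one Tuesday")


mismatches = [
    year for year in range(1900, 2101)
    if election_day(year) != tuesday_between_nov_2_and_8(year)
]
```

An empty `mismatches` list means "november 2 - 8" restricted to Tuesdays is a safe stand-in for the election-day rule.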
[Nagios-users] timeperiod definition for election day?
I didn't see this in the documentation, but I wonder if there is a way to specify a timeperiod for the first weekday after another weekday. For example, election day in the U.S. is on the 1st Tuesday after the 1st Monday of November. We have a similar need to define a timeperiod for the 1st Sunday after the 1st Saturday of every month. Must we do this by entering all the specific dates for these in the coming year(s), or is there a simpler, no-maintenance way of doing it?

Thanks,
Paul Dubuc
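If the dates do have to be listed explicitly, a small Python helper can generate them a year at a time. It relies on Nagios 3 accepting calendar-date entries of the form `YYYY-MM-DD 00:00-24:00` inside a timeperiod; the function names and the default time range are illustrative. By the same reasoning as for election day, this Sunday always lands on day 2 through 8 of the month.

```python
import datetime


def sunday_after_first_saturday(year, month):
    """First Sunday after the first Saturday of the given month."""
    day = datetime.date(year, month, 1)
    while day.weekday() != 5:  # Saturday is 5
        day += datetime.timedelta(days=1)
    return day + datetime.timedelta(days=1)


def timeperiod_date_lines(year, time_range="00:00-24:00"):
    """One calendar-date entry per month, for pasting into a timeperiod definition."""
    return [
        f"    {sunday_after_first_saturday(year, month).isoformat()}  {time_range}"
        for month in range(1, 13)
    ]
```

For example, `"\n".join(timeperiod_date_lines(2012))` produces twelve lines, starting with `2012-01-08  00:00-24:00`, ready to go between the braces of a `define timeperiod` block.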
Re: [Nagios-users] disabling e-mail notifications for nagiosadmin account
Kaplan, Andrew H. wrote:
> Hi there --
> We are running Nagios 3.3.1, and have two contacts set up for the e-mail notifications. One of the contacts is the nagiosadmin user. This is the user account that was first set up during the initial installation of the application. When the account was set up, it was configured with the e-mail address of one of our network administrators. A second account was set up that was based on the administrator's login account along with his e-mail address. When notifications are sent out, he gets two notifications for each event because both contacts have the same e-mail address.
> We want to prevent the e-mail notifications from being sent to the nagiosadmin account, so that the administrator gets only one notification per event. One thought was to set up a dummy account on the Nagios server as a solution, and another idea was to set up a flag in the contacts.cfg file, but we are not sure what the correct syntax would be for the latter. What would be the best solution here?
> Thanks.

nagiosadmin doesn't need to be a notification contact. You can remove it from any contact lists in your contacts.cfg. If you want that user to still be able to see everything and run all commands from the Nagios web interface, you can put it in the authorized_for_* lists in your cgi.cfg if it's not already there.

Paul Dubuc
[Nagios-users] us-holidays timeperiod error
I apologize if this has already been caught and fixed, but I just noticed that the timeperiods.cfg file that comes with Nagios 3.2.3 has an error in the us-holidays timeperiod:

    thursday -1 november    00:00-00:00    ; Thanksgiving (last Thursday in November)

should be

    thursday 4 november     00:00-00:00    ; Thanksgiving (4th Thursday in November)

November 2012 has 5 Thursdays.
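A quick Python check of the two rules for 2012 shows why the distinction matters: November 2012 starts on a Thursday, so the last Thursday (the 29th) falls a week after Thanksgiving (the 22nd). The variable names are illustrative.

```python
import calendar
import datetime


def thursdays_in_november(year):
    """All Thursdays in November of the given year, in order."""
    return [
        datetime.date(year, 11, day)
        for day in range(1, 31)
        if datetime.date(year, 11, day).weekday() == calendar.THURSDAY
    ]


thursdays_2012 = thursdays_in_november(2012)
last_thursday = thursdays_2012[-1]   # what "thursday -1 november" selects
fourth_thursday = thursdays_2012[3]  # what "thursday 4 november" selects
```

In years with only four Thursdays in November the two rules agree, which is presumably why the error can go unnoticed.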
Re: [Nagios-users] escalations question
Michael Barrett wrote:
> I was wondering - if a contact is only set to receive critical alerts, and via escalations the service is only set to contact that contact with a first_notification set as 3, what could cause that contact to get notified at the first notification? If the service has been in a warning state for a while (more than 3 notifications, but none of them going to the critical-only contact since it isn't configured to get warnings), do those notifications count towards the first_notification count?

Yes. Each time Nagios generates a notification for any state, the notification count is incremented. After the recovery (OK state) notification is sent, the count is reset to 0.

> I thought we had a pretty cool setup going where our secondary pager would only be notified if the service went critical and only after its third critical notification - but this morning both the primary pager and the secondary pager were notified at the same time for a disk space issue that had been in a warning state for a few hours and then went to critical.

All the warning notifications incremented the count, so the count was greater than 3 when the service went critical. There is no way to specify the 3rd CRITICAL notification with escalations. Notification counts do not take the state into account.

> Is there any way to get that sort of setup working, btw?

You might re-think why you want to do this. If there has been a problem at the warning level for 2 or more notification intervals without it being acknowledged (which stops notifications) or fixed, maybe your secondary contact should be notified anyway when the critical threshold is exceeded. If you really want it to work the way you describe, then the best solution I can think of is to have 2 separate services with different contacts: one that issues only warnings and the other only critical problems. But then you've doubled the number of checks you are doing for the same problem.

Paul Dubuc
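To make the counting rule concrete, here is a toy Python model of it (not Nagios code). It treats every problem notification as incrementing one counter regardless of state, resets on recovery, and reports when a critical-only contact reached through an escalation with a given first_notification would be paged. It deliberately ignores last_notification, notification intervals, and time periods.

```python
def escalated_contact_hits(states, first_notification, contact_states):
    """Simplified model of the notification counter described above.

    states: state of the service at each successive notification
    ("WARNING", "CRITICAL", ...; "OK" stands for the recovery).
    Returns (notification_number, state) pairs at which an escalation
    starting at first_notification would reach a contact that only
    accepts contact_states.
    """
    number = 0
    hits = []
    for state in states:
        if state == "OK":
            number = 0  # the count is reset after the recovery notification
            continue
        number += 1  # every problem notification counts, whatever its state
        if number >= first_notification and state in contact_states:
            hits.append((number, state))
    return hits
```

With four WARNING notifications followed by a CRITICAL one, `escalated_contact_hits(["WARNING"] * 4 + ["CRITICAL"], 3, {"CRITICAL"})` returns `[(5, "CRITICAL")]`: the secondary pager fires on the very first critical notification, which matches what happened to the disk space check.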
Re: [Nagios-users] howto ignore down hosts
Michael Barrett wrote:
> On Sep 28, 2011, at 10:37 AM, Albrecht Dreß wrote:
>> Hi,
>> a dumb question - is it possible to ignore hosts which are down, i.e. no messages are sent if the machine is down or comes up again, and no service checks are performed while the machine is down? When the box comes up again, the service checks should be run soon if possible. This would be nice for boxes which are down regularly (but not according to a pre-defined schedule), but have some services which should be monitored, without sending too many mails to the person in charge of them...
>> I'm running Nagios 3.2.3, self-compiled on Ubuntu 8.04, if that matters.
>> Thanks in advance, Albrecht.
>
> In your host definition, set the notification_options so that it doesn't notify you when hosts go down/recover:
> notification_options d,u,r (remove the d and r)

Service checks would still be run when the host is down, though. If you don't want them to run, then I think you need to define a servicedependency for those services, making them dependent on a master service that monitors the host state and setting the execution_failure_criteria to the failure state of the master service.
[Nagios-users] Service Availability Report Question
I have a question about the Service Availability Report in Nagios 3.2.3 to which I can't find an answer in any documentation: under the Service State Breakdowns there is a colored horizontal bar with dotted lines and colored segments. The colored segments represent the durations of the corresponding states over the specified time period. What do the dotted lines represent? Most of the time they appear at the beginning of a colored, non-OK state segment, but I see some that appear alone.

Thanks,
Paul Dubuc
Re: [Nagios-users] Hostgroup Members
Brandon Phelps wrote:
> Thanks Dan. I was aware of the hostgroups directive in the host {} block, but for some reason my brain never connected the dots. In that case, does anyone know when support was added for host { hostgroups = ... }, or simply whether or not it is available in version 1.4? I have googled a bit but can't seem to find the online manual for 1.4.x.
> Thanks again, Brandon

I don't know when it was added, but could you try it and see if it works? If Nagios 1.4 supports the -v option, it would tell you if the hostgroups directive isn't recognized.
Re: [Nagios-users] servicedependency not working properly
Steve Glasser wrote:
> Hi list,
> We often have Nagios checks time out when servers are under heavy load. One check tests nrpe; if that fails or times out, I want notifications for other services on the same host to be suppressed. To do this I am using a servicedependency. Looking at the Nagios logs, I can see that all other checks, both nrpe and remote, are running before test_nrpe. That means, at least for the first cycle of failed checks, that notifications for all services will be sent. Is it possible to control the order in which Nagios checks run? Or am I just doing something wrong? Please see the sample config below:
>
> define servicedependency {
>     host_name                      vm-foo2
>     service_description            test_nrpe
>     dependent_host_name            vm-foo2
>     dependent_service_description  nrpe_check_load,nrpe_check_ntp_time,nrpe_check_root,nrpe_check_swap,nrpe_check_ro_mounts
>     notification_failure_criteria  c,u
>     execution_failure_criteria     n
> }
>
> Thanks,

I think the problem is that you have 'n' set for the execution_failure_criteria. That means the dependent services will always be checked. Try setting this to 'c,u' instead (the same as notification_failure_criteria). From the documentation:
http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#servicedependency

execution_failure_criteria: This directive is used to specify the criteria that determine when the dependent service should not be actively checked. If the master service is in one of the failure states we specify, the dependent service will not be actively checked. Valid options are a combination of one or more of the following (multiple options are separated with commas): o = fail on an OK state, w = fail on a WARNING state, u = fail on an UNKNOWN state, c = fail on a CRITICAL state, and p = fail on a pending state (e.g. the service has not yet been checked). If you specify n (none) as an option, the execution dependency will never fail and checks of the dependent service will always be actively checked (if other conditions allow for it to be). Example: If you specify o,c,u in this field, the dependent service will not be actively checked if the master service is in either an OK, a CRITICAL, or an UNKNOWN state.
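The documented rule can be restated as a tiny Python function, which makes it easy to see why `n` defeats the dependency. This only models the criteria letters quoted above; it says nothing about the order in which Nagios schedules checks, and it ignores the "other conditions" the documentation mentions.

```python
# Single-letter codes used by execution_failure_criteria, per the documentation quoted above.
STATE_CODES = {"OK": "o", "WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c", "PENDING": "p"}


def dependent_check_runs(master_state, execution_failure_criteria):
    """True if the dependent service would still be actively checked."""
    options = {opt.strip() for opt in execution_failure_criteria.split(",")}
    if "n" in options:
        return True  # 'n': the execution dependency never fails
    return STATE_CODES[master_state] not in options
```

With the original `n`, `dependent_check_runs("CRITICAL", "n")` is `True`, so the nrpe checks keep running while test_nrpe is failing; with `c,u` the same call with a CRITICAL master returns `False`.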
[Nagios-users] $NOTIFICATIONRECIPIENTS$ macro contents inaccurate
The Nagios 3.2.3 documentation says the notification macro $NOTIFICATIONRECIPIENTS$ is "A comma-separated list of the short names of all contacts that are being notified about the host or service." Instead, this macro contains all contacts for the host or service, regardless of whether the particular notification is actually being sent to them. I have one contact for all services that only gets CRITICAL (c) notifications according to its service_notification_options setting, but the $NOTIFICATIONRECIPIENTS$ macro includes this contact along with the others that get WARNING notifications when a WARNING notification is sent. This implies that the critical-only contact also got the notification, but that isn't true.

Paul Dubuc
[Nagios-users] Bug?: Custom notifications ignore contact notification options
I just discovered what looks like a possible bug in the "Send custom host/service notification" command in Nagios 3.2.3. When I use this command to send an OK status notification, it goes to all contacts, even ones that are only supposed to receive CRITICAL notifications. The command description says:

"Custom notifications normally follow the regular notification logic in Nagios. Selecting the Forced option will force the notification to be sent out, regardless of the time restrictions, whether or not notifications are enabled, etc. Selecting the Broadcast option causes the notification to be sent out to all normal (non-escalated) and escalated contacts. These options allow you to override the normal notification logic if you need to get an important message out."

I didn't use either the Forced or the Broadcast option, and the OK status notification goes to contacts that have only 'c' (critical) or 'd' (down) in their service/host_notification_options and escalation_options. Is this a bug? It seems like this should only happen if the Broadcast option is checked.
Re: [Nagios-users] different escalation intervals possible?
Michael Barrett wrote:
> Hi - I am trying to get alerts for my services to work like this:
> - all alerts (warning, critical, unknown, and recovery) go to the 'email-ops' contact every 2 hours
> - on some services (the ones deemed critical), in addition, I want them to send an email to the 'primary-pager' contact every 15 minutes
> I thought I had the configuration set up appropriately for this, but now I'm not sure it's possible without a better understanding of how escalations work (and it may not even be possible then), since I read this about overlapping escalations:
> "Since it is possible to have overlapping escalation definitions for a particular hostgroup or service, and the fact that a host can be a member of multiple hostgroups, Nagios has to make a decision on what to do as far as the notification interval is concerned when escalation definitions overlap. In any case where there are multiple valid escalation definitions for a particular notification, Nagios will choose the smallest notification interval."
> Anyway, is there any way to make that work? The way it's working now, it seems to email the email-ops list every 15 minutes on critical services, and for email we'd like to get fewer alerts.
> Thanks in advance!

I don't think this can be done with escalations. If Assaf has a way to do it, I'd be very interested. As the documentation says, any notification that matches multiple escalations can only have one notification interval, and Nagios chooses the smallest. The way we get around the problem is to put a wrapper around the email notification command so that it only sends the first notification of a state change. This is what it looks like:

define command{
    command_name    notify-host-by-email
    command_line    \
        if [ $HOSTNOTIFICATIONNUMBER$ -le 1 -o $HOSTSTATEID$ -ne $LASTHOSTSTATEID$ ]$USER9$ then \
            /usr/bin/printf "%b" "\n\n-- \
***** Nagios *****\n\n\
Notification Type: $NOTIFICATIONTYPE$ Number: $HOSTNOTIFICATIONNUMBER$\n\n\
host=$HOSTNAME$\n\n\
Host: $HOSTNAME$\n\
Address: $HOSTADDRESS$\n\
State: $HOSTSTATE$\n\
Last State: $LASTHOSTSTATE$\n\
Info: $HOSTOUTPUT$\n\n\
$LONGHOSTOUTPUT$\n\n\
Date/Time: $LONGDATETIME$\n\n\
Comment: $NOTIFICATIONCOMMENT$" \
            | /usr/bin/mail -r nagios -s \
            "** Nagios $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$ \
        $USER9$\
        fi
}

The $USER9$ macro is defined as a semicolon ';' to keep it from being interpreted as the start of a comment. The command for service e-mail notifications looks similar. So, for e-mail notifications, it doesn't matter what the interval is: only one e-mail will actually be sent per state change.
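The filtering in that command_line is just the shell `if` test. Restated in Python purely as an illustration (Nagios does not run this), it sends on the first notification or whenever the state ID differs from the last state ID. The example sequence is hypothetical and assumes, as the wrapper does, that $LASTHOSTSTATEID$ matches the current state on repeated notifications for the same problem.

```python
def should_send_email(notification_number, state_id, last_state_id):
    """Same test as the shell 'if' in the command above: first notification, or the state changed."""
    return notification_number <= 1 or state_id != last_state_id


# Hypothetical host: DOWN (1) for three notifications, then UNREACHABLE (2).
# Each tuple is (notification number, current state ID, last state ID).
sequence = [(1, 1, 0), (2, 1, 1), (3, 1, 1), (4, 2, 1)]
sent = [number for number, state, last in sequence if should_send_email(number, state, last)]
```

Here `sent` is `[1, 4]`: the repeats at notifications 2 and 3 are suppressed no matter how short the escalation interval is, and the state change at notification 4 gets through.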
Re: [Nagios-users] check_interval or normal_check_interval ?
Malcolm Cowe wrote:
> Hello All,
> I have a quick question arising from a discrepancy between the Nagios 3 documentation and the service templates supplied with the distribution. When defining services, should one use check_interval or normal_check_interval? I'm currently using Nagios 3.1.0 but will likely be upgrading to the latest release in the near future.

They are equivalent. According to Barth, check_interval was introduced in 3.0 as an alternative to normal_check_interval; they mean the same thing.
Re: [Nagios-users] Wildcards in service escalations query
Mohit Chawla wrote:
> Just tried this: added all hosts to the host_name field, except the ones which don't have any services associated, and it works. So yeah, using the * wildcard with !hostx doesn't work. But clearly, this is not ideal, since I have had to add around 350 hosts in the host_name directive.

I agree. It would be nice if the serviceescalation definition automatically excluded hosts that don't have the services specified by its service_description. Instead of adding all those host names there, you could use a host group as I described here:
http://sourceforge.net/mailarchive/message.php?msg_id=27615125
It's a little more work initially, but it's easier to maintain, I think. You won't have to remember to change the escalation every time you add a host; it's easier to include a host in the hostgroup you use for the escalation when you define the host.
Re: [Nagios-users] Wildcards in service escalations query
Mohit Chawla wrote:
> Hi,
> If we have:
>
> define serviceescalation {
>     host_name            *
>     service_description  *
>     ...
> }
>
> then, if there is no service associated with a host, this definition will be regarded as invalid. But what if a particular service is not associated with any host? Will it fail in that case as well?
> I was able to find hosts which don't have any services defined, and I used:
>
> define serviceescalation {
>     host_name            *, !foo.com, !bar.com
>     service_description  *
> }
>
> where foo and bar are the hosts with no services defined. But I still get a 'could not expand services' error on this escalation definition. Any clues?

As long as any hosts that match the host_name directive have no services defined, you will get this error. The escalation apparently wants to have host/service pairs: it's a service escalation, and all services must be assigned to a host. It doesn't automatically discard hosts that have no services. To get around this you can use a hostgroup that contains only hosts with services assigned. I've given an example here:
http://sourceforge.net/mailarchive/message.php?msg_id=27615125
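If the host and service lists can be exported from wherever the configuration is generated (an assumption here), a short Python script can build such a hostgroup so that it never contains a host without services. The hostgroup name is hypothetical; the escalation would then use `hostgroup_name` instead of `host_name *`.

```python
def hosts_with_services(host_names, services):
    """Keep only hosts that have at least one service.

    services is an iterable of (host_name, service_description) pairs.
    """
    hosts_in_use = {host for host, _ in services}
    return [host for host in host_names if host in hosts_in_use]


def hostgroup_definition(hostgroup_name, members):
    """Render a Nagios hostgroup object listing the given members."""
    return (
        "define hostgroup {\n"
        f"    hostgroup_name  {hostgroup_name}\n"
        f"    members         {','.join(members)}\n"
        "}\n"
    )
```

Regenerating this file whenever hosts are added keeps the escalation valid without hand-maintaining a list of roughly 350 host names.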
Re: [Nagios-users] Wildcards in service escalations query
Mohit Chawla wrote:
> Hi,
> On Tue, Jul 5, 2011 at 11:55 PM, Paul M. Dubuc <w...@paul.dubuc.org> wrote:
>> As long as any hosts that match the host_name directive have no services defined, you will get this error. The escalation apparently wants to have host/service pairs. It's a service escalation and all services must be assigned to a host. It doesn't automatically discard hosts that have no services.
>
> But as you can see in the config I posted above, I am explicitly excluding those hosts which do not have any services associated with them (foo.com and bar.com). Hence, the config should be valid. Unless, of course, host_name *, !host1, !host2 is not the right way to include all hosts except host1 and host2, or there is some other bad logic.

It could be that the exclusion (!) doesn't work when combined with the * wildcard in that way. It's equivalent to host1, host2, ... hostN, !host1, !host2. Try putting the wildcard at the end of the list and see if that works. Also, make sure that the hosts you exclude are really the only ones that have no services. Nagios will put warnings in the log file about hosts with no services assigned after it is restarted. You can look there for any you might have missed.

Paul Dubuc
Re: [Nagios-users] Getting re-notified while in a HARD state
Frank Bulk wrote: I have a few existing and self-developed plugins that output details of the HARD state: CRITICAL: critical 1, warning 1 Detail 1 Detail 2 What I'd like to do is to be re-notified if, while in the HARD state, the number and/or details change. For example, if the above would go to: CRITICAL: critical 2, warning 1 Detail 1 Detail 2 Detail 3 Does anyone have an approach that works? The documentation doesn't indicate it's possible, but I'm sure others have encountered this before and perhaps they've worked through a solution. Kind regards, Frank I don't think there's a simple way to do this without having your notification command store the value of the $SERVICEOUTPUT$ macro for each host + service for comparison on the next try. Then you would have to set is_volatile on the service and have the notification command suppress the notification if $SERVICEOUTPUT$ doesn't change. Another thing you can do is tell Nagios to log the hard state status when only the $SERVICEOUTPUT$ changes by setting stalking_options in the service. Then, if you have something watching the log file, you can trigger notifications with that. If only the state stalking feature had an option to send notifications in addition to logging, you would be set. Hope this helps, Paul Dubuc
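A minimal Python sketch of the first approach Paul describes: a wrapper run as the notification command that remembers the last output and only hands off to the real notifier when it changes. The script name, state directory and argument order are hypothetical. It keys the stored value on contact as well as host and service, because Nagios runs the notification command once per contact, and it stores the state too, so a recovery resets the comparison. For multi-line output like Frank's, you would probably want to pass $LONGSERVICEOUTPUT$ as well, since $SERVICEOUTPUT$ holds only the first line.

```python
import hashlib
import os
import subprocess
import sys


def should_notify(state_dir, contact, host, service, state, output):
    """Return True if (state, output) differs from what was last recorded for this
    contact/host/service, and record the new value. The first call returns True."""
    # Key on the contact as well: Nagios runs the notification command once per contact.
    key = hashlib.sha1(f"{contact}\0{host}\0{service}".encode("utf-8")).hexdigest()
    path = os.path.join(state_dir, key)
    current = f"{state}\n{output}"
    try:
        with open(path, encoding="utf-8", newline="") as fh:
            if fh.read() == current:
                return False  # same state and same output as last time
    except FileNotFoundError:
        pass  # nothing recorded yet for this contact/host/service
    os.makedirs(state_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(current)
    return True


def main(argv):
    """argv: STATE_DIR CONTACT HOST SERVICE STATE OUTPUT [REAL_NOTIFY_COMMAND ...]"""
    state_dir, contact, host, service, state, output, *command = argv
    if not should_notify(state_dir, contact, host, service, state, output):
        return 0  # unchanged: suppress this notification
    if command:
        return subprocess.run(command).returncode  # hand off to the real notifier
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

It might be wired in with something like command_line /usr/local/bin/notify_on_change.py /var/tmp/nagios-notify "$CONTACTNAME$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$SERVICEOUTPUT$" followed by your existing notification command, with is_volatile set to 1 on the service. Both paths are placeholders.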
Re: [Nagios-users] Expand service group error in 43 line test config (why?)
First, your service has no service_description specified. This is required. Second, your serviceescalation must include the host_name that the service is assigned to. Add the line: host_name admin.qa and it will work. You can also use a hostgroup_name instead of a host_name, but every host you specify must have a service with a service_description that matches the one specified in the escalation. See the documentation for details: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html Eric B. wrote: This has me stumped. I whittled my ugly config down to 35 lines, and was still able to re-create the error. Any ideas what is wrong? I'm running Nagios Core v. 3.2.3. Much thanks in advance! -Eric Error is: Error: Could not expand servicegroups specified in service escalation (config file '/home/opsmon/etc/nagios/objects/qbo/foo.cfg', starting on line 13) Error processing object config files! Here's the config: define servicegroup { servicegroup_name group-1 alias All Services register 0 } define contact { contact_name primary-oncall alias Primary Oncall email f...@bar.com } define serviceescalation { servicegroup_name group-1 first_notification 1 last_notification 6 notification_interval 5 contacts primary-oncall } define service { servicegroups group-1 host_name admin.qa check_command check_foo } define host { host_name admin.qa address 127.0.0.1 } define command { command_name check_foo command_line /bin/true }
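The rule Paul states here (every host named by the escalation must have a service whose service_description matches) can be modelled in a few lines of Python to see which hosts would trigger a "could not expand" error. This is a simplified model under stated assumptions, not the Nagios config parser: it treats "*" as all hosts or all services, applies "!" exclusions after everything else regardless of their position in the list (which, as discussed in the wildcard thread, Nagios 3.2 may not do), and knows nothing about hostgroups or regular expressions.

```python
def expand_hosts(host_list, all_hosts):
    """Expand a host_name list such as ["*", "!foo.com"] (model: exclusions always win)."""
    included, excluded = set(), set()
    for item in (entry.strip() for entry in host_list):
        if item.startswith("!"):
            excluded.add(item[1:])
        elif item == "*":
            included.update(all_hosts)
        elif item:
            included.add(item)
    return included - excluded


def uncovered_hosts(host_list, service_description, services, all_hosts):
    """Hosts the escalation names that have no matching (host, service_description) pair."""
    def has_match(host):
        return any(
            h == host and (service_description == "*" or d == service_description)
            for h, d in services
        )
    return sorted(h for h in expand_hosts(host_list, all_hosts) if not has_match(h))
```

A non-empty result means at least one host/service pair cannot be formed, which is the situation the error message reports.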
Re: [Nagios-users] Expand service group error in 43 line test config (why?)
Eric B. wrote: My real problem, I think, is that I 'whittled' it down in the wrong way (thanks for the help, everyone). Below is what I was hoping to do, but I realize that b/c I HAVE to define a host w/ the escalation, I have to retool how my monster config is done (which will really suck). Here's what I was hoping to accomplish: 1) Create a generic service template that all service checks inherit, which adds them to the 'all-services' group. 2) Create escalation rules that apply to the 'all-services' group. This worked (basically a more complicated example of the config I gave) until I added an 'all-services-foo' group (same method mentioned in #1 and #2) with different escalations. From a design perspective, I know Nagios does a great job w/ templating and object inheritance, but it really sucks that I have to specify a host; that just easily increased the number of objects by an order of magnitude or so. I don't see why. All services have to be assigned to hosts anyway. You can specify a comma-separated list of hosts in your escalation or use hostgroups. I think you only need 2 additional objects to do what you want: a hostgroup that consists of all hosts with services assigned and a host template to assign hosts to that group. There's an example that might help here: http://sourceforge.net/mailarchive/message.php?msg_id=27615125
Re: [Nagios-users] [Nagios-devel] Nagios retries checks too soon.
Jochen Bern wrote: On 06/09/2011 08:14 PM, Paul M. Dubuc wrote: Andreas Ericsson wrote: I'm not sure. I'm also not sure which behaviour is intended. Arguably, either is correct and Nagios is doing one of two right things. I'm not sure. If a test times out and Nagios tries again 10 seconds later instead of the 60 seconds specified, that could cause problems: load-related problems when you have many of these tests running and timing out, and problems for the system under test not having sufficient time to recover before the next check is done. True, but *if* someone has the latter kind of problem, I'd expect him to keep it in mind while writing the configuration, too. IIRC, the actual code adds check_interval/retry_interval to the variable that holds the (previous) scheduled check time - i.e., the time when the previous check supposedly was *started* (assuming negligible check latency). Configuring a retry_interval of one minute for a service whose sustained request rate may be *less* than one per minute sounds dubious to me. (And I'm a firm nonbeliever in Unix-ish load figures, as opposed to actual CPU usage etc., but that's a different rant.) Kind regards, J. Bern Thanks for this explanation. It helps quite a bit. The checks we run normally take 5-15 seconds to complete, but we allow a much longer timeout. I was under the impression that the retry interval was only counted from the time the previous check completes and the status (which is needed to determine if a retry is necessary) is known. Why is the retry time determined before it's known that one is needed? It looks like checks that have longer timeouts need longer retry intervals to compensate for the worst case. That's not intuitive to me, but I can live with it. Paul Dubuc
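Jochen's description can be turned into a small Python model to see why the retry in Paul's log ran only 10 seconds after the timeout. It assumes (based on Jochen's recollection, not on the Nagios source) that a retry is due retry_interval after the previous check's scheduled start, and that a due time already in the past means the retry runs as soon as possible.

```python
from datetime import datetime, timedelta


def next_retry_start(prev_start, result_time, retry_interval):
    """Model (assumption): the retry is due retry_interval after the previous check's
    scheduled start; if that moment has already passed when the result arrives,
    the retry is overdue and starts right away."""
    due = prev_start + retry_interval
    return max(due, result_time)


def idle_gap(retry_interval_s, check_duration_s):
    """Seconds between the end of a failed check and the start of its retry (same model)."""
    return max(0, retry_interval_s - check_duration_s)


if __name__ == "__main__":
    start = datetime(2011, 6, 9, 9, 14, 4)   # 130 s before the 09:16:14 timeout
    result = start + timedelta(seconds=130)
    print(next_retry_start(start, result, timedelta(minutes=1)))  # prints 2011-06-09 09:16:14
    print(idle_gap(60, 130), idle_gap(190, 130))                  # prints 0 60
```

With Paul's numbers the retry was due at about 09:15:04, already in the past when the timeout was processed, so under this model it starts at once; that fits the OK result logged at 09:16:24 for a check that normally takes 5-15 seconds. To leave the target 60 seconds of recovery time after a 130-second timeout, the retry interval would need to be around 190 seconds.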
[Nagios-users] Nagios retries checks too soon.
Running Nagios 3.2.3, here is an example from the log that shows Nagios retrying a failed check after only 10 seconds. The normal check interval is 7.5 minutes, retry interval is 1 minute, max. check attempts is 3. Note that this test has a timeout of 130 seconds, so it's been running for over 2 minutes when it times out. Does Nagios do retries sooner when the timeout for a check is longer than the retry interval? Is the retry interval measured from the time the previous check starts, or from the time it ends? [06-09-2011 09:16:14] SERVICE ALERT: APS-P55;LoginPage;CRITICAL;SOFT;1;logintest CRITICAL - Timeout (130 sec.) reached [06-09-2011 09:16:24] SERVICE ALERT: APS-P55;LoginPage;OK;SOFT;2;logintest OK
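To measure retry spacing over a longer stretch of log than the two lines above, a small Python parser for SERVICE ALERT lines can help. It assumes the field order shown in the excerpt (host;service;state;state type;attempt;output) and accepts either the MM-DD-YYYY timestamps quoted here or the raw Unix timestamps that appear in nagios.log. Only alerts are logged, so OK results on a service that is already OK won't show up.

```python
import re
from datetime import datetime

ALERT_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] SERVICE ALERT: (?P<body>.*)$")


def parse_service_alert(line):
    """Return a dict for a SERVICE ALERT line, or None if the line is something else."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    parts = m.group("body").split(";", 5)  # the output itself may contain ';'
    if len(parts) != 6:
        return None
    raw = m.group("ts")
    if raw.isdigit():  # raw nagios.log lines carry a Unix timestamp
        when = datetime.fromtimestamp(int(raw))
    else:              # the MM-DD-YYYY HH:MM:SS form quoted in the post
        when = datetime.strptime(raw, "%m-%d-%Y %H:%M:%S")
    host, service, state, state_type, attempt, output = parts
    return {"when": when, "host": host, "service": service, "state": state,
            "type": state_type, "attempt": int(attempt), "output": output}


def attempt_gaps(lines):
    """(host/service, attempt, seconds since the previous alert for the same service)."""
    last_seen = {}
    gaps = []
    for line in lines:
        alert = parse_service_alert(line)
        if alert is None:
            continue
        key = (alert["host"], alert["service"])
        if key in last_seen:
            gaps.append((key, alert["attempt"], (alert["when"] - last_seen[key]).total_seconds()))
        last_seen[key] = alert["when"]
    return gaps
```

Fed the two lines from the post, attempt_gaps() reports attempt 2 of APS-P55/LoginPage arriving 10 seconds after attempt 1.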
Re: [Nagios-users] [Nagios-devel] Nagios retries checks too soon.
Andreas Ericsson wrote: On 06/09/2011 03:43 PM, Paul M. Dubuc wrote: Running Nagios 3.2.3, here is an example from the log that shows Nagios retrying a failed check after only 10 seconds. The normal check interval is 7.5 minutes, retry interval is 1 minute, max. check attempts is 3. Note that this test has a timeout of 130 seconds, so it's been running for over 2 minutes when it times out. Does Nagios do retries sooner when the timeout for a check is longer than the retry interval? Is the retry interval measured from the time the previous check starts, or from the time it ends? I'm not sure. I'm also not sure which behaviour is intended. Arguably, either is correct and Nagios is doing one of two right things. I'm not sure. If a test times out and Nagios tries again 10 seconds later instead of the 60 seconds specified, that could cause problems: load-related problems when you have many of these tests running and timing out, and problems for the system under test not having sufficient time to recover before the next check is done. Paul Dubuc
Re: [Nagios-users] service escalation on all services of all hosts
Michael Barrett wrote: Hi, I'm having a problem with an example given in the Tips & Tricks documentation page. Currently I'm running Nagios Core 3.2.0. Anyway, the tip I'm trying is from here http://nagios.sourceforge.net/docs/nagioscore/3/en/objecttricks.html#serviceescalation The particular tip reads: All Services On Same Host: If you want to create service escalations for all services assigned to a particular host, you can use a wildcard in the service_description directive. The definition below would create a service escalation for all services on host HOST1. All the instances of the service escalation would be identical (i.e. have the same contact groups, notification interval, etc.). If you feel like being particularly adventurous, you can specify a wildcard in both the host_name and service_description directives. Doing so would create a service escalation for all services that you've defined in your configuration files. So I tried the following: define serviceescalation { name email-all first_notification 1 last_notification 0 notification_interval 120 contact_groups ops-group register 0 } define serviceescalation { use email-all host_name * service_description * } And when I go to restart Nagios I get the following: Error: Could not expand hostgroups and/or hosts specified in service (config file '/etc/nagios3/conf.d/services.cfg', starting on line 34) Error processing object config files! Does anyone know why this is a problem? Am I missing something in the documentation, or is it just incorrect? You probably have some hosts that have no services assigned. Using the wildcard for both host_name and service_description will not work in that case, unfortunately. All hosts specified MUST have a service that matches the given service_description or you will get this error. Paul Dubuc
Re: [Nagios-users] service escalation on all services of all hosts
Michael Barrett wrote: Ahh, ok, that would explain it. That's a bummer. Thanks. One way around this is to create a host group where you put hosts that have services assigned: define hostgroup{ hostgroup_name ServiceHosts alias Hosts with Services Assigned register 0 ; hide this hostgroup unless you want it displayed. } Then use a template that assigns hosts to this group: define host{ name service-host register 0 ; this is a template hostgroups +ServiceHosts ; add host to this hostgroup } Make sure every host definition that has services assigned has the use service-host directive in it (or uses a template that does). Alternatively, you can just assign the services to the ServiceHosts group in the service definitions instead of using this host template. Then you can define your escalation this way: define serviceescalation { use email-all hostgroup_name ServiceHosts service_description * } Hope this helps. Paul Dubuc On Jun 7, 2011, at 10:29 AM, Paul M. Dubuc wrote: Michael Barrett wrote: Hi, I'm having a problem with an example given in the Tips & Tricks documentation page. Currently I'm running Nagios Core 3.2.0. Anyway, the tip I'm trying is from here http://nagios.sourceforge.net/docs/nagioscore/3/en/objecttricks.html#serviceescalation The particular tip reads: All Services On Same Host: If you want to create service escalations for all services assigned to a particular host, you can use a wildcard in the service_description directive. The definition below would create a service escalation for all services on host HOST1. All the instances of the service escalation would be identical (i.e. have the same contact groups, notification interval, etc.). If you feel like being particularly adventurous, you can specify a wildcard in both the host_name and service_description directives. Doing so would create a service escalation for all services that you've defined in your configuration files.
So I tried the following: define serviceescalation { name email-all first_notification 1 last_notification 0 notification_interval 120 contact_groups ops-group register 0 } define serviceescalation { use email-all host_name * service_description * } And when I go to restart Nagios I get the following: Error: Could not expand hostgroups and/or hosts specified in service (config file '/etc/nagios3/conf.d/services.cfg', starting on line 34) Error processing object config files! Does anyone know why this is a problem? Am I missing something in the documentation, or is it just incorrect? You probably have some hosts that have no services assigned. Using the wildcard for both host_name and service_description will not work in that case, unfortunately. All hosts specified MUST have a service that matches the given service_description or you will get this error. Paul Dubuc -- Michael Barrett lok...@gmail.com
Re: [Nagios-users] Scheduled downtime and host checks
Jeffrey Watts wrote: On Wed, Jun 1, 2011 at 1:27 AM, Kumar, Ashish xml.de...@gmail.com wrote: No, scheduled downtime only affects notifications, and the stats you see in the availability CGI. Service and host checks run as normal during scheduled downtime. Thanks Jim for the explanation, but I do not see any rational reason to execute host and service checks while the monitored host is scheduled for fixed downtime. There are plenty of rational reasons. Just because you disagree with the default behavior doesn't mean it's irrational. Many, many, many times I put systems into scheduled, fixed downtime and still want checks to be executed. For example, if I know the netadmins are going to be reconfiguring networking at one of our datacenters, I will schedule fixed downtime for the period of their maintenance for the servers/switches/routers affected. However, I do want to see what's up and down during that time so I can tell when they start and finish their work, and what they're affecting. That's a perfectly rational reason to do checks during maintenance. This is useful because it allows you to check that the stats of those hosts and services are OK before the scheduled downtime period ends. But if the hosts/services are offline after the scheduled fixed downtime period ends, it will send the notifications anyway (or would it not?) I wish there was a way to disable active checks while a host has scheduled downtime set. If the hosts and services are down after the downtime ends, yes, it will send notifications, as clearly either: 1) The maintenance window wasn't long enough. 2) Someone broke something, or something died for another reason during maintenance. Sounds like proper behavior. As far as your question goes, you can disable active checks manually, or you can write a script that sets downtime and disables active checks at the same time. You could then run it (manually or via 'at' or something else) to re-enable active checks.
Or hack the Nagios source code and add that option yourself. I believe in the last week or so someone posted a sample script for setting downtime via a script, so you might search the archives. Jeffrey. You give some very good reasons for Nagios' current behavior during a downtime. But I agree with the original request that there be an option to disable checks during a downtime, because there are equally rational reasons to do so. There are some cases where we really should not be running service checks during downtimes because of the extra load they put on our system when they fail. Many of our checks fail in this case by timing out, and they use relatively scarce (shared) and resource-intensive processes (web browser sessions run under SeleniumRC). Timeouts tend to be long for these checks, so there is more contention for these processes when all the checks using them start failing, and they're run more often until they all go into a 'hard' failure state, etc. Maybe we can live with this, but it would be easier on the system to just inhibit checks we know are going to fail during certain regularly scheduled downtimes. There may be plenty of other examples where running lots of failing tests during a downtime ends up using significant system resources. We implement our regular downtimes by defining the uptime with a timeperiod and using that for the check_period and notification_period of our services. The problem with that is that all the services get scheduled to run at the exact second that our downtime ends. So we have to define a concurrency limit and rely on Nagios nudging checks out when the limit is reached in order to spread the schedule out again. It would be very nice to be able to define regular downtimes with timeperiods and have the option of inhibiting checks as well as notifications during those downtimes without bunching up the scheduling queue when the downtime ends.
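Jeffrey's suggestion of a script that sets downtime and disables active checks together could look roughly like this Python sketch, which builds lines for the Nagios external command file. SCHEDULE_HOST_DOWNTIME, SCHEDULE_HOST_SVC_DOWNTIME, DISABLE_HOST_CHECK, DISABLE_HOST_SVC_CHECKS and their ENABLE_ counterparts are standard Nagios external commands; the function names are hypothetical. The disable commands take effect as soon as they are submitted, so this is meant to be run when the maintenance starts, with the re-enable half run afterwards (manually or via at, as Jeffrey says).

```python
import time


def external_command(name, *args, now=None):
    """Format one external command line: "[epoch] NAME;arg1;arg2;..." plus a newline."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] " + ";".join([name] + [str(a) for a in args]) + "\n"


def start_quiet_downtime(host, start, end, author, comment, now=None):
    """Fixed downtime for the host and its services, plus disabling their active checks."""
    duration = end - start  # required field; only used by flexible downtime
    downtime_args = (start, end, 1, 0, duration, author, comment)  # fixed=1, trigger_id=0
    return [
        external_command("SCHEDULE_HOST_DOWNTIME", host, *downtime_args, now=now),
        external_command("SCHEDULE_HOST_SVC_DOWNTIME", host, *downtime_args, now=now),
        external_command("DISABLE_HOST_CHECK", host, now=now),
        external_command("DISABLE_HOST_SVC_CHECKS", host, now=now),
    ]


def end_quiet_downtime(host, now=None):
    """Re-enable active checks; the fixed downtime itself expires on its own."""
    return [
        external_command("ENABLE_HOST_CHECK", host, now=now),
        external_command("ENABLE_HOST_SVC_CHECKS", host, now=now),
    ]


def submit(lines, command_file):
    """Write the lines to the command file (a FIFO). Opening it blocks until Nagios reads it."""
    with open(command_file, "w") as fh:
        fh.write("".join(lines))
```

The command file is whatever command_file points to in nagios.cfg (often something like /usr/local/nagios/var/rw/nagios.cmd on source installs), and check_external_commands must be enabled for Nagios to read it.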
Re: [Nagios-users] [Nagios-devel] Q: Service Escalation Recovery Notifications.
I should have mentioned that whether this works depends on who the default, non-escalated contacts are for the host or service. In your case, since you have last_notification set to 3, the contacts in your escalation will not get a recovery notification that is numbered 5 or greater unless they also happen to be the default contacts for the host or service, who will get problem notification number 4 and all non-escalated notifications. If you escalate a notification to a contact that is not assigned as a regular contact for the host or service, that contact doesn't get the recovery notification (unless it also got the previous problem notification), even if you set up a separate escalation for the recovery notification that specifies all previous contacts. Patrik Båt wrote: Are you sure about that? The documentation says: If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery. On Wed, 2011-05-25 at 13:56 -0400, Paul M. Dubuc wrote: This works as long as the problem doesn't last longer than 3 notification intervals. Recovery notifications that are numbered higher than 4 won't be sent.
Patrik Båt wrote: # SMS define serviceescalation { host_name * service_description * first_notification 2 last_notification 3 notification_interval 0 contacts oncall } define hostescalation { host_name * first_notification 2 last_notification 3 notification_interval 0 contacts oncall } # MAIL define serviceescalation { host_name * service_description * first_notification 1 last_notification 1 notification_interval 10 contacts sysadmin.reports } define hostescalation { host_name * first_notification 1 last_notification 1 notification_interval 10 contacts sysadmin.reports } # Recovery define serviceescalation { host_name * service_description * first_notification 2 last_notification 3 notification_interval 0 contacts sysadmin.reports escalation_options r } define hostescalation { host_name * first_notification 2 last_notification 3 notification_interval 0 contacts sysadmin.reports escalation_options r } This is working for me, to notify both via SMS and email (i.e. 2 contacts). On Fri, 2011-05-20 at 22:22 +0200, Andreas Ericsson wrote: On 05/20/2011 06:05 PM, Max Schubert wrote: Hi, On Thu, May 19, 2011 at 10:10 AM, Andreas Ericsson a...@op5.se wrote: On 05/19/2011 03:32 PM, Paul M. Dubuc wrote: OK, but wouldn't it be nice if all contacts who got an error notification were able to get the recovery message instead of just the one last notified? Is there any way to do that? Setting up an explicit serviceescalation for recovery notifications doesn't seem to work. Max Schubert is working on a patch that does something similar to that. If he doesn't complete it, I might take a look at adding it myself. I will send out my partial patch to the list sometime today along with an explanation of my thinking / approach for it - feel free to use it or discard it as you see fit :)! Rest assured, I will ;) Our customers have raised voices about simplifying the notification logic, though. This discussion actually spawned that voice-raising, which is nice.
Either way, it might be that I end up either taking your patch or implementing the "everyone who gets problem notifications also gets recovery notifications" logic.
Re: [Nagios-users] [Nagios-devel] Q: Service Escalation Recovery Notifications.
Because they HAVE been informed of the problem by earlier notifications, just not by the one notification prior to the recovery. It leaves those contacts wondering if the problem was ever fixed. Patrik Båt wrote: Why send a recovery to someone who hasn't been informed of the problem? :P On Thu, 2011-05-26 at 09:43 -0400, Paul M. Dubuc wrote: I should have mentioned that whether this works depends on who the default, non-escalated contacts are for the host or service. In your case, since you have last_notification set to 3, the contacts in your escalation will not get a recovery notification that is numbered 5 or greater unless they also happen to be the default contacts for the host or service, who will get problem notification number 4 and all non-escalated notifications. If you escalate a notification to a contact that is not assigned as a regular contact for the host or service, that contact doesn't get the recovery notification (unless it also got the previous problem notification), even if you set up a separate escalation for the recovery notification that specifies all previous contacts.
Re: [Nagios-users] [Nagios-devel] Q: Service Escalation Recovery Notifications.
This works as long as the problem doesn't last longer than 3 notification intervals. Recovery notifications that are numbered higher than 4 won't be sent. Patrik Båt wrote: # SMS define serviceescalation { host_name * service_description * first_notification 2 last_notification 3 notification_interval 0 contacts oncall } define hostescalation { host_name * first_notification 2 last_notification 3 notification_interval 0 contacts oncall } # MAIL define serviceescalation { host_name * service_description * first_notification 1 last_notification 1 notification_interval 10 contacts sysadmin.reports } define hostescalation { host_name * first_notification 1 last_notification 1 notification_interval 10 contacts sysadmin.reports } # Recovery define serviceescalation { host_name * service_description * first_notification 2 last_notification 3 notification_interval 0 contacts sysadmin.reports escalation_options r } define hostescalation { host_name * first_notification 2 last_notification 3 notification_interval 0 contacts sysadmin.reports escalation_options r } This is working for me, to notify both via SMS and email (i.e. 2 contacts). On Fri, 2011-05-20 at 22:22 +0200, Andreas Ericsson wrote: On 05/20/2011 06:05 PM, Max Schubert wrote: Hi, On Thu, May 19, 2011 at 10:10 AM, Andreas Ericsson a...@op5.se wrote: On 05/19/2011 03:32 PM, Paul M. Dubuc wrote: OK, but wouldn't it be nice if all contacts who got an error notification were able to get the recovery message instead of just the one last notified? Is there any way to do that? Setting up an explicit serviceescalation for recovery notifications doesn't seem to work. Max Schubert is working on a patch that does something similar to that. If he doesn't complete it, I might take a look at adding it myself. I will send out my partial patch to the list sometime today along with an explanation of my thinking / approach for it - feel free to use it or discard it as you see fit :)!
Rest assured, I will ;) Our customers have raised voices about simplifying the notification logic, though. This discussion actually spawned that voice-raising, which is nice. Either way, it might be that I end up either taking your patch or implementing the "everyone who gets problem notifications also gets recovery notifications" logic.
Re: [Nagios-users] Q: Service Escalation Recovery Notifications.
OK, but wouldn't it be nice if all contacts who got an error notification also got the recovery message, instead of just the ones notified last? Is there any way to do that? Setting up an explicit serviceescalation for recovery notifications doesn't seem to work.

Yueh-Hung Liu wrote:
Going by the examples in the Nagios documentation, only on-call-support will get the 6th and later notifications.

On Thu, May 19, 2011 at 4:33 AM, Paul M. Dubuc <w...@paul.dubuc.org> wrote:
Here is an example from the Nagios 3.2.3 documentation on service escalations:

Recovery Notifications
Recovery notifications are slightly different than problem notifications when it comes to escalations. Take the following example:

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      3
    last_notification       5
    notification_interval   20
    contact_groups          nt-admins,managers
}

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      4
    last_notification       0
    notification_interval   30
    contact_groups          on-call-support
}

If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.

My question is: who gets the recovery notification after 6 problem notifications? Only on-call-support (the last one notified), or all three contact groups (since all of them received notifications of the problem)? If only on-call-support (which seems to be the case), how can I ensure that the others get it too?

I tried adding a service escalation for the recovery notification, like so, in keeping with the above example:

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      2
    last_notification       0
    escalation_options      r
    contact_groups          on-call-support,nt-admins,managers
}

but that doesn't seem to work. I had thought this fixed the problem, but the recovery notification only seems to go to the last contact(s) notified of the problem.
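The documented rule quoted above can be modelled in a few lines of Python. This is a simplified sketch, not Nagios's escalation code. It assumes that an escalation covers notification numbers first_notification through last_notification (with last_notification 0 meaning no upper limit), that covering escalations replace the service's own contact groups, and that a recovery goes to whoever received the last problem notification. The default group name service-contacts is hypothetical.

```python
def groups_for_notification(number, escalations, default_groups):
    """Contact groups for problem notification `number`.

    escalations: list of (first_notification, last_notification, groups);
    last_notification == 0 means the escalation has no upper limit.
    If no escalation covers `number`, the service's own groups are used.
    """
    groups = set()
    for first, last, esc_groups in escalations:
        if number >= first and (last == 0 or number <= last):
            groups.update(esc_groups)
    return groups if groups else set(default_groups)


def recovery_groups(problem_notifications, escalations, default_groups):
    """Documented rule: the recovery goes to the groups notified on the
    last problem notification, not to everyone notified earlier."""
    return groups_for_notification(problem_notifications, escalations, default_groups)


# The two escalations from the documentation example above.
DOC_EXAMPLE = [
    (3, 5, {"nt-admins", "managers"}),
    (4, 0, {"on-call-support"}),
]

if __name__ == "__main__":
    for n in (3, 6):
        print(n, sorted(recovery_groups(n, DOC_EXAMPLE, {"service-contacts"})))
```

Under this model a recovery after three problem notifications reaches nt-admins and managers, matching the documentation, while a recovery after six reaches only on-call-support, which is what Yueh-Hung Liu and Paul describe.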
[Nagios-users] Q: Service Escalation Recovery Notifications.
Here is an example from the Nagios 3.2.3 documentation on service escalations:

Recovery Notifications
Recovery notifications are slightly different than problem notifications when it comes to escalations. Take the following example:

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      3
    last_notification       5
    notification_interval   20
    contact_groups          nt-admins,managers
}

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      4
    last_notification       0
    notification_interval   30
    contact_groups          on-call-support
}

If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.

My question is: who gets the recovery notification after 6 problem notifications? Only on-call-support (the last one notified), or all three contact groups (since all of them received notifications of the problem)? If only on-call-support (which seems to be the case), how can I ensure that the others get it too?

I tried adding a service escalation for the recovery notification, like so, in keeping with the above example:

define serviceescalation{
    host_name               webserver
    service_description     HTTP
    first_notification      2
    last_notification       0
    escalation_options      r
    contact_groups          on-call-support,nt-admins,managers
}

but that doesn't seem to work. I had thought this fixed the problem, but the recovery notification only seems to go to the last contact(s) notified of the problem.
Re: [Nagios-users] Segmentation fault Nagios 3.2.3
Recompiled without the embedded Perl option, and now it's working fine. Still, I don't understand why it happened.
/\ dE

These messages look suspicious:

access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/tls/x86_64/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/tls/x86_64", 0x7fff00917240) = -1 ENOENT (No such file or directory)
open("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/tls", 0x7fff00917240) = -1 ENOENT (No such file or directory)
open("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/x86_64/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/x86_64", 0x7fff00917240) = -1 ENOENT (No such file or directory)
open("/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so",

Looks like you're running Nagios on a system that doesn't have the (64-bit) Perl libraries installed.
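When reading strace output like the excerpt above, it can help to pull out only the failed lookups. A small Python sketch, assuming the older open("path", ...) syscall format shown here; newer systems typically log openat(AT_FDCWD, "path", ...), which this regex would not match. The function names are hypothetical.

```python
import re

# Matches e.g.: open("/some/path", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(r'^(\w+)\("([^"]*)".*\)\s*=\s*-1\s+(E[A-Z]+)\b')


def failed_lookups(lines, errno="ENOENT"):
    """Return (syscall, path) pairs for calls that failed with `errno`."""
    failures = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match and match.group(3) == errno:
            failures.append((match.group(1), match.group(2)))
    return failures


def libperl_candidates(lines):
    """Paths where the dynamic loader tried to open libperl.so and failed."""
    return [path for call, path in failed_lookups(lines)
            if call == "open" and path.endswith("/libperl.so")]
```

If every libperl.so candidate in a complete trace fails, that would support the diagnosis that the 64-bit Perl libraries are missing; successful calls (those returning a file descriptor) are ignored by the regex.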
[Nagios-users] Flap Detection: Why do only HARD state changes count?
This isn't explicitly stated in the documentation, but it seems that only HARD state changes count toward flap detection. So a service check can toggle back and forth indefinitely between OK and not OK (unless max_check_attempts is set to 1) without flapping ever being detected. I tested this with a service that behaves this way and verified the behavior. The Last State Change time gets updated with each SOFT state change, but the % state change for flap detection stays at 0% until I set max_check_attempts to 1 and let it toggle between HARD states.

Is this a bug or is it by design? Is there a way to include SOFT state transitions in flap detection? I'm using Nagios Core 3.2.3.
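To see why SOFT-only toggling can stay at 0%, here is a toy Python model of a weighted percent-state-change calculation. It is a simplified sketch, not Nagios's flapping code: the documentation describes examining the last 21 checks and weighting recent state changes more heavily, but the linear weights from 0.75 (oldest) to 1.25 (newest) used here are an assumption. The model builds the history either from every check or from HARD results only, to mimic the behavior reported above.

```python
def percent_state_change(states, low=0.75, high=1.25):
    """Weighted percent state change for a history of states (oldest first).

    Each change between consecutive entries contributes a weight that rises
    linearly from `low` (oldest transition) to `high` (newest transition).
    A change on every check yields about 100%.
    """
    transitions = len(states) - 1
    if transitions < 1:
        return 0.0
    total = 0.0
    for pos in range(transitions):
        if states[pos + 1] != states[pos]:
            if transitions > 1:
                weight = low + (high - low) * pos / (transitions - 1)
            else:
                weight = (low + high) / 2
            total += weight
    return total * 100.0 / transitions


def history(checks, hard_only):
    """Build the state history, optionally keeping only HARD results."""
    return [state for state, state_type in checks
            if not hard_only or state_type == "HARD"]


# One HARD OK, then a service toggling CRITICAL/OK in SOFT states (21 checks).
CHECKS = [("OK", "HARD")] + [("CRITICAL", "SOFT"), ("OK", "SOFT")] * 10
```

Counting every check, the toggling service scores about 100%; counting only HARD results, the history collapses to a single OK entry and the score is 0%, just as Paul observed. A single recent change also scores higher than a single old one, which is the point of the weighting.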
Re: [Nagios-users] notification_interval normal_check_interval
Mike Chesnut wrote:
I have a check that I only want to run once a day, so I do this in the service definition:

normal_check_interval   1440

However, when it fails, I want it to retry every 10 minutes, so I do this:

retry_check_interval    10

My default notification_interval is set to 15. When I run a pre-flight check, I get this:

Warning: Service 'service' on host 'host' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.

Is that warning telling me that notifications are only sent when a normal check occurs? What I want is for notifications to keep being sent (every 15 minutes) after a failure, until the service recovers. Will that be the case?

Thanks,
Mike

What is the value of max_check_attempts? The service enters a HARD state, and a notification is sent, only at the end of that number of checks. If the value is 1, the warning makes perfect sense, because no retry checks will be done.

Paul Dubuc
Re: [Nagios-users] notification_interval normal_check_interval
Mike Chesnut wrote:
On 04/18/2011 12:08 PM, Paul M. Dubuc wrote:
Mike Chesnut wrote:
I have a check that I only want to run once a day, so I do this in the service definition:

normal_check_interval   1440

However, when it fails, I want it to retry every 10 minutes, so I do this:

retry_check_interval    10

My default notification_interval is set to 15. When I run a pre-flight check, I get this:

Warning: Service 'service' on host 'host' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.

Is that warning telling me that notifications are only sent when a normal check occurs? What I want is for notifications to keep being sent (every 15 minutes) after a failure, until the service recovers. Will that be the case?

Thanks,
Mike

What is the value of max_check_attempts? The service enters a HARD state, and a notification is sent, only at the end of that number of checks. If the value is 1, the warning makes perfect sense, because no retry checks will be done.

max_check_attempts is 2. Is that a sensible number here?

Thanks,
Mike

OK, I think it will work this way: you will get a notification if there is still a problem after the retry check. After that, the check interval reverts to the normal interval, and if the problem persists you will not get another notification until after the next normal-interval check. If the problem clears up, you will not get a recovery notification until then either, unless you rerun the check manually. This doesn't sound like what you want. I don't think you can do what you want without shortening the normal check interval.

Paul Dubuc
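Paul's reasoning can be checked with a toy timeline in Python. This is a hypothetical model, not Nagios's scheduler. It assumes the service fails permanently at minute fail_at, that retries run every retry_interval minutes until max_check_attempts is reached, that checks in the HARD state run every check_interval minutes, and that (as the pre-flight warning says) a repeat notification can only go out when a check runs.

```python
def notification_times(check_interval, retry_interval, max_attempts,
                       notification_interval, fail_at, horizon):
    """Minutes (up to `horizon`) at which problem notifications are sent."""
    sent = []
    last_sent = None
    attempts = 0       # consecutive failed checks so far
    t = 0              # minute of the next scheduled check
    while t <= horizon:
        if t < fail_at:
            t += check_interval           # service still OK: normal schedule
            continue
        attempts += 1
        if attempts < max_attempts:
            t += retry_interval           # SOFT problem: retry sooner
            continue
        # HARD problem: a check just ran, so a (re-)notification is possible.
        if last_sent is None or t - last_sent >= notification_interval:
            sent.append(t)
            last_sent = t
        t += check_interval               # HARD state: back to the normal interval
    return sent
```

With Mike's values (check interval 1440, retry 10, max_check_attempts 2, notification_interval 15) and a failure just after the first check, the model sends notifications at minutes 1450 and 2890, roughly once a day. Shortening the normal check interval to 15 minutes produces a notification every 15 minutes, in line with Paul's conclusion.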
Re: [Nagios-users] Escalating notifications
Patrik Båt wrote:
Hello mailing list!

I'm trying to set up notifications like this: on the first HARD state, email the staff (notification 1); on the next notification (notification 2), send an SMS to the on-call person. The problem is that on recovery I only get an SMS, because the SMS escalation is in use. Does anyone have a good way to get this to work?

1. Mail problem
2. SMS problem
On recovery:
1. Mail recovery
2. SMS recovery

With 2 escalations, I get this:
1. Mail problem
2. Mail problem, SMS problem
Recovery:
1. SMS recovery

Config:

# SMS
define serviceescalation {
    host_name               *
    service_description     *
    first_notification      2
    last_notification       3
    notification_interval   0
    contacts                oncall
}
define hostescalation {
    host_name               *
    first_notification      2
    last_notification       3
    notification_interval   0
    contacts                oncall
}

# MAIL
define serviceescalation {
    host_name               *
    service_description     *
    first_notification      1
    last_notification       1
    notification_interval   10
    contacts                sysadmin.reports
}
define hostescalation {
    host_name               *
    first_notification      1
    last_notification       1
    notification_interval   10
    contacts                sysadmin.reports
}

I have tried different last_notification values and so on, but with no luck.

Regards,
Patrik Båt

Try using a separate escalation for the recovery events. The recovery event is the last numbered event, so it's hard to catch without a specific escalation. Example:

define serviceescalation {
    host_name               *
    service_description     *
    first_notification      1
    last_notification       0
    notification_interval   0
    contacts                sysadmin.reports,oncall
    escalation_options      r
}
define hostescalation {
    host_name               *
    first_notification      1
    last_notification       0
    notification_interval   0
    contacts                sysadmin.reports,oncall
    escalation_options      r
}
Re: [Nagios-users] Escalating notifications
Same place you put the others. What makes them apply only to recovery events is the escalation_options r directive.

Edwin Zoeller wrote:
This is also what I am looking for. Where would you put the separate escalation?

-----Original Message-----
From: Paul M. Dubuc [mailto:w...@paul.dubuc.org]
Sent: Friday, April 01, 2011 8:49 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Escalating notifications

Patrik Båt wrote:
Hello mailing list!

I'm trying to set up notifications like this: on the first HARD state, email the staff (notification 1); on the next notification (notification 2), send an SMS to the on-call person. The problem is that on recovery I only get an SMS, because the SMS escalation is in use. Does anyone have a good way to get this to work?

1. Mail problem
2. SMS problem
On recovery:
1. Mail recovery
2. SMS recovery

With 2 escalations, I get this:
1. Mail problem
2. Mail problem, SMS problem
Recovery:
1. SMS recovery

Config:

# SMS
define serviceescalation {
    host_name               *
    service_description     *
    first_notification      2
    last_notification       3
    notification_interval   0
    contacts                oncall
}
define hostescalation {
    host_name               *
    first_notification      2
    last_notification       3
    notification_interval   0
    contacts                oncall
}

# MAIL
define serviceescalation {
    host_name               *
    service_description     *
    first_notification      1
    last_notification       1
    notification_interval   10
    contacts                sysadmin.reports
}
define hostescalation {
    host_name               *
    first_notification      1
    last_notification       1
    notification_interval   10
    contacts                sysadmin.reports
}

I have tried different last_notification values and so on, but with no luck.

Regards,
Patrik Båt

Try using a separate escalation for the recovery events. The recovery event is the last numbered event, so it's hard to catch without a specific escalation. Example:

define serviceescalation {
    host_name               *
    service_description     *
    first_notification      1
    last_notification       0
    notification_interval   0
    contacts                sysadmin.reports,oncall
    escalation_options      r
}
define hostescalation {
    host_name               *
    first_notification      1
    last_notification       0
    notification_interval   0
    contacts                sysadmin.reports,oncall
    escalation_options      r
}
Re: [Nagios-users] Problem: Nagios service check retry interval shorter than configured.
Jelle Smet wrote:
Why are there only 10 seconds between these pairs of checks? Sometimes I see a 20 or 30 second difference, sometimes 60 seconds. Most of them are less than 30 seconds. It's very inconsistent. Any idea what could be causing this?

Hi Paul,

I have been looking into this myself for the last couple of days. Nagios does on-demand host checks; the reason for this is explained here: http://nagios.sourceforge.net/docs/3_0/hostchecks.html
It basically means Nagios executes the host check whenever it thinks it needs to. You could alter the cached host check horizon (http://nagios.sourceforge.net/docs/3_0/cachedchecks.html) so that Nagios does on-demand checks less frequently and uses older host results instead.
What I'm personally wondering is whether on-demand checks should count as retries, because that is currently the case, and it makes the 'retry_interval' parameter virtually useless.

Hope this helps,
Jelle Smet
http://www.smetj.net

Thanks, I think I understand how this works. But I'm having this problem with service checks, not host checks. I do have the concurrent service check limit set to 30, and I wonder whether that is affecting the scheduling of service check retries; but if so, I would expect it to make the retry interval longer, not shorter, than specified. Does anyone know whether service check retries are subject to the concurrency limit?

Paul Dubuc
Re: [Nagios-users] service notification logged but not done
Could it be that your scripts are stored on an NFS-mounted filesystem or other networked storage? (What is $USER1$ defined to be?) If so, maybe you're having intermittent problems with access. Using local storage for the scripts should solve this problem. You might find some evidence of the problem by turning on debugging in Nagios and looking at its debug output (see the debug_file, debug_level and debug_verbosity parameters in nagios.cfg).

Hope this helps,
Paul Dubuc

MAYER Hans wrote:
Dear Chad,

> Have you recently upgraded Nagios?
Yes, I have been running Core 3.2.3 since Feb 24th.

> When did you start noticing that it was missing execution runs?
I noticed the problem a month ago, even with version 3.2.1; that is why I upgraded to the latest version, to see if this would fix the problem.

> Do you have enough disk space free?
As I said: 91% free, only 9% used.

> What are the permissions of the script set to?
-rwxr-xr-x 1 nagios nagios 1035 Feb 18 10:17 rshsendsms
As I said, it happens only sometimes. Wrong permissions would mean it never works.

> Were they recently changed?
No.

> Have you done any type of software changes with any type of supporting packages (e.g. Perl) that could have brought up this issue?
No, this server has been running unchanged since June 2010.

What happens within Nagios between writing the log file and executing the script? Something prevents the script from being executed, but only sometimes.

Kind regards
Hans

From: Chad Rhyner [mailto:crhy...@box.net]
Sent: Wednesday, March 09, 2011 6:32 PM
To: Nagios Users List
Cc: MAYER Hans
Subject: Re: [Nagios-users] service notification logged but not done

Have you recently upgraded Nagios? When did you start noticing that it was missing execution runs? Do you have enough disk space free? What are the permissions of the script set to? Were they recently changed? Have you done any type of software changes with any type of supporting packages (e.g. Perl) that could have brought up this issue?

Here are some thoughts on where I would start looking. Anything that you can dig up, we can look at more closely to identify a potential cause for this issue.

~Chad

On Wed, Mar 9, 2011 at 1:29 AM, MAYER Hans <ma...@iiasa.ac.at> wrote:
Dear all,

I have been using Nagios for many years (I started with one of the first versions of “netsaint”) and have more than 25 years of experience with UNIX, but now I have a strange problem I never had before. I am running Nagios Core 3.2.3 on Solaris 10. The hardware is an M3000 with SPARC V9 architecture.

My problem is that I sometimes (not always) see a service notification in the log, but it is not actually carried out. Here is an example, the entry in the log:

[03-09-2011 09:13:25] SERVICE NOTIFICATION: sms_mayer;amazon;DISK/p14amazon;OK;notify-service-by-sms;DISK OK - free space: /p14amazon 4531 MB (6% inode=99%):

Here is the definition of notify-service-by-sms:

# 'notify-service-by-sms' command definition
define command{
    command_name    notify-service-by-sms
    command_line    $USER1$/rshsendsms $CONTACTPAGER$ \"Info: $HOSTALIAS$/$SERVICEDESC$ $SERVICEOUTPUT$ \"
}

As you can see, I execute a command named “rshsendsms”. These are the first lines of the shell script:

:
# Wed Jan 19 10:12:15 MET 2011 - mayer initial
# Wed Feb 16 10:11:54 MET 2011 - mayer logging the UID
# usage:
# rshsendsms 0043664xxx 'hello world - how are you '
# info: both types of apostrophes are important
export PATH LOG NUMBER TEXT ID UID NOTSENT RUNLOG
PATH=/usr/bin:$PATH
LOG=/var/adm/rshsendsms.log
RUNLOG=/var/adm/rshsendsms_run.log
date '+%y%m%d %H:%M' >> $RUNLOG

The first thing I do is write a log entry (91% of the disk is free). But in this case I cannot find the entry. The last one is dated 110309 06:39, when I really did receive an SMS.

I also switched on process accounting weeks ago, but there is no entry showing that the shell script was executed. I also switched on the debug facility of syslog. There I can find an entry equivalent to the one in the Nagios log, but no other messages indicating that something could be wrong. On the other hand, I was notified at 06:39, and nothing was changed in the meantime.

This is not the first time this problem has happened. Most of the time notification works fine, but sometimes it does not. This is of course a pain, as notification is one of the central functions of Nagios. Any idea where I can start searching for the error?

Kind regards
Hans
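One way to narrow this down is to compare the two logs Hans already has: every SERVICE NOTIFICATION for notify-service-by-sms in the Nagios log should have a matching line in the script's RUNLOG within about a minute. A minimal Python sketch, assuming the MM-DD-YYYY timestamp format shown in the log entry above and the yymmdd HH:MM format written by the date command in rshsendsms; the function names are hypothetical.

```python
from datetime import datetime, timedelta

NOTIFY_MARK = "] SERVICE NOTIFICATION: "


def sms_notification_times(log_lines, command="notify-service-by-sms"):
    """Timestamps of SERVICE NOTIFICATION entries that used `command`."""
    times = []
    for line in log_lines:
        if NOTIFY_MARK not in line:
            continue
        stamp, _, rest = line.partition(NOTIFY_MARK)
        # Fields: contact;host;service;state;command;output
        fields = rest.split(";")
        if len(fields) > 4 and fields[4] == command:
            times.append(datetime.strptime(stamp.lstrip("["), "%m-%d-%Y %H:%M:%S"))
    return times


def notifications_without_run(log_lines, runlog_lines,
                              slack=timedelta(minutes=1)):
    """Notifications with no RUNLOG entry within `slack` of the same minute."""
    runs = [datetime.strptime(line.strip(), "%y%m%d %H:%M")
            for line in runlog_lines if line.strip()]
    missing = []
    for t in sms_notification_times(log_lines):
        minute = t.replace(second=0)
        if not any(abs(run - minute) <= slack for run in runs):
            missing.append(t)
    return missing
```

Entries it reports are notifications that Nagios logged but for which the script never wrote its first line. A pattern in those timestamps would suggest looking at the step where Nagios launches the command (or at the storage the script lives on) rather than at the SMS gateway.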
[Nagios-users] Problem: Nagios service check retry interval shorter than configured.
I have Nagios Core 3.2.3 built on SuSE 11.1, and I've been noticing an apparent problem with service check retries. The normal check interval is set to 7.5 and the retry interval is set to 1 minute. I'm seeing entries like this in the log:

[03-02-2011 16:44:39] SERVICE ALERT: aps11;Extra_01.20;OK;SOFT;2;SELRC OK
[03-02-2011 16:44:29] SERVICE ALERT: aps11;Extra_01.20;UNKNOWN;SOFT;1;SELRC UNKNOWN - Timeout (130 sec.) reached
[03-02-2011 13:28:19] SERVICE ALERT: aps14;Extra_04.15;OK;SOFT;2;SELRC OK
[03-02-2011 13:28:09] SERVICE ALERT: aps14;Extra_04.15;CRITICAL;SOFT;1;SELRC CRITICAL

Why are there only 10 seconds between these pairs of checks? Sometimes I see a 20 or 30 second difference, sometimes 60 seconds. Most of them are less than 30 seconds. It's very inconsistent. Any idea what could be causing this?

Thanks,
Paul Dubuc
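The gaps Paul is asking about can be measured directly from the log. A small Python sketch, assuming the human-readable timestamps shown above; the raw nagios.log stores Unix timestamps, so the parsing would need adjusting when reading that file directly. The function name is hypothetical.

```python
import re
from datetime import datetime

# e.g. [03-02-2011 16:44:39] SERVICE ALERT: aps11;Extra_01.20;OK;SOFT;2;SELRC OK
ALERT = re.compile(
    r"^\[(?P<ts>\d\d-\d\d-\d{4} \d\d:\d\d:\d\d)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<svc>[^;]*);(?P<state>[^;]*);"
    r"(?P<type>[^;]*);(?P<attempt>\d+);")


def soft_retry_gaps(lines):
    """Seconds between consecutive SOFT attempts (n, n+1) of each service."""
    events = []
    for line in lines:
        m = ALERT.match(line.strip())
        if m and m.group("type") == "SOFT":
            ts = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S")
            events.append((ts, m.group("host"), m.group("svc"), int(m.group("attempt"))))
    events.sort()  # the log excerpt above is newest-first
    previous = {}
    gaps = []
    for ts, host, svc, attempt in events:
        prev = previous.get((host, svc))
        if prev is not None and attempt == prev[1] + 1:
            gaps.append((host, svc, (ts - prev[0]).total_seconds()))
        previous[(host, svc)] = (ts, attempt)
    return gaps
```

With a 1-minute retry interval, any gap well below 60 seconds is an instance of the symptom described above; for the four entries quoted, both services show a 10-second gap.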
[Nagios-users] Q: are service check retries subject to concurrency limit?
Nagios 3.2.3: I'm wondering whether Nagios subjects retries after a check failure to the limit set by the max_concurrent_checks parameter in nagios.cfg. My sense is that max_concurrent_checks only applies to checks done at the normal check interval. Does anyone know for sure whether that is true?

Thanks,
Paul Dubuc
Re: [Nagios-users] Trying to develop a new perl plugin
If you're going to be writing many of your own plugins, it might be worth the effort to use the Nagios::Plugin modules (http://search.cpan.org/~tonvoon/Nagios-Plugin-0.35/lib/Nagios/Plugin.pm). They're probably installed under the perl/lib subdirectory of your Nagios installation. Among other things, they provide a convenient wrapper for Getopt::Long and Params::Validate, so you can use position-independent options and argument validation in your Perl plugins. Using them could save you quite a bit of time and code maintenance headaches in the long run.

Paul Dubuc

Nibin VM wrote:
Thanks for your replies, folks :) I have finally concluded that the part which reads the argument has issues:

$host=$ARGV[0];

It isn't read correctly when the script is executed from Nagios. Could somebody tell me what code I should use if I need to specify the host name like ./test.pl -H hostname?

On Sun, Jan 23, 2011 at 10:44 PM, Boyer, Timothy A. <timothy.bo...@opm.gov> wrote:
Permissions problem? You're running the command line as root; try running it as your Nagios user.

From: Nibin VM [nibin...@piserve.com]
Sent: Sunday, January 23, 2011 10:46 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Trying to develop a new perl plugin

Hello guys,

I am trying to write some Nagios Perl plugins to monitor some services I'm responsible for. To start, I tried to write a custom plugin to monitor the mail queue using the following script.

===
#!/usr/bin/perl -w
use strict;
use Net::SNMP;
use Getopt::Long;
use lib "/usr/lib64/nagios/libexec";

my %ERRORS = ('OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3);
my $host   = undef;
my $result = undef;
my @array  = undef;

# Host name comes from the first positional argument.
$host = $ARGV[0];
# Query the remote snmpd's extOutput.1 (the mail queue) through check_snmp.
$result = `/usr/lib64/nagios/plugins/check_snmp -H $host -C community -o extOutput.1`;
@array = split(/\ /, $result);
chomp($array[3]);

# Note: le/ge are string comparisons; <= and >= would compare numerically.
if ( $array[3] le 1 ) {
    print "OK: current emails queue is $array[3]\n";
    exit $ERRORS{OK};
} elsif ( $array[3] ge 2 && $array[3] le 2 ) {
    print "Warning: current emails queue is $array[3]\n";
    exit $ERRORS{WARNING};
} elsif ( $array[3] ge 3 ) {
    print "Critical: current emails queue is $array[3]\n";
    exit $ERRORS{CRITICAL};
} else {
    print "Unknown";
    exit $ERRORS{UNKNOWN};
}
===

As you can see, I use SNMP to pull the mail queue from the remote server. When I try the command from the command line, it works fine:

]# ./test.pl <server name>
OK: current emails queue is 264

But in the Nagios front end it shows as critical, with "Critical: current emails queue is Unknown error" :(

Could somebody help me sort this out? Obviously it's the first Perl script I have ever written, and I'm really interested in writing more plugins in Perl (I am in love with Perl now :) ). Thanks in advance!

--
Regards,
Nibin.

--
Regards,
Nibin.
Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
eric.b...@barclayscapital.com wrote:
It's weird: when I run Nagios and kill it with -9, it leaves the pid file intact, but when I restart it, it zeroes out the pid file and starts just fine. When I just kill it with the default kill signal, it removes the pid file.

This isn't weird; that's how it should work. kill -9 sends an uncatchable, compulsory kill signal (SIGKILL) to the process, giving it no time to clean up before exiting. The default kill signal is SIGTERM, which can be caught and handled (or ignored) by the process. Restarting Nagios from the web interface doesn't terminate and restart the process (the PID doesn't change); it only re-initializes it.
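Paul's explanation can be demonstrated with a short Python sketch (POSIX only): a process that removes its PID file in a SIGTERM handler, plus a check showing that no handler can be installed for SIGKILL. The file name example.lock and the function names are hypothetical, and this illustrates signal semantics only, not how Nagios itself manages its lock file.

```python
import os
import signal
import tempfile


def write_pidfile_with_cleanup(pidfile):
    """Write our PID to `pidfile` and delete the file when SIGTERM arrives.

    A real daemon would also exit from the handler; this sketch only cleans up.
    """
    with open(pidfile, "w") as f:
        f.write(f"{os.getpid()}\n")

    def on_sigterm(signum, frame):
        # SIGTERM can be caught, so the process gets a chance to clean up.
        if os.path.exists(pidfile):
            os.remove(pidfile)

    signal.signal(signal.SIGTERM, on_sigterm)


def can_install_handler(sig):
    """True if Python can install a handler for `sig` (False for SIGKILL)."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        return False
    if previous is not None:
        signal.signal(sig, previous)  # put the original handler back
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.lock")
    write_pidfile_with_cleanup(path)
    os.kill(os.getpid(), signal.SIGTERM)   # like a plain `kill <pid>`
    print("pid file still there:", os.path.exists(path))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```

Because SIGKILL cannot be handled, a process killed with -9 never runs any cleanup code, which is why the stale pid file is left behind.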
Re: [Nagios-users] different contact groups depending on time of day
Mario Garcia Ortiz wrote: Hello list, is it possible to send notifications to a certain contact group depending on the time? What I mean is: send notifications (SMS) to certain people during working hours, and to other people outside working hours and on weekends. Thank you. Yes. Define contact objects with different host_notification_period and service_notification_period specifications. http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#contact
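The answer above works because each contact only receives notifications while its own notification period is active. This Python sketch models that selection logic outside Nagios; the timeperiod names `workhours`/`nonworkhours` and the contacts `day-sms`/`night-sms` are hypothetical, and the real configuration is still done with `host_notification_period` and `service_notification_period` on the contact objects.

```python
from datetime import datetime, time


# Hypothetical "workhours" period, modelled loosely on the Nagios sample config:
# Monday-Friday 09:00-17:00. "nonworkhours" is simply everything else.
def in_workhours(ts: datetime) -> bool:
    return ts.weekday() < 5 and time(9, 0) <= ts.time() < time(17, 0)


TIMEPERIODS = {
    "workhours": in_workhours,
    "nonworkhours": lambda ts: not in_workhours(ts),
}

# Hypothetical contacts: each one's notification period decides when it is paged.
CONTACTS = {
    "day-sms": "workhours",
    "night-sms": "nonworkhours",
}


def contacts_to_notify(ts: datetime) -> list:
    """Return the contacts whose notification period contains `ts`."""
    return sorted(name for name, period in CONTACTS.items() if TIMEPERIODS[period](ts))


print(contacts_to_notify(datetime(2010, 11, 10, 10, 30)))  # Wednesday morning: ['day-sms']
print(contacts_to_notify(datetime(2010, 11, 13, 10, 30)))  # Saturday morning: ['night-sms']
```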
Re: [Nagios-users] JVM Monitoring
Marc-André Doll wrote: Hi list, I have to monitor some JVMs and I can't find plugins that fit exactly what I want/imagine. I could use check_jmx, but I don't really want to install a JRE on my Nagios server. Currently, I'm monitoring Tomcat servers with check_jmx4perl and I'm quite happy with it. Is it possible to configure/tweak the JVM or the J4P war to use it on a non-JEE server? Or am I doomed to install Java on my monitoring server? Thanks for your help. I was just looking at the web page for check_jmx4perl at http://exchange.nagios.org/directory/Plugins/Java-Applications-and-Servers/check_jmx4perl/details It says "No Java installation required on the Nagios host." Is this not true? Paul Dubuc
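The reason check_jmx4perl needs no Java on the Nagios host is that the agent deployed next to the JVM answers plain HTTP requests with JSON. As a rough sketch, assuming the j4p/Jolokia-style read URL `<base>/read/<mbean>/<attribute>/<path>` and a JSON reply carrying `status` and `value` keys, a client only needs an HTTP library and a JSON parser; the base URL `http://app01:8080/j4p` is a hypothetical example.

```python
import json
from urllib.parse import quote


def build_read_url(base_url, mbean, attribute, path=""):
    """Build a j4p/Jolokia-style GET read URL: <base>/read/<mbean>/<attribute>[/<path>].

    Assumes the MBean name contains no '/', which would need the agent's own escaping.
    """
    parts = [base_url.rstrip("/"), "read", quote(mbean, safe=":=,"), quote(attribute)]
    if path:
        parts.append(quote(path))
    return "/".join(parts)


def parse_read_response(body):
    """Return the 'value' of a JSON read reply, or raise if the agent reports an error."""
    reply = json.loads(body)
    if reply.get("status") != 200:
        raise RuntimeError(f"agent returned status {reply.get('status')}: {reply.get('error')}")
    return reply["value"]


url = build_read_url("http://app01:8080/j4p", "java.lang:type=Memory", "HeapMemoryUsage", "used")
print(url)  # http://app01:8080/j4p/read/java.lang:type=Memory/HeapMemoryUsage/used
# Fetching it needs only the standard library, e.g.:
# from urllib.request import urlopen
# used_bytes = parse_read_response(urlopen(url, timeout=10).read().decode())
```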
Re: [Nagios-users] check scheduling when checks are inhibited.
Andreas, Thanks for your reply to my earlier message. I've done some testing and some more thinking on this since then: On 11/23/2010 03:50 AM, Andreas Ericsson wrote: On 11/22/2010 10:41 PM, Paul M. Dubuc wrote: We're using Nagios 3.2.3 for simulation of monitoring load in a load test environment as well as for monitoring production services. I've noticed some interesting behavior in the way Nagios schedules checks when checks are inhibited, either through the CGI Process Commands or by setting a check_period timeperiod that inhibits checks during regularly scheduled down times. Normally Nagios seems to spread out host and service checks evenly over time, but when checks are stopped with the Process Command, Nagios seems to reschedule checks so that they are bunched up much closer together. This creates alternating periods of densely scheduled and more sparsely scheduled checks that seem to persist when checks are turned on again. It has a noticeable effect in our load testing. The only way--or the quickest way--to get Nagios to smooth out the schedule again is to stop the process completely until all the scheduled check times have passed. In testing Nagios monitoring of our production services, if I use the check_period to inhibit checks during our down times, I notice that as the downtime approaches, ALL checks are rescheduled for the exact time that the downtime ends (according to the check_period). This creates a big spike in monitoring activity after the downtime. One way to avoid this, I think, is to let checks run during the down times but inhibit notifications instead by using the timeperiod to define a notification_period. But I wonder if this bunching up of the schedule when using check_periods is ever a desirable behavior. I have some plans to make Nagios spread the checks with a randomized interleave factor so that a check scheduled to run once every 5 minutes can be run anywhere between 4m 30s and 5m 0s after it last ran.
The 30-second random spread would be the default, and it would otherwise be configurable. Another thing worth looking into is making services on the same host not run simultaneously: if the checked server is expected to be heavily loaded, it may not play nicely with 30-40 checks fired at it at once. Here's another suggestion: an option that would tell Nagios to stagger the scheduling of service checks when the check_period resumes. Instead of scheduling all the checks for the exact time that the next check_period begins, add an amount of time equal to how far past the end of the check_period the service would have run if the check_period hadn't disabled checks. For example, if I have a check period from 9:00 to 17:00 every day, a service running every 5 minutes that runs at 16:57:14 would normally run again at 17:02:14 if the check_period did not end at 17:00. This check would be scheduled to run at 9:02:14 the next day instead of 9:00:00. This should keep all checks staggered by the same amount of time in the schedule once the check_period resumes. I think this would be an ideal solution to the problem. Using the auto_rescheduling options (discussed below) seems to help a little, but not as much as I'd hoped. You really should be using scheduled downtime for regular downtime, though. There are pre-hacked solutions to automagically reschedule recurring downtime. Ninja supports it out of the box as of the latest version (or possibly latest git). There are some cases where we really should not be running the checks during down times because of the extra load they put on our system when they fail. (Checks are still run during downtime, if I'm not mistaken; only notifications are inhibited.) Many of our checks fail in this case by timing out, and they use relatively scarce (shared) and resource-intensive processes (web browser sessions run under SeleniumRC).
Timeouts tend to be long for these checks, so there is more contention for these processes when all the checks using them start failing, and they're run more often until they all go into a 'hard' failure state, etc. Maybe we can live with this, but it would be easier on the system to just inhibit checks we know are going to fail during certain regularly scheduled down times. These aren't critical issues for us since we can work around them procedurally. That's good to hear. But I wonder if there is a way to prevent the scheduled checks from getting bunched together like this if/when you need to inhibit checks for a time while keeping Nagios running. Maybe the auto_rescheduling options in nagios.cfg are meant to address this, but they have a potentially negative effect on performance according to the comments around them in the file. The below text is what I'd call educated speculation after having thrown a quick glance at the code. I might be completely wrong, but I don't think so
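The two scheduling ideas in this exchange can be made concrete with a little date arithmetic. This Python sketch only models the proposals; neither function is what Nagios 3.2.3 actually does. `next_check_after_period` carries the overshoot past the end of the check_period into the next period (Paul's suggestion), and `next_check_randomized` picks a time between 4m 30s and 5m 0s after the last run (Andreas' plan, with the 30-second spread as a parameter).

```python
import random
from datetime import datetime, timedelta


def next_check_after_period(last_run, interval, period_end, next_period_start):
    """Proposed staggering: carry the overshoot past period_end into the next period."""
    due = last_run + interval
    if due < period_end:
        return due  # still inside the check_period, run as normal
    overshoot = due - period_end
    return next_period_start + overshoot


def next_check_randomized(last_run, interval, spread=timedelta(seconds=30), rng=random):
    """Proposed interleave: run somewhere in [interval - spread, interval] after last_run."""
    jitter = timedelta(seconds=rng.uniform(0, spread.total_seconds()))
    return last_run + interval - jitter


# Worked example from the thread: check_period 9:00-17:00, 5-minute interval.
print(next_check_after_period(
    datetime(2010, 11, 23, 16, 57, 14),
    timedelta(minutes=5),
    period_end=datetime(2010, 11, 23, 17, 0),
    next_period_start=datetime(2010, 11, 24, 9, 0),
))  # 2010-11-24 09:02:14
```

Because every check keeps its own offset into the new period, checks that were spread out before 17:00 stay spread out after 9:00 instead of all landing on 9:00:00.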
[Nagios-users] check scheduling when checks are inhibited.
We're using Nagios 3.2.3 for simulation of monitoring load in a load test environment as well as for monitoring production services. I've noticed some interesting behavior in the way Nagios schedules checks when checks are inhibited, either through the CGI Process Commands or by setting a check_period timeperiod that inhibits checks during regularly scheduled down times. Normally Nagios seems to spread out host and service checks evenly over time, but when checks are stopped with the Process Command, Nagios seems to reschedule checks so that they are bunched up much closer together. This creates alternating periods of densely scheduled and more sparsely scheduled checks that seem to persist when checks are turned on again. It has a noticeable effect in our load testing. The only way--or the quickest way--to get Nagios to smooth out the schedule again is to stop the process completely until all the scheduled check times have passed. In testing Nagios monitoring of our production services, if I use the check_period to inhibit checks during our down times, I notice that as the downtime approaches, ALL checks are rescheduled for the exact time that the downtime ends (according to the check_period). This creates a big spike in monitoring activity after the downtime. One way to avoid this, I think, is to let checks run during the down times but inhibit notifications instead by using the timeperiod to define a notification_period. But I wonder if this bunching up of the schedule when using check_periods is ever a desirable behavior. These aren't critical issues for us since we can work around them procedurally. But I wonder if there is a way to prevent the scheduled checks from getting bunched together like this if/when you need to inhibit checks for a time while keeping Nagios running. Maybe the auto_rescheduling options in nagios.cfg are meant to address this, but they have a potentially negative effect on performance according to the comments around them in the file.
Re: [Nagios-users] Macros in notes?
Mark A. Lappin wrote: What I would like to do, for my network printers, switches, routers, and some other devices, is add more information to the extended info page. I have been playing around with notes, and to get decently readable output I end up with a bunch of ugly-looking HTML which I have been duplicating on every host definition. I'm trying to include printer make, model, print queue, location, primary users, toner part number, etc.; for routers, the nearest service center, circuit identifier, etc. Works great, hard to maintain. So I have been trying (unsuccessfully) to use macros in my host definition, and to put the more complex HTML on the template so that it would fill in from the macros. The configs below show what I was attempting. I do not get any configuration warnings; however, I don't get the value that I have set in the host, I get the literal output: $_HOSTprnMake$. So I'm thinking either (1) Nagios doesn't support what I'm trying to do and I can't use macros in notes, or (2) I have a syntax error that I'm not seeing. I'm hoping somebody here can give me some insight into which case it might be, especially for #1, before I really start beating my head against the wall. It's #1. Nagios only supports macro expansion for command objects (maybe others I don't know of). Macros will work in the arguments (if any) that you pass to the check_command, because they're expanded for the command object. Being able to do what you are trying to do here would be nice. I would like to use macros for constructing host and service names.
define host{
    use             generic-printer
    host_name       11314-AR
    alias           11314-AR-4200N
    address         192.168.98.31
    action_url      http://192.168.98.31
    hostgroups      network-printers
    _prnMake        HP
    _prnModel       Laserjet 2300n
    _prnMainQueue   lmfj-print\\11314-AR
    }

define host{
    name                    generic-printer  ; The name of this host template
    use                     generic-host     ; Inherit default values from the generic-host template
    check_period            24x7             ; By default, printers are monitored round the clock
    check_interval          5                ; Actively check the printer every 5 minutes
    retry_interval          1                ; Schedule host check retries at 1 minute intervals
    max_check_attempts      10               ; Check each printer 10 times (max)
    check_command           check-host-alive ; Default command to check if printers are alive
    notification_period     workhours        ; Printers are only used during the workday
    notification_interval   30               ; Resend notifications every 30 minutes
    notification_options    d,r              ; Only send notifications for specific host states
    contact_groups          admins           ; Notifications get sent to the admins by default
    register                0                ; DONT REGISTER THIS - ITS JUST A TEMPLATE
    notes                   <table border="1" width="100%" cellpadding="3" cellspacing="0" bgcolor="#FF" style="border-collapse: collapse" bordercolor="#00">\
<tr bgcolor="lightblue"><td align="center">Make</td></tr>\
<tr><td align="center">$_HOSTprnMake$</td></tr>\
</table>
    }

Any advice/input is very much appreciated. --Mark

Mark A. Lappin, CCNA, MCITP: Enterprise Administrator | Lee Michaels Fine Jewelry | Director of Information Technology | 11314 Cloverland Ave | Baton Rouge, LA 70809 | Ph: 225.291.9094 ext 245 | Fax: 225.368.3675 | Mobile: 225-362-2770 | www.lmfj.com
Re: [Nagios-users] No permission to web-interface
Astakhov Peter wrote: Hello, colleagues! I installed Nagios on RHEL6, but I get an error in the web interface: "It appears as though you do not have permission to view information for any of the hosts you requested... If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI and check the authorization options in your CGI configuration file." I checked /etc/httpd/conf.d/nagios.conf ScriptAlias /nagios/cgi-bin/ /usr/lib/nagios/cgi-bin/ ... Which display are you trying to use when you get this error? I have one instance of Nagios configured with no host groups, and this error comes up if I try to view host groups. It's a little confusing, because it's not really a permission issue: I have permission to access all the hosts. It's just that there is nothing to display for that particular query. There is no default "all hosts" hostgroup. Paul Dubuc
Re: [Nagios-users] Nagios Historical Data Question
Marc Powell wrote: On Nov 15, 2010, at 8:05 AM, Korrawit Yindeeyoungyeon wrote: Where can I find the standard database schema of Nagios? Or do I need to look in the source code of 3rd-party front-end software? You'll need to look at the third-party software to determine how they get data into a database. Nagios doesn't use a database, so it has no standard database schema. Each addon either has its own specific schema or uses one of the common event-broker-to-database addons (such as ndoutils). -- Marc Maybe he's looking for this: http://nagios.sourceforge.net/docs/ndoutils/NDOUtils_DB_Model.pdf, the DB schema used by NDOUtils.
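Without an addon such as NDOUtils, the history Nagios itself keeps is in its plain-text log file (and the rotated archives). A minimal Python sketch for pulling service state changes out of it, assuming the usual Nagios 3 line layout `[epoch] SERVICE ALERT: host;service;state;state type;attempt;output`; the log path in the usage comment is an assumption (check `log_file` in nagios.cfg):

```python
import re
from datetime import datetime, timezone

# Assumed Nagios 3 log layout:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
SERVICE_ALERT = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def service_alerts(lines):
    """Yield one dict per SERVICE ALERT line; all other log lines are skipped."""
    for line in lines:
        m = SERVICE_ALERT.match(line.rstrip("\n"))
        if not m:
            continue
        ts, host, service, state, state_type, attempt, output = m.groups()
        yield {
            "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
            "host": host,
            "service": service,
            "state": state,
            "state_type": state_type,
            "attempt": int(attempt),
            "output": output,  # may itself contain ';', so it takes the rest of the line
        }


# Usage (path is an assumption):
# with open("/usr/local/nagios/var/nagios.log") as log:
#     for alert in service_alerts(log):
#         print(alert["time"], alert["host"], alert["service"], alert["state"])
```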
[Nagios-users] Suppress Max concurrent service checks messages.
We're running Nagios 3.2.3 with max concurrent service checks set to 40. We can't go much higher than this due to resource constraints outside of Nagios, but we're running 329 services at 5-minute intervals (this is a load test of sorts, not production load ... yet). Average execution time/latency is 36/11 seconds, so we're seeing quite a few messages like this in the Nagios log file: (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks (40) has been reached. Nudging host:service by 9 seconds... Is there any way to suppress these messages from being logged? I don't see an option for logging these in the config file documentation. Thanks, Paul Dubuc
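A quick back-of-the-envelope check shows why the limit is hit so often. By Little's law, the average number of checks in flight equals the check rate multiplied by the average execution time. This Python snippet plugs in the figures from the post; it only gives an average and ignores the 11-second latency, and any bunching in the schedule pushes the instantaneous count higher.

```python
def average_checks_in_flight(num_services, interval_s, avg_exec_s):
    """Little's law: average concurrency = arrival rate * average time per check."""
    checks_per_second = num_services / interval_s
    return checks_per_second * avg_exec_s


busy = average_checks_in_flight(num_services=329, interval_s=5 * 60, avg_exec_s=36)
print(f"about {busy:.1f} checks running on average, against a limit of 40")
# about 39.5 checks running on average, against a limit of 40
```

With an average of about 39.5 against a ceiling of 40, almost any burst exceeds the limit, which matches the steady stream of nudging messages.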
Re: [Nagios-users] Suppress Max concurrent service checks messages.
Ton Voon wrote: On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote: We're running Nagios 3.2.3 with concurrent service checks set to 40. We can't go much higher than this due to resource constraints outside of Nagios, but we're running 329 services at 5-minute intervals (this is a load test of sorts, not production load ... yet). Average execution time/latency is 36/11 seconds, so we're seeing quite a few messages like this in the Nagios log file: (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks (40) has been reached. Nudging host:service by 9 seconds... Is there any way to suppress these messages from being logged? I don't see an option for logging these in the config file documentation. I put those messages in. Firstly, 40 doesn't necessarily mean there are 40 concurrent service checks running, as they may have finished but not been reaped yet (to decrement the counter). Secondly, if you are getting these messages, then either (1) this limit is too low - increase it and keep an eye on the load on your Nagios server; or (2) you've got too many checks running - reduce frequencies/numbers or set up a slave server. The trouble with the way the nudging works is that it hides the fact that you have latency issues (as the check is rescheduled to a future time). This means nagiostats will not include the additional latency time here. If someone has a better way of working this out, I'm all ears. Ton Thanks, Ton. This is helpful information and advice. The services we're running require web browsers, which are a CPU- and memory-intensive resource that, temporarily, we need to manage on the Nagios server. In production we shouldn't have these limitations, but for now I just wanted to keep all these messages from flooding the log. Andreas, I know it's doing things wrong, but there's not much I can do about it right now. I know what problem these messages are pointing to.
I'd just like to keep them from flooding the logs so I can see what else is happening more easily. That's all. Thanks, Paul Dubuc
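Since no configuration option for these messages turned up, one workaround is to filter them out when reading the log rather than when writing it (Nagios still records them). A minimal Python sketch; the log path in the usage comment is an assumption, so use whatever `log_file` is set to in nagios.cfg:

```python
import re

# Matches the informational "nudging" message quoted in the thread.
NUDGE = re.compile(r"Max concurrent service checks \(\d+\) has been reached")


def without_nudge_noise(lines):
    """Yield only the log lines that are not max-concurrency nudging messages."""
    for line in lines:
        if not NUDGE.search(line):
            yield line


# Usage (log path is an assumption):
# with open("/usr/local/nagios/var/nagios.log") as log:
#     for line in without_nudge_noise(log):
#         print(line, end="")
```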
Re: [Nagios-users] Suppress Max concurrent service checks messages.
Ton Voon wrote: ... The trouble with the way the nudging works is that it hides the fact that you have latency issues (as the check is rescheduled to a future time). This means nagiostats will not include the additional latency time here. If someone has a better way of working this out, I'm all ears. Would it cause other problems if the total nudging time for a service were included in its latency time?
[Nagios-users] Time frame for Monitoring Performance?
I'm using Nagios 3.2.3. I'm wondering what time frame is used for the measurements shown in the Monitoring Performance box on the Tactical Overview display. In particular, are the execution times (min., max., avg.) measured over the last hour, the last 10 minutes, or something else? I can't find any information on this in the documentation. Thanks, Paul Dubuc
Re: [Nagios-users] any macro for viewing host parent?
John Alberts wrote: I would like to have our notification emails for service alerts include the host parent. Is there any existing macro I can use to include this? I couldn't find anything when googling. If not, any suggestions on how I might get it into an email? The way we do this is to use a user-defined macro in the host definition like so:

define host{
    use             aps-launcher
    host_name       APS-P52
    parents         aps52
    __PARENT_HOST   aps52
    }

Then you can expand it, $_PARENT_HOST$, in the notification. Unfortunately this means you need to define the parent in 2 places. It would be nice if there were a built-in macro for this, but I don't think there is.
Re: [Nagios-users] any macro for viewing host parent?
Paul M. Dubuc wrote: John Alberts wrote: I would like to have our notification emails for service alerts include the host parent. Is there any existing macro I can use to include this? I couldn't find anything when googling. If not, any suggestions on how I might get it into an email? The way we do this is to use a user-defined macro in the host definition like so:

define host{
    use             aps-launcher
    host_name       APS-P52
    parents         aps52
    __PARENT_HOST   aps52
    }

Then you can expand it, $_PARENT_HOST$, in the notification. I meant that would be $_HOST_PARENT_HOST$. Unfortunately this means you need to define the parent in 2 places. It would be nice if there were a built-in macro for this, but I don't think there is.
Re: [Nagios-users] any macro for viewing host parent?
diego.roc...@gmail.com wrote: Isn't it $_HOSTPARENT_HOST$? Not if you put TWO underscores in front of the macro name. Then you get $_HOST_PARENT_HOST$, which I think is much more readable (a nice suggestion I found in Barth's book). BTW, in order to avoid the double declaration (and human errors) you could add this in generic-host (or whatever template you define):

define generic-host {
    ...
    parents $_HOSTPARENT_HOST$
    }

and in the real host definition you would define only the custom macro. Haven't tried it, but it should work. I don't think this will work, because the macro isn't expanded in that context. I think they only expand in the command object or (effectively) in arguments in the check_command definition (because they're expanded when passed to the command). Even if this did work, it would only work if all your hosts had the same parent. All my hosts have different parents.
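The naming rule being discussed fits in a few lines: the first underscore marks a directive as a custom variable, the rest is the variable name, and the macro is `$_HOST` (or `$_SERVICE`, `$_CONTACT`) followed by that name. This Python sketch also upper-cases the name, following the `_mac_address` / `$_HOSTMAC_ADDRESS$` example in the Nagios documentation; the helper itself is hypothetical.

```python
def custom_macro(object_type: str, directive: str) -> str:
    """Map a custom-variable directive (leading underscore) to its macro name.

    The first underscore only marks the directive as custom; everything after it
    is the variable name, appended to $_HOST / $_SERVICE / $_CONTACT.
    """
    if not directive.startswith("_"):
        raise ValueError("custom variables must start with an underscore")
    prefix = {"host": "_HOST", "service": "_SERVICE", "contact": "_CONTACT"}[object_type]
    return f"${prefix}{directive[1:].upper()}$"


print(custom_macro("host", "__PARENT_HOST"))  # $_HOST_PARENT_HOST$
print(custom_macro("host", "_PARENT_HOST"))   # $_HOSTPARENT_HOST$
```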
Re: [Nagios-users] Best Practice: Forgotten Acknowledgements
Andre Timmermann wrote: On Monday, 01.11.2010, at 12:50 -0400, Chris Beattie wrote: Acknowledgements add comments to hosts and services, so you could just set yourself a reminder to occasionally check the comments link in the side bar and look for anything that's getting stale. Yes, but this still relies on a human not forgetting things. I tend to believe something automatic is more reliable than a human ;) You could write an event handler that fixes whatever the problem is. Otherwise you are relying on a human at every level not to forget the acknowledgements and reminders. ;-)
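One way to make the reminder automatic is a small script, run from cron or as a check of its own, that looks for acknowledgement comments older than some limit. This Python sketch assumes the Nagios 3 `status.dat` layout (`hostcomment`/`servicecomment` blocks with `entry_type=4` for acknowledgements and an epoch `entry_time`); the sample data and the 24-hour limit are made up.

```python
import re
import time

# One "name {" ... "}" block of status.dat (assumed Nagios 3 layout).
BLOCK = re.compile(r"^(\w+)\s*\{\n(.*?)\n\s*\}", re.MULTILINE | re.DOTALL)
ACK_ENTRY_TYPE = "4"  # comment entry_type used for acknowledgements


def parse_status_blocks(text):
    """Yield (block_kind, {key: value}) for every block in status.dat text."""
    for kind, body in BLOCK.findall(text):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        yield kind, fields


def stale_acks(text, max_age_s, now=None):
    """List (host, service, author, age_s) for acknowledgements older than max_age_s."""
    now = time.time() if now is None else now
    stale = []
    for kind, f in parse_status_blocks(text):
        if kind in ("hostcomment", "servicecomment") and f.get("entry_type") == ACK_ENTRY_TYPE:
            age = now - int(f["entry_time"])
            if age > max_age_s:
                stale.append((f.get("host_name"), f.get("service_description", ""), f.get("author"), int(age)))
    return stale


SAMPLE = """servicecomment {
\thost_name=web01
\tservice_description=HTTP
\tentry_type=4
\tentry_time=1288620000
\tauthor=andre
\tcomment_data=looking into it
\t}

hostcomment {
\thost_name=db01
\tentry_type=1
\tentry_time=1288620000
\tauthor=chris
\tcomment_data=just a note
\t}
"""

print(stale_acks(SAMPLE, max_age_s=24 * 3600, now=1288620000 + 2 * 86400))
# [('web01', 'HTTP', 'andre', 172800)]
```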
Re: [Nagios-users] host_port objects - Enhancement Request
I think this would be a very nice enhancement. Many of the services we run are associated with a host and a port. We're using the service-based ports solution that you describe. Since Nagios requires that the combination of host_name and service_description be unique, we often have to embed a port name in the service_description. Since the port is also passed as an argument to the check_command, it ends up being defined in two places, and the service_description has to be changed when we change the port being used for the service. Having to configure a separate service for each port on a given host also complicates configuration changes.
Re: [Nagios-users] Services on statusmap
Laszlo Csepanyi-Furjes wrote: Hi, I'm using Nagios Core 3.2.3. I have a couple of hosts in my system, and there are web services installed on every host that I would like to monitor. I implemented my own plug-in for that purpose. So far the configuration is going well, but now I'm bumping my head into the wall. In the statusmap I can see only the defined hosts. How can I get the services visible there? Is it possible at all with the core version, or do I have to install something extra? At least I found this picture: http://a9k.info/images/nagios.png It contains Chat, Staff, etc.; those should be services, right? Please help! I think the status map is only for hosts. The Up and Down statuses you see on the diagram apply to hosts, not services. (So Chat and Staff must be hosts.) Service status is OK, Warning, Critical, etc. You can click a host to select it, and double-click it to display its service status details.
Re: [Nagios-users] question about macros
Joel Brooks wrote: hey gang, can macros be used in configuration objects? I.e., can I use $HOSTNAME$ in the display_name directive on a host object? That would be nice, but I don't think you can. You can use them in the command arguments in the check_command directive, though.
Re: [Nagios-users] escalation question
Terry wrote: On Mon, Oct 11, 2010 at 3:48 AM, michal.lacko...@cz.schneider-electric.com wrote: Hi All, Is there any way to create a service escalation like the following?
    hostgroup_name          Group1,Group2
    service_description     *
    contact_group           Managers
Basically, I need to escalate all service problems on the hosts that are members of Group1 and Group2 to the managers. Thanks in advance, Michal
Yes, you're exactly right. We took it a step further: we put all hosts in a single group, then globbed it as you did above:
    define serviceescalation{
        hostgroup_name          allhosts
        service_description     .*
        contacts                foo,foo2
        first_notification      1
        last_notification       1
        notification_interval   1
        escalation_options      w,u,c
        }
    define hostgroup{
        hostgroup_name  allhosts
        alias           allhosts
        members         .*
        }
and, in nagios.cfg:
    use_regexp_matching=1
I think that's all you need to enable globbing.
Thanks for this example. I'm trying to do something similar with an allhosts hostgroup definition, and it doesn't seem to work unless every host in the allhosts group also has services defined for it. When one doesn't, I get errors like:
    Error: Could not find a service matching host name 'AXSP51' and description '.*' (config file '/vol/omni/nagios-3.2.1/config/test/objects/contacts/Contacts.cfg', starting on line 74)
    Error: Could not expand services specified in service escalation (config file '/vol/omni/nagios-3.2.1/config/test/objects/contacts/Contacts.cfg', starting on line 74)
AXSP51 has no services defined for it, but I monitor it as a parent for hosts that do. Do I need to maintain a separate host group, used instead of allhosts, that contains just the hosts with services defined for them? Or is there a more convenient (i.e., less error-prone) way around this? Thanks, Paul Dubuc
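If a dedicated group does turn out to be necessary, one way to keep it from drifting out of date is to generate it from the service definitions. The Python sketch below is illustrative only. It assumes services are attached to hosts through literal host_name lists; hostgroup_name, inherited host_name values and regular expressions are ignored. The group name hosts_with_services and the script usage are hypothetical.

```python
import re
import sys

# One "define service{ ... }" block; assumes directive values contain no braces.
# (serviceescalation and servicegroup blocks do not match this pattern.)
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)


def hosts_with_services(config_text):
    """Return sorted host names listed in host_name of registered service definitions.

    Simplifying assumption: services are attached by literal host_name lists only.
    """
    hosts = set()
    for body in SERVICE_BLOCK.findall(config_text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop inline ";" comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        if directives.get("register") == "0":
            continue  # templates are not real services
        for name in directives.get("host_name", "").split(","):
            name = name.strip()
            if name and not name.startswith("!"):  # skip exclusions like !host3
                hosts.add(name)
    return sorted(hosts)


def render_hostgroup(group_name, hosts):
    """Render a hostgroup definition whose members are the given hosts."""
    return (
        "define hostgroup{\n"
        f"    hostgroup_name  {group_name}\n"
        f"    alias           {group_name}\n"
        f"    members         {','.join(hosts)}\n"
        "    }\n"
    )


if __name__ == "__main__":
    # Hypothetical usage: python make_hostgroup.py services.cfg > hosts_with_services.cfg
    texts = []
    for path in sys.argv[1:]:
        with open(path) as fh:
            texts.append(fh.read())
    print(render_hostgroup("hosts_with_services", hosts_with_services("\n".join(texts))), end="")
```

The escalation's hostgroup_name would then point at the generated group instead of allhosts. The file would need regenerating whenever services are added or removed.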
[Nagios-users] Do plugins terminate gracefully on Nagios restart or shutdown?
I'm wondering whether any termination signal is sent to a plugin that happens to be executing when Nagios is restarted or shut down. Do plugins need a signal handler for this case if they have cleanup to do? Do plugins using the embedded Perl interpreter also get a signal? Is the signal different for restart vs. shutdown? Or perhaps Nagios waits for executing plugins to finish, without starting any new ones, before doing the restart or shutdown. I haven't found the answer to this in the development guidelines or other documentation. Can anyone tell me how this is handled? Thanks, Paul Dubuc
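The question of what Nagios does with a plugin still running at restart or shutdown is left open here. The safest plugin therefore cleans up whether it finishes normally or is terminated. Below is a minimal Python sketch of that. It assumes the plugin might receive SIGTERM, which is an assumption and not documented Nagios behavior. The lock file and its path are hypothetical; exit code 3 is the standard UNKNOWN plugin state.

```python
import os
import signal
import sys
import tempfile

STATE_OK, STATE_UNKNOWN = 0, 3  # standard plugin exit codes

# Hypothetical resource that needs cleaning up; not from the original post.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "check_example.lock")


def cleanup():
    """Remove the lock file if it is still there."""
    try:
        os.remove(LOCK_PATH)
    except FileNotFoundError:
        pass


def handle_term(signum, frame):
    """SIGTERM handler: clean up, report UNKNOWN and exit."""
    cleanup()
    print(f"UNKNOWN - plugin terminated by signal {signum}")
    sys.exit(STATE_UNKNOWN)


def main():
    """Run the check; the finally block covers normal returns and exceptions."""
    signal.signal(signal.SIGTERM, handle_term)
    with open(LOCK_PATH, "w") as fh:
        fh.write(str(os.getpid()))
    try:
        # ... the real, possibly long-running, check would go here ...
        print("OK - check completed")
        return STATE_OK
    finally:
        cleanup()

# An installed plugin would end with: sys.exit(main())
```

The try/finally block covers normal completion and exceptions, and the handler covers SIGTERM. SIGKILL cannot be caught, so no handler helps if the plugin is killed that way.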
[Nagios-users] Plugin termination signal?
I'm wondering whether any termination signal is sent to a plugin that happens to be executing when Nagios is restarted or shut down. Do plugins need a signal handler for this case if they have cleanup to do? Do plugins using the embedded Perl interpreter also get a signal? Is the signal different for restart vs. shutdown? I haven't found the answer to this in the development guidelines. Thanks, Paul Dubuc
Re: [Nagios-users] Running a command when Nagios config changes
Ryan C Ash wrote: Paul M. Dubuc wrote: I would like some way of running a command only when Nagios is started, when it is restarted from the Process Commands menu, or any time it reloads its configuration files. Is there a way to do this? I thought about writing it as a localhost service plugin that simply does nothing if $LASTSERVICECHECK$ is later than $PROCESSSTARTTIME$, but that doesn't seem optimal. Is this the best solution? It would be nice if I could write it as an event handler, but event handlers only run on host or service state changes, and this is a Nagios process state change.
We run on Red Hat Linux, and I use a common init script in /etc/rc.d/init.d/nagios. That would be an easy place to add the additional script. Currently it maintains our pnp4nagios, NSCA listener, NDOUtils, etc.
Thanks for your response. What I really need is something that will run my script any time Nagios reads its config files (a possible configuration change), so this is only a partial solution. Executing the "Restart the Nagios process" command from the Process Info screen doesn't create a new Nagios process (it keeps the same PID after the restart). It does, however, cause Nagios to reload the configuration, and it resets the $PROCESSSTARTTIME$ macro value. The localhost plugin I describe above does the job. But I can't guarantee that it would run promptly after the restart event, and there seems to be no way to have it run just once instead of at intervals. This isn't a problem for my current use, so I can keep doing it this way. It would be nice for Nagios to have a process restart (or config changed) event for which one could write an event handler script.
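The localhost-plugin idea discussed above could be sketched in Python as follows. The sketch rests on three assumptions:
- A hypothetical command definition whose command_line is $USER1$/check_reload $LASTSERVICECHECK$ $PROCESSSTARTTIME$. Both macros are UNIX timestamps.
- $LASTSERVICECHECK$ still holds the time of the previous check when the plugin runs.
- The script name and the on-config-reload command it runs are hypothetical.

```python
import subprocess

STATE_OK, STATE_CRITICAL, STATE_UNKNOWN = 0, 2, 3

# Hypothetical command to run once after each (re)start; not from the thread.
RELOAD_COMMAND = ["/usr/local/bin/on-config-reload"]


def needs_run(last_check, process_start):
    """True when the previous check ran before Nagios last (re)started,
    i.e. this is the first check since the configuration was (re)loaded."""
    return last_check < process_start


def main(argv, run=subprocess.run):
    """argv holds $LASTSERVICECHECK$ and $PROCESSSTARTTIME$ (UNIX timestamps)."""
    try:
        last_check, process_start = (int(v) for v in argv[:2])
    except ValueError:
        print("UNKNOWN - usage: check_reload LASTSERVICECHECK PROCESSSTARTTIME")
        return STATE_UNKNOWN
    if not needs_run(last_check, process_start):
        print("OK - already ran since the last restart")
        return STATE_OK
    try:
        result = run(RELOAD_COMMAND)
    except OSError as exc:
        print(f"UNKNOWN - could not run {RELOAD_COMMAND[0]}: {exc}")
        return STATE_UNKNOWN
    if result.returncode != 0:
        print(f"CRITICAL - {RELOAD_COMMAND[0]} exited with {result.returncode}")
        return STATE_CRITICAL
    print("OK - ran reload command after restart")
    return STATE_OK

# An installed plugin would end with: import sys; sys.exit(main(sys.argv[1:]))
```

A restart from the Process Info screen resets $PROCESSSTARTTIME$ even though the PID stays the same, so the comparison also catches that case. The command still runs only at the service's next scheduled check after the restart, not immediately.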
[Nagios-users] How to find active check status for a service?
Is there some programmatic way to find out whether active checks are enabled or disabled for a service in Nagios? We have an audit requirement to send notifications for certain critical services whose active checks may have been disabled, so that they aren't left that way any longer than necessary. Thanks, Paul Dubuc
Re: [Nagios-users] How to find active check status for a service?
Holger Weiß wrote: * Paul M. Dubuc w...@paul.dubuc.org [2010-07-15 17:08]: Is there some programmatic way to find out whether active checks are enabled or disabled for a service in Nagios? We have an audit requirement to send notifications for certain critical services whose active checks may have been disabled, so that they aren't left that way any longer than necessary.
You could parse the status file; see ftp://ftp.in-berlin.de/pub/users/weiss/nagios/tools/disabled-notifications for an example.
Thanks! Using MK Livestatus (http://mathias-kettner.de/checkmk_livestatus.html) is also a possibility. Paul Dubuc
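The status-file approach can be sketched in Python. The sketch assumes the Nagios 3 status.dat layout, in which each service appears as a servicestatus { ... } block of key=value lines that includes active_checks_enabled. It is not the script at the FTP link above, and the sample data is made up.

```python
def parse_status(text):
    """Yield (block_type, fields) for each "type {" ... "}" block in status.dat text."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{") and "=" not in line:
            block_type, fields = line[:-1].strip(), {}  # start of a new block
        elif line == "}" and block_type is not None:
            yield block_type, fields
            block_type = None
        elif block_type is not None:
            key, _, value = line.partition("=")  # values may themselves contain "="
            fields[key] = value


def services_with_active_checks_disabled(text):
    """Return (host_name, service_description) pairs whose active checks are off."""
    return [
        (fields.get("host_name"), fields.get("service_description"))
        for kind, fields in parse_status(text)
        if kind == "servicestatus" and fields.get("active_checks_enabled") == "0"
    ]
```

On a live system, the text would come from the file named by the status_file option in nagios.cfg. An audit job could then notify someone whenever the returned list is non-empty.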
[Nagios-users] Running a command when Nagios config changes
I would like some way of running a command only when Nagios is started, when it is restarted from the Process Commands menu, or any time it reloads its configuration files. Is there a way to do this? I thought about writing it as a localhost service plugin that simply does nothing if $LASTSERVICECHECK$ is later than $PROCESSSTARTTIME$, but that doesn't seem optimal. Is this the best solution? It would be nice if I could write it as an event handler, but event handlers only run on host or service state changes, and this is a Nagios process state change.
Re: [Nagios-users] using multiple templates
Litwin, Matthew wrote: Are there any consequences to using multiple templates, other than that the last one defined gets precedence? I would like to have service templates that do things like define the notification interval separately from the escalation path, time periods, etc. I was thinking of ending up with something like this:
    define{
        name    some_services
        use     an_escalation_template
        use     a_notification_template
        use     an_action_template
        }
Assuming there are no collisions in the namespace, this should work, right?
Have you tried it? I don't know whether separate 'use' directives work. I use a comma-separated list with one 'use' directive:
    use an_escalation_template,a_notification_template,an_action_template
Remember that the order is important: anything defined in the first template takes precedence. See "Multiple Inheritance Sources" here for more details: http://nagios.sourceforge.net/docs/3_0/objectinheritance.html
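The precedence rule described in the reply can be modeled in a few lines of Python. Directives in the object itself win; otherwise, the first template listed in use that defines a directive wins. This is a simplified illustration, not Nagios's own resolver: it ignores templates that themselves use other templates, and the directive values are made up.

```python
# Bookkeeping directives this model does not copy from a template into the object.
NOT_INHERITED = {"name", "register", "use"}


def resolve(obj, templates):
    """Return the effective directives of obj (simplified model).

    obj: dict of directives, optionally with "use" = comma-separated template names
    templates: dict mapping template name -> dict of directives
    Local directives always win; among templates, the first one listed wins.
    """
    names = [n.strip() for n in obj.get("use", "").split(",") if n.strip()]
    effective = {}
    # Apply the last template first so that earlier templates overwrite later ones.
    for name in reversed(names):
        effective.update(
            {k: v for k, v in templates[name].items() if k not in NOT_INHERITED}
        )
    effective.update({k: v for k, v in obj.items() if k != "use"})
    return effective
```

With use an_escalation_template,a_notification_template, a notification_interval set in both templates comes from an_escalation_template.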
Re: [Nagios-users] groups of hostgroups?
Litwin, Matthew wrote: It doesn't appear that there is a way to include hostgroups in other hostgroups, but is there some other way to get this behavior? We have several dozen types of servers in our environment. It would be helpful to define a class of host somehow rather than listing servers explicitly in multiple hostgroups. Any ideas?
I use templates to add hosts and services to groups. Suppose the definition inherits from more than one template. Then a 'hostgroups' or 'servicegroups' directive replaces whatever was specified previously, unless you prefix the group name with a plus sign (+). With the prefix, the group is added to whatever other groups are specified:
    define hostgroup{
        hostgroup_name  HG_ALPHA
        ...
        }
    define host{
        name            alpha-host
        register        0           ; this is a template
        hostgroups      +HG_ALPHA
        ...
        }
    define hostgroup{
        hostgroup_name  HG_BETA
        ...
        }
    #
    # Nagios host definition template used by hosts in this config file
    #
    define host{
        name            beta-host
        register        0           ; this is a template
        use             alpha-host
        hostgroups      +HG_BETA
        }
Now any host that uses the beta-host template is put in both the HG_BETA hostgroup and the HG_ALPHA hostgroup. This effectively puts the HG_BETA group within the HG_ALPHA group. Hope this helps. The same thing can be done with servicegroups, of course.
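The plus-sign behavior can be modeled in Python. Walking the template chain from the root ancestor down, a group list that starts with + is appended to the inherited list, and any other value replaces it. This sketch is a simplified single-inheritance model for illustration, not Nagios's code. It reuses the alpha-host and beta-host names from the example; web1 and gamma-host are hypothetical.

```python
def effective_groups(name, definitions, key="hostgroups"):
    """Resolve a group list through a single-inheritance 'use' chain.

    definitions maps an object or template name to a dict that may contain
    'use' (parent name) and `key` (a comma-separated group list, optionally
    prefixed with '+' for additive inheritance).
    """
    chain, seen = [], set()
    while name is not None:
        if name in seen:
            raise ValueError(f"template loop at {name}")
        seen.add(name)
        chain.append(definitions[name])
        name = definitions[name].get("use")
    groups = []
    for definition in reversed(chain):  # root ancestor first
        value = definition.get(key)
        if value is None:
            continue  # nothing set here: keep the inherited list
        additive = value.startswith("+")
        new = [g.strip() for g in value.lstrip("+").split(",") if g.strip()]
        groups = groups + new if additive else new
    return groups
```

In this model, gamma-host, which uses alpha-host with a plain HG_GAMMA, ends up only in HG_GAMMA. That matches the replacement behavior described above.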
Re: [Nagios-users] nrpe configuration help
Could you define one wrapper service that executes one of the others based on an argument passed to it?
shadih rahman wrote: All, I need some suggestions for my NRPE configuration. I have three different kinds of architecture in my setup:
- 32-bit Linux machines (plugins installed in /usr/lib/nagios/plugins)
- 64-bit Linux machines (plugins installed in /usr/lib64/nagios/plugins)
- Solaris machines (plugins installed in /opt/libexec)
In my nrpe.conf file I have three definitions like these:
    [check_something]=/usr/lib/nagios/plugins/check_something
    [check_something_x64]=/usr/lib64/nagios/plugins/check_something
    [check_something_unix]=/opt/libexec/check_something
In my service definitions I name them differently and call the matching command; for example, I have the checks disk, disk_x64 and disk_unix. In the commands.cfg file I call them like this:
    command_name    check_remote
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
However, a new requirement has come in: disk, disk_x64 and disk_unix must have the same service name. I need a clever way to define a single service, disk, that calls a different NRPE command depending on the architecture. Can someone please help me with this? Thanks. Cordially, Shadhin Rahman
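The wrapper suggested in the reply above could look like the Python sketch below. It makes these assumptions:
- Python is installed on all three platforms.
- The wrapper lives at the same hypothetical path, /usr/local/bin/check_wrapper, on every host.
- Instead of taking the architecture as an argument, it picks the plugin directory from platform.system() and platform.machine().

The three directories are the ones listed in the question.

```python
import os
import platform


def plugin_dir(system=None, machine=None):
    """Return the plugin directory for this host, or for the given platform values.

    The directory layout comes from the question; detecting it with
    platform.system()/platform.machine() is an assumption about those hosts.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "SunOS":  # Solaris reports itself as SunOS
        return "/opt/libexec"
    if system == "Linux":
        if machine in ("x86_64", "amd64"):
            return "/usr/lib64/nagios/plugins"
        return "/usr/lib/nagios/plugins"
    raise RuntimeError(f"unsupported platform: {system} {machine}")


def run_plugin(name, args):
    """Replace this process with the architecture-specific plugin.

    os.execv does not return on success, so the plugin's own output and
    exit code go straight back to NRPE.
    """
    if os.path.basename(name) != name:
        raise ValueError("plugin name must not contain a path")
    path = os.path.join(plugin_dir(), name)
    os.execv(path, [path] + list(args))

# Installed (hypothetically) as /usr/local/bin/check_wrapper, ending with:
#   import sys; run_plugin(sys.argv[1], sys.argv[2:])
```

Every host's NRPE configuration could then contain the same line, for example command[check_something]=/usr/local/bin/check_wrapper check_something. One service definition named disk could then call it through check_nrpe regardless of architecture.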
Re: [Nagios-users] servicegroups directive doesn't seem to work
FYI, the reason this wasn't working was a 'use' directive in the service template. It pointed to another template that also had a servicegroups directive for a different service group (that line got edited out of my example). Putting a + sign in front of the ebusiness servicegroup name did the trick: it adds the new service group instead of replacing the old one.
Paul M. Dubuc wrote: Hello, The documentation at http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html for the service definition says that you can use a 'servicegroups' directive to assign a service to a servicegroup, instead of using the 'members' directive in the service group:
    servicegroups: This directive is used to identify the short name(s) of the servicegroup(s) (http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#servicegroup) that the service belongs to. Multiple servicegroups should be separated by commas. This directive may be used as an alternative to using the members directive in servicegroup (http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#servicegroup) definitions.
I would like to do this with a service template that service definitions can use to make the assignment, as in the configuration below. This would save me from having to add many host,service pairs to the members directive in the service group. But it doesn't seem to work (I'm using Nagios 3.2.0). I get the following configuration error:
    Error: Servicegroup members must be specified in host_name,service_description pairs (config file ' ...
I get the same error when I delete the service template and move the servicegroups directive into the service definitions. What am I doing wrong? Thanks, Paul Dubuc
    define servicegroup{
        servicegroup_name   ebusiness
        alias               Business Services
    #   members             ; use servicegroups in service definitions below instead.
        }
    #
    # Nagios service definition template used by services in this config file
    #
    define service{
        name            ebusiness-service
        register        0           ; this is a template
        servicegroups   ebusiness   ; add the service to this service group
        }
    define service{
        use                 ebusiness-service
        host_name           host1,host2
        service_description service1
        check_command       ...
        }
    #
    # SciFinder Password Change test service
    #
    define service{
        use                 ebusiness-service
        host_name           host1,host2
        service_description service2
        check_command       ...
        }
[Nagios-users] servicegroups directive doesn't seem to work
Hello, The documentation at http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html for the service definition says that you can use a 'servicegroups' directive to assign a service to a servicegroup, instead of using the 'members' directive in the service group:
    servicegroups: This directive is used to identify the short name(s) of the servicegroup(s) (http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#servicegroup) that the service belongs to. Multiple servicegroups should be separated by commas. This directive may be used as an alternative to using the members directive in servicegroup (http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#servicegroup) definitions.
I would like to do this with a service template that service definitions can use to make the assignment, as in the configuration below. This would save me from having to add many host,service pairs to the members directive in the service group. But it doesn't seem to work (I'm using Nagios 3.2.0). I get the following configuration error:
    Error: Servicegroup members must be specified in host_name,service_description pairs (config file ' ...
I get the same error when I delete the service template and move the servicegroups directive into the service definitions. What am I doing wrong? Thanks, Paul Dubuc
    define servicegroup{
        servicegroup_name   ebusiness
        alias               Business Services
    #   members             ; use servicegroups in service definitions below instead.
        }
    #
    # Nagios service definition template used by services in this config file
    #
    define service{
        name            ebusiness-service
        register        0           ; this is a template
        servicegroups   ebusiness   ; add the service to this service group
        }
    define service{
        use                 ebusiness-service
        host_name           host1,host2
        service_description service1
        check_command       ...
        }
    #
    # SciFinder Password Change test service
    #
    define service{
        use                 ebusiness-service
        host_name           host1,host2
        service_description service2
        check_command       ...
        }
[Nagios-users] How to access user-defined service variables in a command object
I'm trying to integrate an internally developed alarm generation command into our Nagios configuration. So I want to define a Nagios command object that calls this command with arguments specific to the service whose status condition triggers the alarm. One of the arguments is an alarm number. I can set this number in the service definition as a user-defined variable:
    define service{
        ...
        __ALARM_NUMBER  123
        }
Is it possible to access this variable in the command definition using on-demand macros? I tried to do it in the following way, but it doesn't seem to work:
    define command{
        command_name    notify-service-by-alarm
        command_line    /usr/local/bin/sendalarm $HOSTALIAS$ $_SERVICE_ALARM_NUMBER:HOSTNAME:SERVICEDESC$ $SERVICESTATE$ $SERVICEDESC$ $SERVICEOUTPUT$
        }
Is there an alternative? Thanks, Paul M. Dubuc
Re: [Nagios-users] How to access user-defined service variables in a command object
I should have made it clearer what I am trying to do below. I know I can access the service's __ALARM_NUMBER from the command definition by giving the literal host_name and service description, like this (I've updated the service definition in my previous example to illustrate):
    $_SERVICE_ALARM_NUMBER:localhost:DUMMY$
But I would like the command definition to do this using the macros $HOSTNAME$ and $SERVICEDESC$, so that one command definition works for all services that use it for notification. Is there a way to do this? I would rather not define a separate command and contact group for every alarm number. Also, I'm using Nagios 3.2.0. Thanks, Paul Dubuc
Paul M. Dubuc wrote: I'm trying to integrate an internally developed alarm generation command into our Nagios configuration. So I want to define a Nagios command object that calls this command with arguments specific to the service whose status condition triggers the alarm. One of the arguments is an alarm number. I can set this number in the service definition as a user-defined variable:
    define service{
        host_name           localhost
        service_description DUMMY
        ...
        __ALARM_NUMBER      123
        }
Is it possible to access this variable in the command definition using on-demand macros? I tried to do it in the following way, but it doesn't seem to work:
    define command{
        command_name    notify-service-by-alarm
        command_line    /usr/local/bin/sendalarm $HOSTALIAS$ $_SERVICE_ALARM_NUMBER:HOSTNAME:SERVICEDESC$ $SERVICESTATE$ $SERVICEDESC$ $SERVICEOUTPUT$
        }
Is there an alternative? Thanks, Paul M. Dubuc
Re: [Nagios-users] How to access user-defined service variables in a command object
Sorry to have bothered the list. I was making the problem too hard because I was confused by what I'd read about on-demand macros in Barth's book (p. 632). Using $_SERVICE_ALARM_NUMBER$ works in the command definition. I don't know why I didn't try that first; for some reason I thought you had to specify the host and service description to get the value of the variable. Paul Dubuc
Paul M. Dubuc wrote: I should have made it clearer what I am trying to do below. I know I can access the service's __ALARM_NUMBER from the command definition by giving the literal host_name and service description, like this (I've updated the service definition in my previous example to illustrate):
    $_SERVICE_ALARM_NUMBER:localhost:DUMMY$
But I would like the command definition to do this using the macros $HOSTNAME$ and $SERVICEDESC$, so that one command definition works for all services that use it for notification. Is there a way to do this? I would rather not define a separate command and contact group for every alarm number. Also, I'm using Nagios 3.2.0. Thanks, Paul Dubuc
Paul M. Dubuc wrote: I'm trying to integrate an internally developed alarm generation command into our Nagios configuration. So I want to define a Nagios command object that calls this command with arguments specific to the service whose status condition triggers the alarm. One of the arguments is an alarm number. I can set this number in the service definition as a user-defined variable:
    define service{
        host_name           localhost
        service_description DUMMY
        ...
        __ALARM_NUMBER      123
        }
Is it possible to access this variable in the command definition using on-demand macros? I tried to do it in the following way, but it doesn't seem to work:
    define command{
        command_name    notify-service-by-alarm
        command_line    /usr/local/bin/sendalarm $HOSTALIAS$ $_SERVICE_ALARM_NUMBER:HOSTNAME:SERVICEDESC$ $SERVICESTATE$ $SERVICEDESC$ $SERVICEOUTPUT$
        }
Is there an alternative? Thanks, Paul M. Dubuc
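The naming rule behind the fix can be shown with a small Python helper. Custom variable names start with an underscore, and the macro is built from $_SERVICE plus the name without that first underscore. That is why __ALARM_NUMBER (two underscores) becomes $_SERVICE_ALARM_NUMBER$. In that plain form the macro refers to the service the command is running for. The on-demand form appends literal host and service names, as in $_SERVICE_ALARM_NUMBER:localhost:DUMMY$. The helper is only an illustration; Nagios does this expansion internally.

```python
def custom_macro(object_type, variable, host_name=None, service_description=None):
    """Build the macro that refers to a custom object variable (illustration only).

    object_type: "HOST", "SERVICE" or "CONTACT"
    variable: the name as written in the object definition, e.g. "__ALARM_NUMBER"
    Custom variable names start with "_" and are treated case-insensitively
    (upper-cased here); the macro drops that leading underscore.
    Passing host_name/service_description produces the on-demand form.
    """
    if not variable.startswith("_"):
        raise ValueError("custom variable names must start with an underscore")
    macro = f"_{object_type.upper()}{variable[1:].upper()}"
    qualifiers = [q for q in (host_name, service_description) if q is not None]
    if qualifiers:
        macro += ":" + ":".join(qualifiers)
    return f"${macro}$"
```

For example, custom_macro("SERVICE", "__ALARM_NUMBER") gives the $_SERVICE_ALARM_NUMBER$ that worked in the command definition above.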