[Nagios-users] OT - fault tolerant default router for Nagios host. [SEC=UNCLASSIFIED]
Dear Folks,

I am writing to request comments on a proposal to reduce the risk of lost network visibility, spurious alerts and the like caused by the failure of the Nagios host's default gateway.

Even when the Nagios host is connected via multiple links, it is still necessary to ensure either that data flows over both links or that traffic is somehow diverted to the surviving link when one fails.

Solutions I have rejected include

1 Link teaming/bonding - immature in Linux
2 HSRP/VRRP - I don't want to change the network structure to suit Nagios, and I can't afford fibre links from Nag to a core switch in the 'other' data centre. Otherwise this is a fine solution
3 Load sharing - half the traffic will be dropped if a link fails

Here is what I think is the best fit: an application layer (non kernel) fault tolerant router. This could be implemented by

1 a Nag service check of the reachability of the default router
2 an event handler (run by sudo) that replaces the default router if the check returns CRITICAL HARD.

Your comments are very welcome.

Thank you,

Yours sincerely.

Classification: UNCLASSIFIED

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
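The decision step of the proposed event handler could be sketched roughly as follows. This is only an illustration of the idea in the message above, not a tested failover tool: the gateway addresses, the function names and the argument order are all hypothetical, and the sketch only builds the `ip route replace default via ...` command rather than running it.

```python
# Hypothetical candidate gateways in order of preference (documentation addresses).
GATEWAYS = ["192.0.2.1", "192.0.2.129"]


def next_gateway(current, gateways):
    """Return the gateway after `current` in the list, wrapping around."""
    if current not in gateways:
        return gateways[0]
    return gateways[(gateways.index(current) + 1) % len(gateways)]


def failover_command(state, state_type, current, gateways=GATEWAYS):
    """Return the argv that would swap the default route, or None for no action.

    Only a CRITICAL result in a HARD state triggers a change, as proposed.
    """
    if state != "CRITICAL" or state_type != "HARD":
        return None
    return ["ip", "route", "replace", "default", "via",
            next_gateway(current, gateways)]


# Example: the check of the first gateway has gone CRITICAL HARD.
print(failover_command("CRITICAL", "HARD", "192.0.2.1"))
```

A real handler would be invoked by Nagios with the service state macros as arguments and would run the returned command via sudo; soft states and recoveries deliberately return None so the route is left alone.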
Re: [Nagios-users] RFC Possible bug in 3.0 alpha event handlers/macros ... [SEC=UNCLASSIFIED]
Dear Folks,

Initial indications from RC2 are that event handlers are called with the _correct_ values of the macros.

(This is a simulation: ie disable host/service checks, then submit passive host check results to take a host DOWN and then UP.)

  Tue Feb 12 07:47:01 2008 PASSIVE HOST CHECK: wtmrt200;1;Test of event handler/macros - bad val of LASTHOSTDOWN
  Tue Feb 12 07:47:01 2008 HOST ALERT: wtmrt200;DOWN;HARD;1;Test of event handler/macros - bad val of LASTHOSTDOWN
  Tue Feb 12 07:47:01 2008 GLOBAL HOST EVENT HANDLER: wtmrt200;(null);(null);(null);global_host_event_handler
  Tue Feb 12 07:49:43 2008 EXTERNAL COMMAND: DISABLE_HOST_SVC_CHECKS;wtmrt200
  Tue Feb 12 07:49:43 2008 EXTERNAL COMMAND: DISABLE_HOST_CHECK;wtmrt200
  Tue Feb 12 08:05:41 2008 EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;wtmrt200;0;Test of event handler/macros - bad val of LASTHOSTDOWN|
  Tue Feb 12 08:05:51 2008 PASSIVE HOST CHECK: wtmrt200;0;Test of event handler/macros - bad val of LASTHOSTDOWN
  Tue Feb 12 08:05:51 2008 HOST ALERT: wtmrt200;UP;HARD;1;Test of event handler/macros - bad val of LASTHOSTDOWN
  Tue Feb 12 08:05:51 2008 GLOBAL HOST EVENT HANDLER: wtmrt200;(null);(null);(null);global_host_event_handler

And from the event handler log (the handler appends its args to a file)

  Tue Feb 12 07:47:01 2008 : wtmrt200 DOWN HARD 1202762761 1202762821 0 0.
  Tue Feb 12 08:05:51 2008 : wtmrt200 UP HARD 1202763951 1202762821 0 0.

  $ perl -le 'print join "\n", map { scalar localtime($_) } qw(1202763951 1202762821)'
  Tue Feb 12 08:05:51 2008
  Tue Feb 12 07:47:01 2008
  $

The first two time_t args in the call to the global event handler are $LASTHOSTUP$ and $LASTHOSTDOWN$.

In this case the values of these arguments correspond to the times the host came up and went down, so on the basis of this test case, the values of the macros are being passed correctly to the event handler in Nagios 3.0 rc2.

Bravo Nagios !
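For readers who want to repeat the check on the handler log without Perl, the conversion can be sketched as below. It is an illustration only: the UTC+11 offset is an assumption inferred from the timestamps quoted in the message, and the function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the poster's clock ran at UTC+11; this is inferred from the
# timestamps quoted in the message, not stated in it.
LOCAL_TZ = timezone(timedelta(hours=11))


def parse_handler_line(line):
    """Split '<timestamp> : host state type last_up last_down ...' into fields."""
    _, _, args = line.partition(" : ")
    host, state, state_type, last_up, last_down = args.split()[:5]
    return host, state, state_type, int(last_up), int(last_down)


def as_local(epoch):
    """Format a time_t value in the style of Perl's scalar localtime()."""
    dt = datetime.fromtimestamp(epoch, LOCAL_TZ)
    return f"{dt:%a %b} {dt.day:2d} {dt:%H:%M:%S %Y}"


def outage_seconds(last_up, last_down):
    """Seconds between the host going down and coming back up."""
    return last_up - last_down


line = "Tue Feb 12 08:05:51 2008 : wtmrt200 UP HARD 1202763951 1202762821 0 0."
host, state, state_type, last_up, last_down = parse_handler_line(line)
print(host, as_local(last_down), "->", as_local(last_up),
      timedelta(seconds=outage_seconds(last_up, last_down)))
```

The difference between the two macro values is the outage length that an availability report would use, which is why wrong macro values matter so much to the handler.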
[Nagios-users] RFC Possible bug in 3.0 alpha event handlers/macros ... [SEC=UNCLASSIFIED]
Dear Hugo,

I am writing to thank you for your letter and say,

[EMAIL PROTECTED] wrote:
> | If you want to use alpha/rc1, 2, 3 .. nagios, don't whine about it on
> | Nag users.
>
> The point is that doing a bug report on 3.0alphaX when at least 2 release candidates have followed is not useful. If the problem still exists in the latest release then it makes sense to report it as such. But for any software it is not useful to use older versions to send in a bug report.
>
> So my recommendation still stands. Upgrade to the latest 3.0 release candidate and retest. Any other 3.0 version of nagios should be considered obsolete and a bug report against those versions is pointless.

You are right. I beg your pardon.

Yours sincerely.
[Nagios-users] RFC Possible bug in 3.0 alpha event handlers/macros ... [SEC=UNCLASSIFIED]
Dear Hugo,

I am writing to thank you for your letter and say,

Message: 8
Date: Sat, 09 Feb 2008 09:23:29 +0100
From: Hugo van der Kooij [EMAIL PROTECTED]

> Have you considered that the changelog might not be complete?

Of course ! But don't you think a _major_ change in behaviour should be documented, or a serious bug - think about it, blowing third party software out of the water - acknowledged ?

> I strongly recommend you to DO upgrade first before you even think of sending in a bug report.

Is it a bug ? If so, is it fixed in rc2 ? If it hasn't been fixed in rc2, will it be fixed in the release ?

> If you cannot do so as soon as you have a couple of minutes then you should not be running 3.0 alpha to begin with.

Hey man ! I spend time fulfilling _my_ responsibility by reporting a potential problem and being perfectly willing to be corrected, and you say I should not test new software and identify bugs - unless I am willing to do things you obviously are not - so that when it is released, others are saved from those bugs !

I could have diffed rc1, rc2 and alpha for an undocumented change; I could have identified the code (maybe) at fault, and may-maybe submitted a patch; and yes, I was hoping someone else might for me because it is not my code, I am not familiar with it, and I lack the talent to do it quickly, if at all. In other words, that's why I am asking for help, having done as much as I could.

Tell me I should upgrade to rc2 and the problem will go away because of this evidence (such as was sent), and I will gladly upgrade (since I was hoping to go to the release without every step, because for me, upgrade means package build, test, install and possibly rollback) and report the result.

Otherwise, your message is clearly

  If you want to use alpha/rc1, 2, 3 .. nagios, don't whine about it on Nag users.

> And please do not reply to a message if you want to create a new thread. It makes a mess of any threading system (like the archives) and in this case even the subject is rather uninformative.

I beg your pardon (my employer's domain has changed so the mail with the correct subject and content bounced and, as you can see from my tone, it is starting to become too hard).

> Hugo.
Re: [Nagios-users] Nagios-users Digest, Vol 21, Issue 5 [SEC=UNCLASSIFIED]
Dear Folks,

Please would someone help me out with what may be a bug in global event handlers in 3.0 alpha (not rc1 or 2 since there is nothing in the Changelog that seems to warrant upgrade) ?

I have a (global host) event handler called like so

  command_line $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$

I expect $LASTHOSTUP$, when $HOSTSTATE$ eq UP and $HOSTSTATETYPE$ eq HARD, to be the time the handler was called, and $LASTHOSTDOWN$ to contain, generally, the time that the host was detected in a (hard) down state.

Right so far ?

The Nagios logs show records like so

  [EMAIL PROTECTED] nagios]$ tail -500 nagios.log | perl -lne 'print if /Hobart/ && /EVENT|HARD/ && !/SERV/' | ./ns-time_t2localtime
  Sat Feb 9 03:13:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:13:59 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:15:19 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:16:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:17:59 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:19:19 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:20:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:21:59 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:23:19 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 03:24:39 2008 HOST ALERT: Hobart;DOWN;HARD;10;CRITICAL - Plugin timed out after 10 seconds
  Sat Feb 9 03:24:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
  Sat Feb 9 06:48:49 2008 HOST ALERT: Hobart;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 32.19 ms
  Sat Feb 9 06:48:49 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler

but in 3.0alpha, the event handler's arguments have these values (dumped by the handler when it's called)

  Sat Feb 9 06:48:49 2008 : Hobart UP HARD 1202500129 1202499949 1200659464 0

or in localtime format,

  [EMAIL PROTECTED] nagios]$ perl -le 'print join "\n", map { localtime($_) . "" } qw(1202500129 1202499949 1200659464)'
  Sat Feb 9 06:48:49 2008
  Sat Feb 9 06:45:49 2008
  Fri Jan 18 23:31:04 2008

ie the $LASTHOSTDOWN$ is 06:45:49 instead of 03:3:39 !!

The event handler is perhaps foolish to rely on the macros, but what is wrong here ? Is it the macro value ? Is it the event handler call from Nagios ? Is it something that needs fixing before a 3.0 release ?

I am sure this behaviour is different to that in 2.9 since I was using this event handler with only minor changes throughout the 2.x series and producing reports from that data each month (for about 18 months).

The docco for 3.x LASTHOST macros is, as far as I can tell, exactly the same as for 2.x, so this appears to be an undocumented (and unwelcome) change.

Any comments or suggestions are welcome. From my point of view, I will have to rewrite an event handler that worked fine with 2.9 since this stuff is VITAL to my availability reporting.

Thank you,

Yours sincerely.
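The log filter used in the message above can be approximated as follows. This is a sketch that assumes the three Perl patterns were meant to be combined with a logical AND; the function names and the sample lines are made up for the example, and the time conversion done by the poster's own ns-time_t2localtime script is not reproduced.

```python
import re


def wanted(line, host="Hobart"):
    """Keep lines naming the host that mention EVENT or HARD but not SERV."""
    return (
        host in line
        and re.search(r"EVENT|HARD", line) is not None
        and "SERV" not in line
    )


def filter_log(lines, host="Hobart"):
    """Return only the host-level event handler and hard-state lines."""
    return [line for line in lines if wanted(line, host)]


# Illustrative lines in the style of nagios.log (timestamps are placeholders).
sample = [
    "[0] GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler",
    "[0] HOST ALERT: Hobart;DOWN;HARD;10;CRITICAL - Plugin timed out after 10 seconds",
    "[0] SERVICE ALERT: Hobart;PING;CRITICAL;HARD;3;timeout",
    "[0] HOST ALERT: Hobart;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds",
    "[0] HOST ALERT: Sydney;DOWN;HARD;10;CRITICAL - Plugin timed out after 10 seconds",
]
for line in filter_log(sample):
    print(line)
```

Excluding SERV drops the service alerts, so what remains is exactly the host state changes and the handler calls they triggered.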
[Nagios-users] Global event handler problem in 3.0 ? [SEC=UNCLASSIFIED]
Dear Folks,

In sehandlers.c I see

  if(log_event_handlers==TRUE)
      logit(NSLOG_EVENT_HANDLER,FALSE,"GLOBAL HOST EVENT HANDLER: %s;%s;%s;%s;%s\n",hst->name,macro_x[MACRO_HOSTSTATE],macro_x[MACRO_HOSTSTATETYPE],macro_x[MACRO_HOSTATTEMPT],global_host_event_handler);

which suggests that if the macro_x[FOO] vals are NULL, I will see what I do in the nagios.log

  [1199423092] GLOBAL HOST EVENT HANDLER: Sydney-backup;(null);(null);(null);global_host_event_handler

Why on earth should the macros be undefined ?

The DEBUG statements look good but how do I enable debug for event handlers ?

Yours sincerely.
[Nagios-users] Debugging global host event handler in Nag 3 [SEC=UNCLASSIFIED]
Dear Folks,

The debug options in Nag 3.x are wonderful (especially for embedded Perl. It is no longer necessary to enable debugging in p1.pl. This is a MASSIVE simply MASSIVE improvement. Thank you).

(FWIW the debug options for event handlers in nagios.cfg are

  debug_level=16

  # DEBUG VERBOSITY
  # This option determines how verbose the debug log out will be.
  # Values: 0 = Brief output
  #         1 = More detailed
  #         2 = Very detailed

  debug_verbosity=2
)

The debug file then shows the event handler being called with all the args

  [1199526258.177826] [016.1] [pid=12436] Propagating checks to immediate non-UNREACHABLE child hosts...
  [1199526258.177833] [016.1] [pid=12436] Pre-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=1
  [1199526258.177900] [016.1] [pid=12436] Running global event handler for host 'acisp014'..
  [1199526258.177919] [2320.2] [pid=12436] Raw Command Input: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
  [1199526258.177928] [2320.2] [pid=12436] Expanded Command Output: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
  [1199526258.177935] [016.2] [pid=12436] Raw global host event handler command line: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
  [1199526258.177958] [016.2] [pid=12436] Processed global host event handler command line: /usr/lib/nagios/plugins/eventhandlers/global_host_event_handler acisp014 DOWN HARD 1199526081 1199526258 0 0
  [1199526258.213601] [016.1] [pid=12436] Post-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=1

even though it is logged like so in nagios.log

  [1199526258] HOST ALERT: acisp014;DOWN;HARD;1;DOWN BABY DOWN.
  [1199526258] GLOBAL HOST EVENT HANDLER: acisp014;(null);(null);(null);global_host_event_handler

Later,

  [1199527058.256653] [016.1] [pid=12436] HOST: acisp014, ATTEMPT=1/10, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=1, NEW STATE=0
  [1199527058.256661] [016.1] [pid=12436] Host was DOWN/UNREACHABLE.
  [1199527058.256667] [016.1] [pid=12436] Host experienced a HARD recovery (it's now UP).
  [1199527058.256673] [016.1] [pid=12436] Propagating checks to parent host(s)...
  [1199527058.256679] [016.1] [pid=12436] Propagating checks to child host(s)...
  [1199527058.256685] [016.1] [pid=12436] Pre-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=0
  [1199527058.256750] [016.1] [pid=12436] Running global event handler for host 'acisp014'..
  [1199527058.256767] [2320.2] [pid=12436] Raw Command Input: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
  [1199527058.256776] [2320.2] [pid=12436] Expanded Command Output: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
  [1199527058.256783] [016.2] [pid=12436] Raw global host event handler command line: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
  [1199527058.256806] [016.2] [pid=12436] Processed global host event handler command line: /usr/lib/nagios/plugins/eventhandlers/global_host_event_handler acisp014 UP HARD 1199527058 1199526258 0 0
  [1199527058.385280] [016.1] [pid=12436] Post-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=0

So the event handler is called correctly; there is ZERO likelihood of the failure to get the handler results being the fault of Nagios 3 (it is the fault of MY event handler).

On the other hand

1 why does the log entry not show the event handler args apart from $HOSTNAME$
2 why does debug not show the return code of the handler
3 why does the debug not show the ePN processing (as it does very well with service checks) of the event handler
4 why does the debug output showing the 'Expanded Command Output:' not show the macro values ?

Your comments are very welcome.

Thank you,

Yours sincerely.
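Since the fully expanded arguments only appear in the debug file, one way to pull them out is sketched below. The regular expression is written against the debug lines quoted in this message and nothing else, so treat the line layout as an assumption; the function name is hypothetical.

```python
import re

# Matches lines such as:
# [1199526258.177958] [016.2] [pid=12436] Processed global host event handler command line: <cmd>
PROCESSED = re.compile(
    r"^\[(?P<when>\d+\.\d+)\] \[\d+\.\d+\] \[pid=(?P<pid>\d+)\] "
    r"Processed global host event handler command line: (?P<cmd>.+)$"
)


def handler_calls(lines):
    """Yield (timestamp, argv) for each processed global host handler command."""
    for line in lines:
        match = PROCESSED.match(line.strip())
        if match:
            yield float(match.group("when")), match.group("cmd").split()


debug_lines = [
    "[1199526258.177900] [016.1] [pid=12436] Running global event handler for host 'acisp014'..",
    "[1199526258.177958] [016.2] [pid=12436] Processed global host event handler command line: "
    "/usr/lib/nagios/plugins/eventhandlers/global_host_event_handler acisp014 DOWN HARD 1199526081 1199526258 0 0",
]
for when, argv in handler_calls(debug_lines):
    print(when, argv[1:])
```

Comparing the argv recovered this way with what the handler itself recorded is a quick way to tell whether a problem lies in Nagios or in the handler.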
[Nagios-users] Trouble with Global event handlers in Nag 3 (b7). [SEC=UNCLASSIFIED]
Dear Folks,

I am writing to ask for help with global event handlers in Nagios 3.0 (b7).

The handler that worked Ok with Nag 2.9 seems to work erratically (ie only some of the time; more often than not it doesn't do anything) with 3.0.

Nag would log (in nagios.log) this message with 2.9

  [1186030923] GLOBAL HOST EVENT HANDLER: Wollongong;DOWN;HARD;10;global_host_event_handler
  [1186031313] GLOBAL HOST EVENT HANDLER: Wollongong;UP;HARD;1;global_host_event_handler

but with 3.0b7 these,

  [1199420252] GLOBAL HOST EVENT HANDLER: Sydney-backup;(null);(null);(null);global_host_event_handler
  [1199423092] GLOBAL HOST EVENT HANDLER: Sydney-backup;(null);(null);(null);global_host_event_handler

but the definition of the handler command has not changed,

  command_name global_host_event_handler
  command_line $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$

It is wonderful to see that the 3.x series has debug level and type settings in nagios.cfg, but I can't see which of them is useful for this problem. I think the event broker should be what I want even though I have no event broker module in use. When I try it I see a lot of messages about callbacks (which I didn't know I had).

Any advice will be very welcome. Please point me to the FM if this is a change from 2.x that I haven't noticed.

Thank you,

Yours sincerely.
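When a handler seems to do nothing, a stand-in that only records how it was called can help separate Nagios problems from handler problems. The sketch below is such a stand-in, assuming the seven arguments of the command_line above; the log path and function name are hypothetical and this is not the poster's actual handler.

```python
import sys
import time

# Hypothetical location; pick somewhere the nagios user can write.
LOG_PATH = "/tmp/global_host_event_handler.log"


def record_call(args, log_path=LOG_PATH, now=None):
    """Append '<local time> : <args>' to the log and return the line written."""
    stamp = time.strftime("%a %b %d %H:%M:%S %Y", time.localtime(now))
    line = f"{stamp} : {' '.join(args)}\n"
    with open(log_path, "a") as log:
        log.write(line)
    return line


if __name__ == "__main__" and len(sys.argv) == 8:
    # Expected argv: HOSTNAME HOSTSTATE HOSTSTATETYPE LASTHOSTUP LASTHOSTDOWN
    #                LASTHOSTUNREACHABLE HOSTDOWNTIME
    record_call(sys.argv[1:])
```

If lines appear in this file for every HOST ALERT, Nagios is invoking the handler as configured and the erratic behaviour is more likely to be inside the real handler.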
[Nagios-users] Experience with 3.0b6 ... most good but a few tiny probs [SEC=UNCLASSIFIED]
Dear Folks,

I am writing to present my observations on 3.0b6.

The synopsis is: excellent, but there are a few non show-stopping problems.

Firstly, the performance of host detection is simply excellent and those contemplating mega installations should take great heart. As soon as the service checks complete the max_checks, the host down notifications are launched. The developer/s have done a great job (as usual). For our site, it used to take nearly 2 minutes after service checks failed to detect down hosts.

Problems

1 ARG macros with ePN (may be without)

$ARG$s appear to be instantiated differently to 2.x, and in such a way as to cause Perl plugins using Getopt (ie expecting args) to barf if called without args.

Workaround: command_name! in services.cfg

2 Aberrant intermittent ePN behaviour

  Fri Nov 9 14:30:59 2007 Warning: Check of service 'Redundant link is operational' on host 'TRASW210' did not exit properly!

This happens occasionally. Restart sorts it. The plugins are _known_ to be good with ePN in 2.x and are passed by new_mini_epn (which prob needs revising).

3 global_event_handler strangeness

Not sure if this is bad, but the Nag log shows

  Thu Nov 8 21:04:28 2007 GLOBAL HOST EVENT HANDLER: NDCSW209;(null);(null);(null);global_host_event_handler

when the global event handler runs. The event handler is still called with the same args and seems to do the 'right thing'.

From my point of view, problem 2 is a concern (a PITA, to use the vernacular). Looks like I will be trawling checks.c for the origin of this message.

Thank you,

Yours sincerely.
Re: [Nagios-users] configuration directory and file directive ...perplexity [SEC=UNCLASSIFIED]
Dear Folks,

I am writing to express my gratitude for all the valuable (and good natured) contributions about this matter.

All the suggestions were valuable and helpful.

Thank you very much,

Yours sincerely.
[Nagios-users] configuration directory and file directives ... perplexity (long and boring). [SEC=UNCLASSIFIED]
Dear Folks,

Has anyone used the nagios configuration directive cfg_dir to point to an SMB (Windows) share ?

The interest in doing this is that my colleagues hate vi and 'nix; they are qualified Cisco/Windows admins who respect Nagios but have no sympathy with anachronistic editors. They would be much happier using notepad/wordpad to edit the object configuration files.

When I tried it for myself (Nagios 2.9, removing all the cfg_file directives from nagios.cfg and adding cfg_dir to point to the Windows share), Nagios complained about the main configuration file directive in cgi.cfg.

When I changed cfg_dir to point back to the (untouched) Unix path, nagios -v nagios.cfg still complained. I had to

1 remove the cfg_dir directive
2 replace the cfg_file directives

before it would stop whining.

Thanks for any helpful comments.

Yours sincerely.

Stanley Hopcroft
Data Communications
02 6211 6110
0412 766 832
Re: [Nagios-users] configuration directory and file directives ... perplexity (long and boring). [SEC=UNCLASSIFIED]
Dear Tim,

(Yes, I am the nitwit).

-Original Message-

> Mr Hopcroft,
>
> My first reaction was an unqualified yuk!, what nitwit would even consider this, then I noticed it was you, and having seen your ever useful posts since the Netsaint 0.0.7 days, I relented. Although hearing vi called anachronistic ruffles a couple of feathers.

(OT. Have a look at what Rob Pike has been saying about Unix for some years. To an MCSE, vi is .. well I am happy with my original choice of words. I am not a Windows admin and am perfectly happy with vi to do my Nagios configuration [or my own home brew semi-automation] but not everyone who likes Nagios likes vi).

> Notepad isn't? No accounting for taste...

(OT. For unambitious text mangling it's Ok. How many people depend on vi macros or even conditional substitution ?)

> I can't actually speak to your specific question, but it just seems like a scary thought. Better to run samba on the Nagios machine and let them mount it, and/or SVN. And then there's the GUI method, of course.

Good thought but why should they change to suit one application ? At this site there are no Unix Sys admin skills (apart from me) and everyone likes Windows. Having the configs on Win means management is happy they are adequately backed up.

Does cfg_dir=/Some/Path actually work ? and if so, would anyone be so kind as to paste a few lines containing these directives from their nagios.cfg ?

Here is the problem: adding a cfg_dir to point to a _Unix_ directory like so

  ***************
  *** 78,83 ****
  --- 78,87 ----
    # extension) in a particular directory by using the cfg_dir
    # directive as shown below:

  + cfg_dir=/etc/nagios
  +
  + # cfg_dir=/mnt/dest_smb/coms/NMS/nagios
  +
    #cfg_dir=/etc/nagios/servers
    #cfg_dir=/etc/nagios/printers
    #cfg_dir=/etc/nagios/switches
  [EMAIL PROTECTED] nagios]#

causes

  [EMAIL PROTECTED] nagios]# nagios -v nagios.cfg

  Nagios 2.9
  Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
  Last Modified: 04-10-2007
  License: GPL

  Reading configuration data...

  Error: Unexpected token or statement in file '/etc/nagios/cgi.cfg' on line 23.

  *** One or more problems was encountered while processing the config files...

  Check your configuration file(s) to ensure that they contain valid directives and data defintions. If you are upgrading from a previous version of Nagios, you should be aware that some variables/definitions may have been removed or modified in this version. Make sure to read the HTML documentation regarding the config files, as well as the 'Whats New' section to find out what has changed.

  [EMAIL PROTECTED] nagios]#

Take out cfg_dir and all is well.

> good luck!
> tim

Thank you.

Yours sincerely.
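One possible reading of the error above, offered as an assumption rather than a diagnosis: the Nagios documentation describes cfg_dir as processing every file with a .cfg extension in the named directory, including subdirectories, as object configuration. Pointing it at /etc/nagios would then sweep up cgi.cfg as well. The sketch below only lists which files such a scan would pick up; the function name is hypothetical and it does not parse anything.

```python
import os


def files_read_by_cfg_dir(directory):
    """List the *.cfg files under `directory`, recursing into subdirectories."""
    found = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            if name.endswith(".cfg"):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Example: show what a cfg_dir pointing at /etc/nagios would include.
    for path in files_read_by_cfg_dir("/etc/nagios"):
        print(path)
```

If cgi.cfg (or nagios.cfg itself) shows up in that list, keeping the object files in their own subdirectory and pointing cfg_dir there would be one way to test the theory.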
[Nagios-users] ePN patch - testers wanted. [SEC=UNCLASSIFIED]
Dear Folks,

I am writing to invite testing of a small patch for the embedded Perl Nagios feature.

Currently (2.10/3.x), when a plugin is modified without a restart, ePN refuses to run the modified plugin. Compilation fails when Perl attempts to redefine the plugin's subroutines in the package corresponding to that plugin. The only workaround is to restart Nagios.

The patch deletes the Perl package (and therefore all the subroutines it contains) before the modified plugin is recompiled (the plugin and the package are therefore compiled into a nonexistent namespace).

I would prefer to only send the patch to people who

1 are familiar with the ePN tradeoffs (the memory leak)
2 are convinced that the ePN tradeoffs are outweighed by the benefits
3 have some experience with the memory footprint of the ePN Nagios and
4 do not object to testing a potentially unstable release of Nagios

I have been using the patch on my production system (195 hosts, 356 service checks) without any problem that is obvious to me (custom Perl plugins for SNMP checks of routers, spanning tree etc).

I am particularly keen on knowing whether the memory leak is worse.

Please let me know privately if you are interested.

Yours sincerely.
[Nagios-users] Backing up Cisco router configs. Was: Nagios backup [SEC=UNCLASSIFIED]
Dear Folks,

-Original Message-
From: Cook, Garry [EMAIL PROTECTED]
Subject: Re: [Nagios-users] Nagios backup

> You can use the 'archive' commands in recent IOS to have your config backed up to a TFTP server anytime the config is written to NVRAM, as well as at specified time intervals.
>
> Thanks,
> Garry

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Hawley
Sent: Thursday, October 18, 2007 1:16 PM
To: [EMAIL PROTECTED]; nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios backup

> Thanks Roger,
>
> I was looking at something that would perform backups automatically when the config changes.
>
> Ta
> Mike

You mean you want your availability monitor to back up your router configs ?

You could write a custom plugin to check for differences between the running and startup configs and then do something with the running config;

You could pay for CiscoWorks and sacrifice small furry animals until it runs;

or you could try RANCID in conjunction with viewcvs (PHP application to publish the CVS).

Have fun.
Re: [Nagios-users] Nagios-users Digest, Vol 17, Issue 35 [SEC=UNCLASSIFIED]
Dear Larry, I am writing to thank you for your letter and say,
[mailto:[EMAIL PROTECTED] On Behalf Of Larry Low Sent: Friday, October 12, 2007 10:13 AM Subject: Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]
Thanks Stanley, Using my check_ifoperstatus script. Available from http://www.nagiosexchange.org/Networking.53.0.html?tx_netnagext_pi1[p_view]=1099
(FYI, you may want to check it with new_mini_epn [ distributed with Nagios ] to see if ePN is going to complain about it).
I've done a few minutes of debugging and the first problem I see is the MTIME is not being populated. Here is my epn_leave-msgs.log. I added
  print LH "$filename - $mtime = ".$Cache{$filename}[MTIME]."\n";
  while (my ($key,$value) = each %Cache) { foreach (@$value) { print LH "$key - $_\n"; } }
right before it compares mtime and you will see below that MTIME is not populated. I also added a couple logs where MTIME is supposed to be set.
  print LH $mtime ;
  $Cache{$filename}[MTIME] = $mtime unless $delete ;
  print LH $Cache{$filename}[MTIME]."\n";
You will see below that $mtime is fine but $Cache{$filename}[MTIME] is not. I changed
  $Cache{$filename}[MTIME] = $mtime unless $delete ;
to
  $Cache{$filename}[MTIME] = $mtime;
and the problem goes away. I tested for $delete and it is being set to 1 every time. What is calling eval_file? Is this from the nagios core?
Yep, base/checks.c, IIRC. checks.c also sets the value for $delete that is passed to eval_file (see http://nagios.cvs.sourceforge.net/nagios/nagios/base/checks.c?view=markup ; use the source). The value that becomes $delete is set in checks.c as the value of DO_CLEAN. I think this value is set by configure. You may want to remove the config.cache (or whatever it is) and reconfigure with the appropriate settings.
(On an unrelated matter, thank you for this thread since I think a simple mod to p1.pl will deal with the recompilation problem of putting symbols in the same stash and thereby raising the 'subroutine already exists' exception). Yours sincerely. Classification: UNCLASSIFIED
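To see why the 'unless $delete' guard matters, here is a deliberately simplified model in Python of the behaviour described in this thread. It is not p1.pl: the function names, the position of MTIME in the cache entry and the sample mtime are assumptions. It only shows that if the stored mtime is never updated, every check looks like a changed file and triggers another compilation.

```python
MTIME = 0  # assumed position of the mtime within a cache entry

def must_compile(cache, filename, mtime, delete):
    """Return True if the plugin has to be compiled for this check."""
    entry = cache.setdefault(filename, [None])
    if entry[MTIME] == mtime:
        return False              # unchanged since the last compile
    if not delete:                # models "... unless $delete"
        entry[MTIME] = mtime
    return True

def count_compiles(delete, checks=3):
    """Number of compilations over several checks of an unmodified plugin."""
    cache = {}
    return sum(must_compile(cache, "check_ifoperstatus", 1192150000, delete)
               for _ in range(checks))

print(count_compiles(delete=0))  # the mtime is remembered after the first check
print(count_compiles(delete=1))  # the mtime is never stored
```

With delete set to 1 the model compiles on every check, which matches the report that the second run of an unmodified plugin is recompiled into the same package.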
Re: [Nagios-users] nagios 2.9 ePN INC line [SEC=UNCLASSIFIED]
Dear John, From: john [EMAIL PROTECTED] Subject: Re: [Nagios-users] nagios 2.9 ePN INC line
I guess that is an option, but I'd prefer not to have to do that for all the additional modules/plugins that I end up with. Does version 3 behave better as I may be able to hold off for that before changing the main monitoring node.
There is no difference in ePN between v2 and v3 (or very little).
Thanks, john
On Fri, 12 Oct 2007, David Fulton wrote: Symbolic link the NET::DNS plugin to one of those directories (like /usr/lib/perl5/site_perl/5.8.8) and it should find the module after that. I gave up on trying to change how ePN looks for modules when it couldn't find utils.pm in my plugin directory. Since all the default PERL nagios plugins need that I just made a symlink. Works smooth as silk.
I haven't been following this thread but, with respect, you may be mistaken in blaming ePN for INC problems. ePN is Perl, no ifs or buts. If Perl can find the path to the plugin, and Perl has not been changed since ePN was built, they should have the same view of INC. The standard plugins put utils.pm in a non-standard Perl path, so most of my plugins have an added 'use lib q(/usr/lib/nagios/plugins) ;' #RHEL3
From the OP's point of view, if you have upgraded Perl since building ePN, recompile ePN/Nagios so it can pick up the version (5.8.8) dependent paths. Yours sincerely. Classification: UNCLASSIFIED
Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]
Dear Larry, I am writing to thank you for your letter and say,
-Original Message- Message: 11 Date: Thu, 11 Oct 2007 09:21:35 -0700 From: Larry Low [EMAIL PROTECTED]
This is without making changes to the script. Scenario:
1) ./configure --prefix=/opt/Nagios --enable-event-broker --with-embedded-perl (I have tried --without-perlcache as well and have not had time to sift through code to see if this is the actual problem)
2) Have an ePN script with sub print_help
3) Execute 1st check of ePN script, returns OK, no problem
4) Execute 2nd check of ePN script and ePN compile reports print_help function is redeclared
Thank you for the very clear synopsis of the problem. I have not had that experience with 3.0b4 (a few funny behaviours, but by and large, like 2.9).
If I compile without embedded-perl the problem does not exist. I should probably post this to the devel list.
You have got someone's attention here. FYI (and also to save me a reply to Andreas) the ePN stuff mainly happens in p1.pl. This code
1 manages a cache of compiled plugins (ie checks if the mtime of the plugin is different to the cached value and recompiles if it is)
2 transforms the plugin to a Perl subroutine in a package named (with a mangled name) like the plugin's file name
3 calls the subroutine if the mtime has not changed (and the compilation is clean).
What you describe should not happen, and moreover the new stuff in 3.0 has not changed (at least as far as I can see) the interface (from that in 2.x) to Perl in checks.c and utils.c. Would you send me the plugin privately so I can inspect it ?
Larry Low Classification: UNCLASSIFIED
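The three steps attributed to p1.pl above can be illustrated with a small analogue in Python. This is a sketch of the general technique (an mtime-keyed cache of compiled code, wrapped in a function whose name is derived from the file name), not a translation of p1.pl; the mangling scheme, the function names and the sample plugin are all invented for the example.

```python
import re

cache = {}          # filename -> (mtime, compiled function)
compile_count = {}  # filename -> number of compilations (for illustration)

def mangle(filename):
    """Derive a legal function name from a file name (hypothetical scheme)."""
    return "plugin_" + re.sub(r"\W", "_", filename)

def run_plugin(filename, mtime, source, argv):
    """Compile `source` if it is new or its mtime changed, then call it."""
    cached = cache.get(filename)
    if cached is None or cached[0] != mtime:
        name = mangle(filename)
        body = "".join("    %s\n" % line for line in source.splitlines())
        namespace = {}  # a fresh namespace for every compilation
        exec(compile("def %s(argv):\n%s" % (name, body), filename, "exec"),
             namespace)
        cached = cache[filename] = (mtime, namespace[name])
        compile_count[filename] = compile_count.get(filename, 0) + 1
    return cached[1](argv)

path = "/usr/lib/nagios/plugins/check_demo"   # hypothetical plugin
src = "return (0, 'OK: checked ' + argv[0])"
print(run_plugin(path, 100, src, ["10.0.0.1"]))
print(run_plugin(path, 100, src, ["10.0.0.2"]))  # same mtime: no recompile
print(compile_count[path])
```

Because each compilation here goes into a fresh namespace, a changed plugin cannot collide with its earlier definitions, which is the property the redeclaration reports say is missing.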
Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]
Dear Larry, There are debugging hooks in p1.pl that would be useful to enable. If you are interested in helping deal with this problem please would you
1 Back up your original copy of p1.pl (path is specified in nagios.cfg IIRC)
2 Change the DEBUG_LEVEL to
  use constant DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP ;
3 Change the DEBUG_LOG_PATH to something appropriate for your system eg
  use constant DEBUG_LOG_PATH => '/tmp/' ;
4 Make sure p1.pl still compiles (perl -c p1.pl should be nag free; $? == 0)
5 Restart Nagios
(IIRC, all this is documented in POD format in p1.pl, so perldoc p1.pl should show
... blah blah
Extra logging is given by setting DEBUG_LEVEL to include LEAVE_MSG
1 opens an extra output stream in the path given by the value of DEBUG_LOG_PATH
2 logs messages describing the success or otherwise of the plugin compilation and the result of the plugin run.
Examples of such messages are
Fri Apr 22 11:54:21 2005 eval_file: successfully compiled /usr/local/nagios/libexec/check_bass .
Fri Apr 22 11:54:21 2005 run_package: /usr/local/nagios/libexec/check_bass returning (0, BASS Transaction completed Ok. ).
Fri Apr 22 11:55:02 2005 eval_file: successfully compiled /usr/local/nagios/libexec/check_ad -D production.prod -S.
Fri Apr 22 11:55:02 2005 run_package: /usr/local/nagios/libexec/check_ad -D foo.dom -S returning (0, Ok. Expected 2 domain controllers [foo1 foo2] for foo.dom.prod domain from 1.1.2.3 DNS, found 8 [foo1 foo2 ..] ).
.. blah blah )
In my case I see
[EMAIL PROTECTED] bin]# perl -c p1.pl
p1.pl syntax OK
[EMAIL PROTECTED] bin]# diff -c p1.pl.orig p1.pl
*** p1.pl.orig 2007-10-12 14:09:24.0 +1000
--- p1.pl 2007-10-12 14:09:56.0 +1000
***************
*** 10,22 ****
  use constant CACHE_DUMP => 2 ;
  use constant PLUGIN_DUMP => 4 ;
! use constant DEBUG_LEVEL => 0 ;
  # use constant DEBUG_LEVEL => CACHE_DUMP ;
  # use constant DEBUG_LEVEL => LEAVE_MSG ;
! # use constant DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP ;
  # use constant DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP | PLUGIN_DUMP ;
! use constant DEBUG_LOG_PATH => '/usr/local/nagios/var/' ;
  # use constant DEBUG_LOG_PATH => './' ;
  use constant LEAVE_MSG_STREAM => DEBUG_LOG_PATH . 'epn_leave-msgs.log' ;
  use constant CACHE_DUMP_STREAM => DEBUG_LOG_PATH . 'epn_cache-dump.log' ;
--- 10,22 ----
  use constant CACHE_DUMP => 2 ;
  use constant PLUGIN_DUMP => 4 ;
! # use constant DEBUG_LEVEL => 0 ;
  # use constant DEBUG_LEVEL => CACHE_DUMP ;
  # use constant DEBUG_LEVEL => LEAVE_MSG ;
! use constant DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP ;
  # use constant DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP | PLUGIN_DUMP ;
! use constant DEBUG_LOG_PATH => '/tmp/' ;
  # use constant DEBUG_LOG_PATH => './' ;
  use constant LEAVE_MSG_STREAM => DEBUG_LOG_PATH . 'epn_leave-msgs.log' ;
  use constant CACHE_DUMP_STREAM => DEBUG_LOG_PATH . 'epn_cache-dump.log' ;
and
[EMAIL PROTECTED] nagios]# more /tmp/epn_leave-msgs.log
Fri Oct 12 14:17:08 2007 eval_file: successfully compiled /usr/lib/nagios/plugins/check_sysUpTime -R 10.208.1.254.
Fri Oct 12 14:17:08 2007 run_package: /usr/lib/nagios/plugins/check_sysUpTime -R 10.208.1.254 returning (0, sysUpTime of router 10.208.1.254 is 231 days, 18:14:31.55).
Fri Oct 12 14:17:17 2007 eval_file: /usr/lib/nagios/plugins/check_sysUpTime already successfully compiled and file has not changed; skipping compilation.
Fri Oct 12 14:17:17 2007 run_package: /usr/lib/nagios/plugins/check_sysUpTime -R 10.36.103.254 returning (0, sysUpTime of router 10.36.103.254 is 269 days, 00:03:26.48).
Fri Oct 12 14:17:22 2007 eval_file: successfully compiled /usr/lib/nagios/plugins/check_backuplinks -N BRUSW200.
Fri Oct 12 14:17:22 2007 run_package: /usr/lib/nagios/plugins/check_backuplinks -N BRUSW200 returning (0, Ok. All links from brusw200/10.0.254.167 to mtasw200 via Etherchannel _are_ in up operational status. Redundant topology Ok.).
[EMAIL PROTECTED] nagios]#
Unfortch, although the log stream should be unbuffered, it wasn't being flushed while Nag was running. I had to restart Nag again to get the messages flushed (when I changed the path for the log messages). You prob should ensure that the problem plugin is scheduled frequently (eg every 5 mins) and let it run for about 5 check periods. Please post the results to the list. Thank you, Yours sincerely. Stanley Hopcroft Data Communications 02 6211 6110 0412 766 832 Classification: UNCLASSIFIED
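Once the log has a few check periods in it, something like the following could help summarise it before posting. It is a Python sketch that assumes the epn_leave-msgs.log line format shown in the examples above (the regular expressions are my reading of those lines, not taken from p1.pl); it counts successful compilations per plugin and collects each run's return code and output, so a plugin that is compiled more than once stands out.

```python
import re
from collections import Counter

# Patterns inferred from the sample log lines quoted in this message.
RUN_RE = re.compile(
    r"run_package: (?P<cmd>.+?) returning \((?P<rc>\d+), (?P<out>.*)\)\.$")
COMPILE_RE = re.compile(r"eval_file: successfully compiled (?P<cmd>.+)\.$")

def summarise(lines):
    """Return (compilations per plugin path, list of (cmd, rc, output))."""
    compiles, results = Counter(), []
    for line in lines:
        line = line.strip()
        match = COMPILE_RE.search(line)
        if match:
            compiles[match.group("cmd").split()[0]] += 1
            continue
        match = RUN_RE.search(line)
        if match:
            results.append((match.group("cmd"), int(match.group("rc")),
                            match.group("out")))
    return compiles, results

sample = [
    "Fri Oct 12 14:17:08 2007 eval_file: successfully compiled /usr/lib/nagios/plugins/check_sysUpTime -R 10.208.1.254.",
    "Fri Oct 12 14:17:08 2007 run_package: /usr/lib/nagios/plugins/check_sysUpTime -R 10.208.1.254 returning (0, sysUpTime of router 10.208.1.254 is 231 days, 18:14:31.55).",
    "Fri Oct 12 14:17:17 2007 eval_file: /usr/lib/nagios/plugins/check_sysUpTime already successfully compiled and file has not changed; skipping compilation.",
]
compiles, results = summarise(sample)
print(dict(compiles))
print(results)
```

In practice the lines would come from open('/tmp/epn_leave-msgs.log'), or wherever DEBUG_LOG_PATH points.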
Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]
Dear Larry, I am writing to thank you for your letter and say,
-Original Message- Message: 9 Date: Tue, 9 Oct 2007 17:01:41 -0700 From: Larry Low [EMAIL PROTECTED] Subject: [Nagios-users] Nagios 3.0b5 - ePN and perl caching
The problem I am having is with subroutines inside of perl scripts run under the ePN. If a subroutine is defined, the next time the script or a similar script is executed the ePN compile fails with a duplicate subroutine reported as the problem. I believe this problem does not exist under the 2.x code.
Do you mean that, after you change a plugin that Nagios is running, ePN fails to compile the modified plugin with this (spurious) error ? If so, I am running 2.9 and have this problem. Reluctantly I have found that the only way to deal with this is to restart Nagios after a plugin mod. It is a PITA. Patches welcome.
Larry Low Classification: UNCLASSIFIED
Re: [Nagios-users] Nagios 3.0b1 and perl check plugins [SEC=UNCLASSIFIED]
Dear Folks, In the last 4 - 6 weeks there were reports of failures of embedded Perl in Nagios 3.0 betas. I am running 3.0b3 with the event broker and ePN with a full complement of non-standard Perl plugins known to work with ePN in 2.9 (my employer's production system). This site is RHEL 3 + Perl 5.8.0 (EL 3 RPM). There are a few strange behaviours (eg a warning apparently from a standard Perl module, Getopt::Long), but after about 20 mins running it appears not too bad (ie not all Perl plugins go belly up). Can those who are interested in helping sort out ePN with 3.0 contact me (or mail the list) with bug reports or, better still, patches ? Thank you, Yours sincerely. Classification: UNCLASSIFIED
Re: [Nagios-users] Perl checks in 3.0 [SEC=UNCLASSIFIED]
Dear Folks, There is a comment in base/checks.c that may be relevant to the probs with embedded Perl in 3.0 betas. There was apparently a long standing bug of freeing memory associated with the Perl plugin output _before_ it was copied to Nagios. The comment says the bug was corrected by a patch sent by Hendrik B. of July this year. Nice one Hendrik. Yours sincerely. Classification: UNCLASSIFIED
Re: [Nagios-users] SLA reports [SEC=UNCLASSIFIED]
Dear Matthew, I am writing to thank you for your letter and say,
Hi there, I wish (well, have been told to..) to produce SLA type reports of our IT systems for management. At the moment the requirements are rather vague... As we are currently using NDO I am hoping that Jasper Reports may be used to pull reports directly from the database. Poking around I can find no reference to people having done so.
I haven't heard of too many people taking the next big leap with Nagios, namely, using the NDO infrastructure as the basis of availability reporting. OTOH there are some who are doing what you are proposing, one at least with the NDO outage table. At the moment, my employer has an event handler that stashes outage data in a table and some home-brew (Perl/DBI/Spreadsheet::WriteExcel) to generate some reports (including SLA reports). This is NOT an NDO application; however, NDO is obviously the way to go and once I get enough time and energy, I would like to pursue it.
Doing an SLA report is basically filtering the outage times against the SLA time period. Amazingly enough, Nagios already does a lot of this sort of filtering when it determines on the basis of time-periods whether or not to notify contacts. It may therefore be possible for the Nagios core to provide more SLA support than it does, by only actioning outages that occur within the SLA. However, irrespective of future core support, you could achieve something like the same result by only running checks for the time period corresponding to your SLA, and therefore you would only get outages within the SLA. If on the other hand you want to filter the outages in the NDO tables, there is a Nagios::SLA that is used here, but since I have no idea what Jasper Reports is/does, you may not need this. (If you are interested in Nagios::SLA let me know privately. It is not published and may not be for quite a while since I am busy trying to pass 642-901.)
Any advice while I am still at the stage of working out what management want?
Yep. Write the all singing all dancing Nag availability reporting package and earn everlasting fame. For bonus marks, donate it to the project (or maintain it). Yours sincerely. Classification: UNCLASSIFIED
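As a rough illustration of "filtering the outage times against the SLA time period", the sketch below computes how many seconds of an outage fall inside a daily SLA window. It is written in Python rather than taken from Nagios::SLA (which is unpublished), and the window used here (08:00 to 18:00, Monday to Friday) and the function name are assumptions for the example only.

```python
from datetime import datetime, time, timedelta

def sla_outage_seconds(start, end, sla_start=time(8, 0), sla_end=time(18, 0),
                       sla_days=range(0, 5)):
    """Seconds of the outage [start, end) inside the daily SLA window.

    sla_days uses datetime.weekday() numbering (0 = Monday).
    """
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() in sla_days:
            window_start = datetime.combine(day, sla_start)
            window_end = datetime.combine(day, sla_end)
            overlap = min(end, window_end) - max(start, window_start)
            if overlap > timedelta(0):
                total += overlap.total_seconds()
        day += timedelta(days=1)
    return total

# Outage from Friday 17:30 to the following Monday 08:15.
down = datetime(2007, 10, 12, 17, 30)
up = datetime(2007, 10, 15, 8, 15)
print(sla_outage_seconds(down, up))
```

Only the last half hour of Friday and the first quarter hour of Monday count against the SLA here; the weekend is ignored. The same function could be applied to each row of an outage table, whatever its source.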
Re: [Nagios-users] Log monitoring with SEC and Nagios. [SEC=UNCLASSIFIED]
Dear Risto (Thank you very much for SEC, the king of event correlators). Message: 19 From: Risto Vaarandi [EMAIL PROTECTED] Subject: [Nagios-users] Log monitoring with Nagios - recommendations? hi all, few weeks ago I posted a question to this list about passive service checks - I was actually experimenting with Nagios as an event log monitoring GUI. I am tracking event logs with SEC and also sending out alerts with it, but I would still like to see correlated log messages in Nagios web interface as well. I used to use (and enjoy) SEC to inject passive service check results to Nagios. Is that an option in this case ? Yours sincerely. Classification: UNCLASSIFIED
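For anyone who has not injected passive results before, the line that ends up in the Nagios external command file looks like the output of this small Python sketch. The PROCESS_SERVICE_CHECK_RESULT format is the standard Nagios external command; the host name, service description and command file path are made up, and how SEC is configured to run such a helper is not shown.

```python
import time

def passive_service_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    text = " ".join(output.splitlines())   # the command must be a single line
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, text)

def submit(command_file, line):
    """Append the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)

# Hypothetical host and service names; 2 = CRITICAL.
line = passive_service_result("loghost", "sec_events", 2,
                              "CRITICAL: 5 login failures in 60s",
                              now=1202762821)
print(line, end="")
# submit("/usr/local/nagios/var/rw/nagios.cmd", line)  # path is an assumption
```

The service has to exist in the Nagios config with passive checks enabled for the result to be accepted.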
[Nagios-users] Monitoring multicast - any ideas [SEC=UNCLASSIFIED]
Dear Folks, Does anyone have any wisdom to offer about monitoring multicast applications ? The context is Cisco PIM, so the Cisco Mroute MIB is an obvious place to start. What makes it a bit harder is that the application of interest (TV broadcasting) uses an MS Media server that simply joins any old group when it starts. In other words, the clients learn from the Web page what group they should join, and since the mroute table is indexed by group, the group needs to be known or the whole table is checked for the presence of the server. Yours sincerely. Classification: UNCLASSIFIED
Re: [Nagios-users] Nagios 3.0b1 and perl check plugins [SEC=UNCLASSIFIED]
Dear Folks, The reports about this matter are a bit perplexing in that
1 if the Nag internals have changed so that the ePN functions do not have their return values processed correctly, or the ePN functions are not called in the same way as 2.x, then the SEGV is expected with ePN.
2 However, if Nag is recompiled without ePN, the plugins are run by the shell in exactly the same way that other plugins are, and therefore if the Perl plugin works from the command line, it should work with Nagios 3.x.
IIRC, the ePN log shows that the Perl harness is doing the right thing and so there is something in Nag's internals that needs investigating for ePN to work properly. So 1 is just a bug and should get fixed when I can focus some time and energy. OTOH, 2 is inexplicable. Are all Perl plugins failing to run under 3.x Nagios _without_ ePN ? Please would you list some of the plugins that work under 2.x but fail under 3.x ? It would probably be good to check out the appropriate DEBUG setting (DEBUG_3) and copy the list with the result. Yours sincerely. Classification: UNCLASSIFIED
Re: [Nagios-users] check_tacacs_plus.pl [SEC=UNCLASSIFIED]
Dear Folks, Message: 8 Date: Tue, 22 May 2007 18:47:21 -0700 From: Daniel Lacey [EMAIL PROTECTED] Subject: Re: [Nagios-users] Any experience with check_tacacs_plus.pl
I don't know this platform, but A TACACS+ server's password database should be invisible to a TACACS client. The server's purpose is to authenticate in a way that makes such details irrelevant. I would create a separate user for this with little to no authorization... You just need to test the authentication server. The user and password will be stored somewhere in plain text so that the script using Authen::TACACSPlus will know how to connect to the server.
There are source RPMs for Authen::TACACSPlus so the overhead of this Perl plugin is not too bad. check_tacacs_plus works nicely with the Cisco Secure ACS after
1 the ACS is configured to recognise the Nagios hosts (ie names + addresses of all interfaces)
2 a user is created on the ACS that the plugin will use to check that the user's password is validated.
A less attractive aspect of this plugin is that the TACACS+ secret key needs to be known to the Nagios host. Having a separate (from production) key seems like a good idea, but since the plugin accepts username and pw as options, they are visible to other users on the Nagios host (unless you use ePN or hack the plugin). I am grateful to the plugin's authors (P Farmer et al) for this. Nice job. Thank you, Yours sincerely. Classification: UNCLASSIFIED
[Nagios-users] Any experience with check_tacacs_plus.pl (NagiosExchange) or Authen::TACACSPlus [SEC=UNCLASSIFIED]
Dear Folks, Please would you let me or the list know of experience checking the TACACS+ server implemented by Cisco in their 'Secure ACS for Windows 3.3' product ? Nagios Exchange has a plugin named check_tacacs_plus.pl that makes use of the Authen::TACACSPlus module from CPAN. I am not sure these will be helpful in checking a Secure ACS that uses Windows/AD authentication. That said, since I am very ignorant about TACACS+ I am probably wrong in thinking that ASCII, CHAP or MS-CHAP (the alternatives supported by Authen::TACACSPlus) passwords don't sound right for Windows/AD authentication. check_tcp on port 49 is a useful standby but hopefully there are other, non SNMP, alternatives. Thank you, Yours sincerely. Classification: UNCLASSIFIED
[Nagios-users] Potential BUG report Was: Nagios 3.x + ePN = Garbage data in status [SEC=UNCLASSIFIED]
Dear Folks, I think there is a bug in the Nag 3.x processing of the plugin output returned by an ePN check.
Message: 7 Date: Mon, 7 May 2007 11:29:11 -0400 From: James Whittington [EMAIL PROTECTED] Subject: Re: [Nagios-users] Nagios 3.x + ePN = Garbage data in status ..[SEC=UNCLASSIFIED]
I turned on logging in the p1.pl and here's what I got. I have cut and pasted a couple of plugins as seen in the epn logfile and through the webui of nagios. I think epn is getting the correct response from the plugin, and the performance information looks good in the user interface, but the status field in the user interface has garbage data mixed in with valid data. This is nagios-3.0a2 by the way.
From epn_leave-msgs.log :
Mon May 7 10:49:08 2007 run_package: /usr/lib/nagios/plugins/check_rfinput -H10.0.5.26 -Cpublic -Onagios returning (0, -34 dBm|rf-input=34;58;60;22;80).
From Nagios Web UI:
Current Status: OK (for 6d 4h 51m 46s)
Status Information: ?FdBm
Performance Data: rf-input=34;58;60;22;80
Current Attempt: 1/1 (HARD state)
Last Check Time: 05-07-2007 10:49:08
Check Type: ACTIVE
Check Latency / Duration: 0.287 / 1.486 seconds
Nagios seems to be discarding the plugin output (that is normally put in the Status Information field of the UI) but retaining the performance data.
From epn_leave-msgs.log :
Mon May 7 10:58:03 2007 run_package: /usr/lib/nagios/plugins/check_radio_status -H10.0.7.46 -Cpublic returning (0, Status: No Alarms Uptime: 297 Days UAS: 0 SES: 0).
From Nagios Web UI:
Current Status: OK (for 24d 8h 39m 16s)
Status Information: (No output returned from plugin)
This plugin returns no Perf data so unfortch you get yada in both of the extended information panel fields.
Performance Data:
Current Attempt: 1/1 (HARD state)
Last Check Time: 05-07-2007 10:58:03
Check Type: ACTIVE
Check Latency / Duration: 156.153 / 3.484 seconds
Please let me know if I need to try anything else.
One last matter: do you get beeped ? Is it only the UI that is wrong or is Nagios also treating the plugin response as a failure ? If you are getting beeped (ie HARD error from ePN plugin) the fault is prob in checks.c, otherwise the CGIs may be the culprits. I agree with your conclusion: this looks to me like a bug in the Nag handling of the data returned by ePN. The problem does _not_ appear to be ePN's return of the data since, if that was the case, there would be no Performance Data (which is data appended to the plugin output following a pipe symbol): some plugin output is getting back to Nagios so it must be Nagios incorrectly processing the plugin output. I think there's nothing more to be said apart from finding the bug, probably in checks.c (although it could be in other code that also processes data returned by Perl; event handlers for example). I am hoping that the Nag developers will see this and comment.
Thanks, James Whittington [EMAIL PROTECTED] Classification: UNCLASSIFIED
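For reference, the split that Nagios is expected to make on the returned string can be written in a few lines. This Python sketch is not the checks.c code; it only shows the convention mentioned above (performance data follows the first pipe symbol), applied to the output string from the first log excerpt. It looks at the first line only and ignores the multi-line output format.

```python
def split_plugin_output(raw):
    """Split the first line of plugin output into (status text, perfdata)."""
    first_line = raw.splitlines()[0] if raw else ""
    text, _, perfdata = first_line.partition("|")
    return text.strip(), perfdata.strip()

# Output string taken from the epn_leave-msgs.log excerpt above.
print(split_plugin_output("-34 dBm|rf-input=34;58;60;22;80"))
# A plugin that returns no performance data.
print(split_plugin_output("Status: No Alarms Uptime: 297 Days UAS: 0 SES: 0"))
```

Given that the ePN log shows the whole string being returned, the text before the pipe is what should have appeared in the Status Information field.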
Re: [Nagios-users] Does anyone use and love NDOUtils for availability reporting .. ? [SEC=UNCLASSIFIED]
Dear Folks, I am writing to thank you for your letter and say,
-Original Message- Message: 4 Date: Fri, 6 Apr 2007 12:24:47 +0100 From: Rob Blake [EMAIL PROTECTED]
Would anyone like to comment on the use of NDOUtils (Nagios 2.x or later) for availability reporting ?
NDOUtils will take the majority of the data associated with your Nagios installation and send it to a database for you (currently only mysql is supported). You can store information about your current setup, notifications, current host/service status, the results of checks etc. With this information in a database you are free to do what you want with it. I believe the current plan is to leverage the data that is stored in the database to facilitate a complete overhaul of the current Nagios frontend. There is absolutely nothing stopping you from putting together your own application that makes custom graphs and custom reports based around the data available to you. You can use whatever language you like, through whatever presentation medium you like. You are simply limited by the connection to the database, and as I assume you will be managing the database, this shouldn't be a problem.
This is excellent. While the availability CGIs are excellent they do not
1 facilitate arbitrary presentation of the data (without say importing the CSV output into a DB)
2 allow the combination of the outage data with other information such as links to 'trouble ticket/service desk (for the ITIL inclined)' systems for combining the outage data with other views of the 'incident' (such as WTF caused it).
Having the outage data in tables should lead to an explosion of third party/community developed reports and presentation frameworks, in the same way that the very clean architectural divisions in Cacti have led to that product's extensibility and popularity. Thank you.
Rob
Yours sincerely. Classification: UNCLASSIFIED
[Nagios-users] Does anyone use and love NDOUtils for availability reporting ? [SEC=UNCLASSIFIED]
Dear Folks, Would anyone like to comment on the use of NDOUtils (Nagios 2.x or later) for availability reporting ? I believe that NDOUtils inserts rows representing down times in an MySQL table, making it much easier for DIY reporters to produce reports. I am currently using an event handler for adding outage records to a table but I am not happy with this method. Thank you, Yours sincerely. Classification: UNCLASSIFIED
[Nagios-users] Nagios memory Leaks
Dear Sir, I am writing to thank you for your valuable letter and say,

From: Tobias Klausmann [EMAIL PROTECTED]
Subject: [Nagios-users] Memory leaks

Hi! (First off: if this should also go to nagios-devel, just yell at me.)

I don't think so, because it deals with the aspects of the implementation that are visible (and in fact, the letter doesn't propose detailed solutions).

Nagios 2.6 and 2.5 have memory leaks. They are not so big that within hours your machine will be swapping, but they degrade performance in other ways. First off, their approximate extent. 2.5 and 2.6 without perl cache have the smallest memory leaks. A fairly busy Nagios server (hardware quoted below) with about 3000 services on about 330 hosts will degrade from 330M used (that's *not* Nagios alone) to 368M used in about 16 hours. Or about 2.4 MB per hour. The very same machine behaves neutral if Nagios is not running, so it's definitely Nagios itself.

Do you mean: 2.5 and 2.6 Nagios with embedded Perl but without the Perl plugin cache option ? If so, the fault is not Nagios, but the embedded Perl implementation and/or Perl. Your next paragraph suggests that this is plain vanilla Nagios without any Perl options to configure. Is that correct ?

Activating the embedded Perl interpreter and -cache will increase the amount of lost memory to about 5-6M per hour. In this case, however, sometimes the memory usage snaps back, i.e. some of the lost memory is collected. I've not yet found out what triggers the reclaim. Still, over the course of hours, more and more memory is lost. Still, it's roughly linear memory loss.

I have never witnessed memory being reclaimed after ePN leaks it. I can't conceive of the process memory size being reduced while the process is running (free() and friends only return the memory to the process heap). I think the leak is caused by the ePN implementation. I am hoping to try some measurements with several pilot implementations to see what is the most promising way of doing this.

... (snip)

Yep. I agree. The leak is bad.

The question that remains is, if this can (and will) be tackled before 3.0 is released. A related question is if Nagios 3 will be prone to the same problem.

Certainly it will if the current ePN implementation remains. If (pretty big if) I can provide you stuff to try, are you willing to repeat your measurements on candidate implementations (wrt 2.5 or 2.6 code base) ? I am not sure of my willingness/energy quotient but if they look Ok, I may not have anything to show until March this year.

Any thoughts, ideas etc. are appreciated. Regards, Tobias

Yours sincerely.
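The leak rate quoted in this thread is simple arithmetic on two memory readings, and the same arithmetic gives a rough time-to-exhaustion. A small sketch in Python follows. The 330M, 368M and 16 hour figures are from the message; the 1024 MB ceiling is an assumption made up for the example, and the projection only holds if the loss stays roughly linear, as the message suggests.

```python
def leak_rate_mb_per_hour(start_mb, end_mb, hours):
    """Average growth in used memory between two readings."""
    return (end_mb - start_mb) / hours

def hours_until(limit_mb, current_mb, rate_mb_per_hour):
    """Hours left before usage reaches limit_mb at a constant leak rate."""
    return (limit_mb - current_mb) / rate_mb_per_hour

# Figures quoted in the message: 330M used rising to 368M in about 16 hours.
rate = leak_rate_mb_per_hour(330, 368, 16)
print(f"leak rate: {rate:.3f} MB/hour")

# Hypothetical ceiling of 1024 MB, starting from the 368M reading.
remaining = hours_until(1024, 368, rate)
print(f"about {remaining / 24:.1f} days until 1024 MB at this rate")
```

Repeating the two readings with and without ePN, over the same interval, is enough to compare candidate implementations on the same footing.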
Re: [Nagios-users] ePN Was Performance issues.
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-
From: Robert Hajime Lanning [EMAIL PROTECTED]
Subject: Re: [Nagios-users] Performance issues, too1

.. skipped helpful remarks.

The perl code path that runs in the master Nagios process (after all plugins have been compiled successfully and you remove argument caching) is:

    sub eval_file {
        my ($filename, $delete, undef, $plugin_args) = @_ ;
        my $mtime = -M $filename ;
        if ( exists($Cache{$filename}) &&
             $Cache{$filename}[MTIME] &&
             ($Cache{$filename}[MTIME] <= $mtime) ) {
            if ( $Cache{$filename}[PLUGIN_ERROR] ) {
                ...
            } else {
                return $Cache{$filename}[PLUGIN_HNDLR];
            };
        };
    };

I am not sure where the leak is, unless it is in the interpreter itself.

It probably is, since most of the published documents (eg perlembed, 'Extending and Embedding Perl') emphasise the _big_ tradeoffs with embedding Perl.

Thank you for repeating the code in your letter, as I was trying to remember how it works and grappling with the fact that once the plugins are converted into Perl subroutines and compiled, the C caller (in checks.c) should simply be able to load the Perl stack with the plugin arguments - as is done in checks.c - and then call Perl_call_sv() with the subroutine reference returned by eval_file (the content of return $Cache{$filename}[PLUGIN_HNDLR]).

What happens is more complicated than this and I can't see why at the moment. (Part of the complexity is that the C args must be converted to Perl, and I think I preferred a second call to Perl to do this [after which Perl calls the subroutine itself] rather than converting the arguments - which is tricky - in C and then calling the subroutine from C).

1 refactor the Perl/C interface with a view to improving efficiency/readability/comprehensibility (I thought I understood ..)
2 consider a different approach with PPerl.

BTW, for those wishing to play with this, contrib/new_mini_epn.c has most of the guts of the C interface (and uses the same Perl driver in p1.pl) and may be the easiest way to start. (The main difference between this and the Nag code is that the Nag code forks for each plugin).

Thank you, Yours sincerely.
Re: [Nagios-users] Performance issues, too1
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-
Message: 2
Date: Tue, 2 Jan 2007 08:26:34 +0100 (CET)
From: Daniel Meyer [EMAIL PROTECTED]
Subject: Re: [Nagios-users] Performance issues, too

Hi there, and happy new year :-)

Program Running Time: 10d 21h 22m 42s

So, for almost eleven days nagios runs smoothly now, no more latency problems. I'll try it again with EPN (but still without perlcache) now.

Context is massive memory leak with ePN. Leak goes when ePN is removed.

Firstly, look at the caveats for ePN at http://nagios.sourceforge.net/docs/2_0/embeddedperl.html

Another major caveat should be added to this: depending on your plugins you may have a bigger or smaller leak, however leak it will. For me, I wouldn't consider Nagios without ePN since I code most of my plugins in Perl and the advantages for me (and this installation) outweigh the leak.

Finally, about the meaning of the configure switches for ePN.

1 --enable-embedded-perl
This builds Perl into the Nagios executable and at the least means that your system does __not__ fork a new process to run Perl plugins. Instead, Perl is parsed and run by direct calls to the Nagios binary. So, setting this switch saves a context switch.

2 --with-perl-cache
If, in addition, this switch is set, the Perl plugin is compiled only once (otherwise, each time Nagios goes to run a Perl plugin, it recompiles it). The resultant Perl op code tree remains in memory. Unfortunately, for reasons that are not clear to me, this is the source of the leak.

Danny

Yours sincerely.
Re: [Nagios-users] Nagios-users Digest, Vol 7, Issue 32
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Date: Tue, 19 Dec 2006 15:31:25 -0600
From: Craig Van Tassle [EMAIL PROTECTED]
Subject: [Nagios-users] Getting pie charts in host's history

My boss wants to be able to look at pie charts whenever we are looking at history for various services. So far I have only been able to get a bar graph that is showing the uptime and down time, but I have not been able to get a pie chart to display. I am using ubuntu 6.10, with nagios 2.4. Any help or thoughts would be appreciated.

Thoughts only. Pie chart of what - a histogram of hosts in availability ranges (eg 0-90%, 90.1-98.5, 98.6 - 99.95, >99.95%) ? If you want to do it with the standard tools you will have some work counting the numbers in each category and then charting. The approach might be

1 extract the availability data as CSV
2 import into the charting tool of your choice (eg OpenOffice)
3 chart

We

1 have an event handler that inserts a record into a DB when a host comes up (yes, this is pretty wonky)
2 have SQL that does the histogram stuff
3 use Perl Spreadsheet::WriteExcel to do the charting.

This isn't particularly good but it provides the charts that managers insist on (rightly).

HTH

Craig
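The 'counting numbers in each category' step is the only fiddly part, so here is a hedged sketch of it in Python. The band edges are the ones mentioned in the message, reading the last band as 'above 99.95%'; the host names and percentages are made up for the example. The counts it prints are what a pie chart would be drawn from.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges (percent) of the first three bands; anything above the
# last edge falls in the final band.
EDGES = [90.0, 98.5, 99.95]
LABELS = ["0-90%", "90.1-98.5%", "98.6-99.95%", ">99.95%"]

def band(pct):
    """Map an availability percentage to its band label.

    bisect_left puts a value equal to an edge into the lower band,
    so exactly 90.0 counts as 0-90%.
    """
    return LABELS[bisect_left(EDGES, pct)]

# Hypothetical per-host availability figures, e.g. taken from CSV output.
availability = {"hostA": 100.0, "hostB": 99.2, "hostC": 87.5, "hostD": 99.97}

histogram = Counter(band(p) for p in availability.values())
for label in LABELS:
    print(f"{label:>12}  {histogram[label]}")
```

Whatever does the charting afterwards (a spreadsheet, or Spreadsheet::WriteExcel as above) only needs these four label/count pairs.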
Re: [Nagios-users] Nagios-users Digest, Vol 2, Issue 8
Dear Folks, For the benefit of the archives,

Dear Folks, Is it possible to use the standard plugin distro check_ping to distinguish a reachability failure brought about by sluggish transport and one caused by a routing failure.

I think the best way to do this is with a plugin that returns either OK or CRITICAL depending on whether the host is contactable (CRITICAL also if the plugin times out). This means that there is never any ambiguity between a congested unresponsive link and a host on an unreachable network. One way of doing this is with an SNMP ping (eg check sysUpTime) to a router on the subnet of interest.

(The actual application is determining if user subnets are reachable when there is a routing protocol on each of the leaf [user subnet] routers tunnelled through an MPLS transport. All sorts of weird stuff can happen. We use a routing protocol even on the single exit leaf nodes because we cannot trust the provider [eg the only routing protocol they provide is RIP ...]).

Yours sincerely.
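A minimal sketch of such a two-state check follows, in Python. It is not an SNMP ping: to stay within the standard library it substitutes a TCP connection attempt to a port that is assumed to be open on the router (the host and port are whatever you pass in). The point it illustrates is only the mapping: a completed probe is OK, and everything else, including a timeout, is CRITICAL, so there is no WARNING state to confuse congestion with a routing failure.

```python
import socket

OK, CRITICAL = 0, 2  # conventional Nagios plugin return codes

def check_reachable(host, port, timeout=5.0):
    """Return (return_code, message) for a single reachability probe.

    A timeout, a refused connection and an unreachable network all raise
    OSError (socket.timeout is a subclass), so all of them map to CRITICAL.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} answered"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} not reachable ({exc})"
```

A real plugin would print the message and exit with the return code. Note that with a TCP probe a refused connection still proves the host is routable, so the port has to be one that is known to listen; an SNMP get of sysUpTime, as suggested above, does not have that wrinkle.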
[Nagios-users] Really dumb question - how do folks distinguish unreachable from over threshold ?
Dear Folks, Is it possible to use the standard plugin distro check_ping to distinguish a reachability failure brought about by sluggish transport and one caused by a routing failure.

What occurs to me at the moment is to either

1 not use check_ping in cases of volatile routing (examine routing with SNMP or CLI)
2 have a service event handler that reacts to 'plugin time out' (by ultimately generating a HOST_DOWN passive service check result if the routing has failed).

Thank you, Yours sincerely,

S Hopcroft
Data Communications
Dept of Education, Science and Training
Level 1, 240 City Walk
Canberra City ACT 2601
+61 2 6211 6110
Fax: +61 2 6123 6262
0412 766 832
[EMAIL PROTECTED]
[Nagios-users] Monitoring Cisco 3750 stacks - OIDs or traps ?
Dear Folks, We, like many others, have happily deployed the Cisco 3750 stackable switch/routers in stacked configurations. Please would anyone with experience of monitoring these units with Nagios comment on how best to monitor the performance of the internals. I am particularly interested in checking that all the switches in the stack are present and correct.

There appear to be 3 ways of checking this

1 Net::Telnet and parsing the output of 'show inventory'
2 Some OID with check_by_snmp (possibly from the CISCO-STACK-MIB)
3 Traps from the stack manager

What experience have people had with these methods ?

Thank you, Yours sincerely.
Re: [Nagios-users] Nagios-users Digest, Vol 1, Issue 3212
Dear Folks, I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Date: Wed, 7 Jun 2006 14:27:36 +0200
From: Rene Fertig [EMAIL PROTECTED]
Subject: Re: [Nagios-users] How to monitor complex websites?
To: nagios-users@lists.sourceforge.net

check_http version 1.89 (which comes with nagios-plugins 1.4.3) can set a User-Agent-String:

    -A, --useragent=STRING
        String to be sent in http header as User Agent

... snip

But probably you should make your own plugin if you need special cookie support.

bye, Rene

You may want to revisit writing your own, since there's a new CPAN module FEAR::API for fearless programming of web clients. From http://www.perl.com/lpt/a/2006/06/01/fear-api.html

' FEAR::API's documentation says: FEAR::API is a tool that helps reduce your time creating site scraping scripts and helps you do it in a much more elegant way. FEAR::API combines many strong and powerful features from various CPAN modules, such as LWP::UserAgent, WWW::Mechanize, Template::Extract, Encode, HTML::Parser, etc., and digests them into a deeper Zen.

(Here's an example that Fetch CPAN's homepage. Extract data with a template. Process links using a control structure. Print fetched content to STDOUT. Dump links in the page. Use YAML to print extract results )

It might be best to introduce FEAR::API by rewriting the previous example:

    use FEAR::API -base;
    url("search.cpan.org");
    fetch >> [
        qr(foo) => _feedback,
        qr(bar) => \my @link,
        qr()    => sub { 'do something here' }
    ];
    fetch while has_more_links;
    extmethod('Template::Extract');
    extract($template);
    print Dumper extresult;
    print document->as_string;
    print Dumper \@link;
    invoke_handler('YAML');
'

The article compares FEAR::API with the former standard, WWW::Mechanize.

Even if you decide against FEAR::API, the standard Perl HTTP modules

- do cookies
- parse HTML (in particular, extract links)
- handle and fill out forms

HTH, Yours sincerely.
[Nagios-users] RE: Nagios config parser in Perl
Dear Sir, I am writing to thank you for your letter and say

Message: 13
Date: Tue, 30 May 2006 16:47:02 +0200
From: Marc Haber [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Parsing a Nagios 2 configuration file from perl

Hi, I'd like to have a list of all host_name directives in host definitions in a nagios 2 template style configuration in a perl script. Did anybody already write a nagios 2 configuration file parser in perl?

Yep. There's a good one (probably; I haven't used it) on CPAN in the Nagios name space (searching for Nagios should find it), written by a Perl luminary, Al Tobey. You need the Build module to get it installed.

There is also a rough-as-guts one that I use that is sometimes useful. If you want to be the guinea pig .. Here's an example of it in 'action'.

    $ perl -MNagios::Config -e '$x=Nagios::Config->new("/etc/nagios/nagios.cfg"); @x=$x->grep("hosts", q[$host_name =~ /mt[ab]sw21/i]); $x->pprint("hosts", \@x)'
    define host{
        host_name            MTBSW210
        address              10.0.254.149
        alias                14 MORT BUILDING
        contact_groups       datacomms-admins,premier_support_group
        notification_period  24x7
        parents              MTASW200,BRUSW200
        use                  generic-host
    }
    define host{
        host_name            MTASW210
        address              10.0.254.169
        alias                16 MORT BUILDING
        contact_groups       datacomms-admins,premier_support_group
        notification_period  24x7
        parents              MTASW200,BRUSW200
        use                  generic-host
    }

extracting stuff. I actually use it (for some definition of 'use') for batch adds. But it is rough ...

Greetings Marc

caveat computer.

Yours sincerely.
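For the narrow question asked (list every host_name directive in host definitions), a full parser is not strictly needed. The sketch below, in Python, is a line scanner under stated assumptions: object definitions look like `define host{ ... }` with one directive per line, `#` starts a comment line and `;` starts a trailing comment. It does not follow cfg_file or cfg_dir includes and does not resolve `use` template inheritance, so it is no substitute for either of the Perl modules mentioned above.

```python
import re

def host_names(cfg_text):
    """Yield host_name values found inside 'define host { ... }' blocks."""
    in_host = False
    for raw in cfg_text.splitlines():
        line = raw.split(";", 1)[0].strip()    # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        if re.match(r"define\s+host\s*\{", line):
            in_host = True                     # hostgroup etc. do not match
        elif line.startswith("}"):
            in_host = False
        elif in_host:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] == "host_name":
                yield parts[1]

# Sample input modelled on the host definitions shown in the message.
SAMPLE = """
define host{
    host_name   MTBSW210
    address     10.0.254.149
    use         generic-host
}
define hostgroup{
    hostgroup_name  switches
}
define host{
    host_name   MTASW210   ; second switch
    address     10.0.254.169
}
"""

names = list(host_names(SAMPLE))
print(names)
```

Templates that only carry a `name` directive are skipped naturally, since they have no host_name line.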
[Nagios-users] RE: Jabber Custom Notification Script Not Working (Frederik Vanhee)
Dear Folks, I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Date: Wed, 24 May 2006 06:52:22 +0200
From: Frederik Vanhee [EMAIL PROTECTED]
To: Norman Harebottle [EMAIL PROTECTED]
CC: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Jabber Custom Notification Script Not Working

Norman Harebottle wrote:

Hello Everyone, I am wondering if anyone has had success getting the Jabber Notification script written by David Cox to work. However, when running the Perl script in the Embedded Perl environment, something happens (which is not logged) which causes the script to not execute as desired.

Embedded Perl Nagios, like mod_perl, is more sensitive to coding issues than the 'fork a new Perl interpreter for each run' model. There is every chance that a plugin that works fine from the command line won't work under ePN.

You can turn ePN logging on by changing the DEBUG_LEVEL in p1.pl. Set it to LEAVE_MSG (or whatever is mentioned in perldoc p1.pl). You also need to make sure the log path and name suit you.

In a former job, I found jabber/XMPP was very sensitive to the Perl modules and configuration of the Jabber server. I think we had an event handler that would publish stuff to jabber consoles with not much more than one of the example usages from the Perl jabber client. Unfortunately my records are elsewhere, and I have no experience with the jabber plugins.

Yours sincerely.
RE: [Nagios-users] Undefined subroutine Embed::Persistent::eval_file called. in check_disk_smb
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-
Message: 15
Date: Tue, 23 May 2006 14:10:03 +0200 (CEST)
From: jacobo garcía [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Undefined subroutine Embed::Persistent::eval_file called. in check_disk_smb

Under Nagios i had this problem when running check_disk_smb

Undefined subroutine Embed::Persistent::eval_file called.

It sounds like you have a Nagios built with embedded Perl. You can check with

    strings /Path/to/nagios | grep -i perl | head -5

If you see

    libperl.so
    Perl_croak
    Perl_markstack_grow
    Perl_croak_nocontext
    Perl_save_int

then you have a Perl interpreter built into Nagios. If this is not the case, I can't help.

Unfortunately, if your Nagios does have Perl in it, the Perl driver (p1.pl) seems not to be where Nagios expects it. Look at your nagios.cfg, eg

    $ grep -i P1 /etc/nagios/nagios.cfg
    # P1.PL FILE LOCATION
    # This value determines where the p1.pl perl script (used by the
    p1_file=/usr/bin/p1.pl
    $

and check if p1.pl is in the location nagios.cfg says it should be. If it is, I don't know what is happening. If not, try and locate it - you can get it from the Nagios CVS or from the corresponding dist tarball or the RPM/package - and put it there. You could also check it's not hiding somewhere else in the file system.

If you can get a copy of p1.pl, then put it in the path specified by nagios.cfg (p1.pl is used only by Nagios so moving it won't break anything). You may have to restart Nagios for the change to take effect. The file is plain text, pure Perl. p1.pl defines the subroutine that Nagios is trying to have Perl call.

Finally, if you don't have a good reason to use embedded Perl, you probably shouldn't be using it. If you don't want to use embedded Perl, you can't turn it off at run time. The only option is replacing the Nagios binary with one compiled without embedded Perl. The Dag Wieers Redhat RPMs for Nagios all build with embedded Perl, so if you use RPMs you would need to choose another RPM or hack the Dag SPEC file to not build embedded Perl.

when i run from command line i response on 2 lines, but it seems to be ok. i dont know what to do.

Yours sincerely.
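The check described above (find the p1_file setting in nagios.cfg, then see whether the file it names exists) can be scripted. A minimal sketch in Python follows; it assumes the plain `key=value` layout of nagios.cfg with `#` comment lines, and the function names are made up for the example.

```python
import os

def p1_file_setting(cfg_text):
    """Return the value of the last p1_file= directive, or None if absent."""
    value = None
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue                       # comment or not a directive
        key, val = line.split("=", 1)
        if key.strip() == "p1_file":
            value = val.strip()
    return value

def check_p1(cfg_path):
    """Describe whether the p1.pl named in cfg_path is actually present."""
    with open(cfg_path) as fh:
        p1 = p1_file_setting(fh.read())
    if p1 is None:
        return "no p1_file directive found"
    if not os.path.isfile(p1):
        return f"p1_file points at {p1}, which does not exist"
    return f"p1_file {p1} is present"
```

Calling `check_p1("/etc/nagios/nagios.cfg")` (or wherever your main config lives) gives a one-line answer; it says nothing about whether the p1.pl found matches the Nagios version in use.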
RE: [Nagios-users] How many parent hosts in a parent directives ?
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-

I usually define HSRP as a child of the two parent routers that are participating. Both routers have to be down before the HSRP is marked unreachable. Devices behind those routers can then use the HSRP object as their parent. In cases where the HSRP address is closer to the Nagios server, obviously you'd flip the parent/child around

         Nagios
           |
       {Internet}
           |
     RtrA--+--RtrB
         \   /
          \_/
           |
       RtrAB-HSRP
           |
     {other devices}

Firstly, thank you very much to all those who answered both on the list and privately.

For the benefit of the archives, the consensus of the replies is that hosts behind multiple routers should enumerate each of the routers in the parents directive. (The canonical example is a subnet with two routers sharing the host gateway address with HSRP/VRRP, although a more common example may be a single gateway that has multiple paths back to the monitoring host. In this case, the gateway will have one or more parents corresponding to routers in each of the paths.)

When the host's own check fails, Nagios checks each of the routers marked as a parent and if _any_ of them are up, then the host is marked as DOWN (probably a SOFT state); otherwise - if all the parents are unreachable - the host is marked as unreachable.

I think that Mr Eng's advice is consistent with this: the difference being that the HSRP is visible as a host and a service in the Nagios configuration and that Nagios will check the HSRP service by checking the reachability of the gateway address (as well as separately checking the standby addresses).

Thank you, Yours sincerely.
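The parent rule summarised in this thread is small enough to write down as a function. The sketch below is in Python and is only a model of the rule as described in the message, not Nagios source code; treating a host with no parents as DOWN rather than UNREACHABLE is an assumption of the sketch.

```python
def host_state(host_up, parent_states):
    """Classify a host from its own check result and its parents' states.

    host_up        True if the host's own check succeeded
    parent_states  list of parent states, each "UP" or anything else
    """
    if host_up:
        return "UP"
    # Host check failed: it is DOWN if at least one path to it is alive
    # (or it has no parents at all), otherwise it is merely UNREACHABLE.
    if not parent_states or any(s == "UP" for s in parent_states):
        return "DOWN"
    return "UNREACHABLE"

# Two HSRP routers listed as parents of a host behind them.
print(host_state(False, ["UP", "DOWN"]))    # one router still up
print(host_state(False, ["DOWN", "DOWN"]))  # both routers gone
```

This is why every router on a redundant path belongs in the parents directive: leave one out and the loss of the listed router alone makes the hosts behind it look unreachable.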
[Nagios-users] Re: Nagios 2.3.1, problems with perl plugins
Dear Folks, I am writing to thank you for your letters and say,

-Original Message-
Message: 17
From: Michael Hüttig [EMAIL PROTECTED]
Organization: MSP
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] RE: Nagios 2.3.1, problems with perl plugins
Date: Wed, 17 May 2006 15:40:22 +0200
Cc: [EMAIL PROTECTED]

Hi Stanley, i´ve posted on monday the following message. I think there was a problem with embedded perl. Which perl-plugins do you use?

I just upgraded my Nagios-2.0 to nagios-2.3. Both Versions were compiled with epn-support with --enable-embedded-perl --with-perlcache

Running nagios-2.0 all checks using perl-plugins (check_smart, check_cisco_env, check_ifoperstatus and others) were doing fine. Using nagios-2.3 i got the following errors:

check_ciscoenv.pl
;Cisco environmental health;UNKNOWN;HARD;3;UNKNOWN: Unable to resolve destination address '-c'

check_load.pl
;Load;UNKNOWN;SOFT;2;**ePN /usr/local/nagios/libexec/check_load.pl: Argument isn't numeric in numeric lt () at (eval 12) line 61,.

check_ifoperstatus.pl
;UNKNOWN;notify-by-email;**ePN /usr/local/nagios/libexec/check_ifoperstatus: Option d requires an argument.

check_smart.pl
;S.M.A.R.T-Status;UNKNOWN;SOFT;1;**ePN /usr/local/nagios/libexec/check_smart.pl: Can't exec sudo: No such file or directory at (eval 15) line 119,.

check_traffic.pl
;Traffic ISDN-Interface;CRITICAL;notify-by-email;CRITICAL: Could not match ISDN Basic Rate Interface (S0)

Firstly, thanks to Frederick and Michael for the notification about this serious problem. Unfortunately the situation as I see it is,

1 I am running 2.3 not 2.3.1 and so my limited Perl plugins (home-brew) may not be picking up the problem. Also I lack a Nagios work bench at the moment so it's going to be slow if heavy lifting is involved, as it seems.

2 A quick glance at the CVS does not seem to show any relevant changes
2.1 there appear to be no changes in checks.c near the embedded Perl code
2.2 the change to p1.pl was only to allow plugins to return more than one line of output (the nagios-snmp plugins do this I think).

3 It would help if Michael or Frederick would enable the LOGGING options in the copy of p1.pl they use for new_mini_epn (perldoc p1.pl should help). IIRC you want to change this p1.pl to have

    use constant DEBUG_LEVEL => LEAVE_MSG ;

and make sure the plugin log path looks Ok. This will leave messages like

Mon Mar 6 15:43:39 2006 run_package: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.167 -N BRUSW200 returning (0, Ok. No topology change: root port of 10.0.254.167/BRUSW200 has not changed from that expected: 513. See a href=http://nms/cgi-bin/display_spanning_tree;Current spanning tree graph/a. ).
Mon Mar 6 15:43:39 2006 eval_file: /usr/lib/nagios/plugins/check_rootport already successfully compiled and file has not changed; skipping compilation.
Mon Mar 6 15:43:39 2006 run_package: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.170 -N MTASW200 returning (0, Ok. No topology change: root port of 10.0.254.170/MTASW200 has not changed from that expected: 0. See a href=http://nms/cgi-bin/display_spanning_tree;Current spanning tree graph/a. ).
Mon Mar 6 15:43:39 2006 run_package: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.168 -N MTASW207 returning (0, Ok. No topology change: root port of 10.0.254.168/MTASW207 has not changed from that expected: 1. See a href=http://nms/cgi-bin/display_spanning_tree;Current spanning tree graph/a. ).

in the log file (also named in p1.pl). This should provide some clues.

The only quick workaround that may be worth a _try_ is to replace p1.pl with an older one from CVS (say 1.7). However, I am not confident. Perhaps my installation is a bit atypical: every one of the 2.0 series (inc betas) has been in prod use with either heavy or light embedded Perl without a hitch.

The last random thought is, could you have changed Perl or Text::ParseWords around about the time the problem started ? This module is responsible for argument processing and this appears to be breaking. OTOH, if it was the culprit, all versions would be b0rked.

Good luck, Yours sincerely.
[Nagios-users] RE: CSV output out of availibility Reports
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Friday, 7 April 2006 13:16
To: nagios-users@lists.sourceforge.net
Subject: Nagios-users digest, Vol 1 #3123 - 37 msgs
...
Message: 5
Date: Thu, 6 Apr 2006 13:16:06 +0200
From: Sand Philipp [EMAIL PROTECTED]
Subject: [Nagios-users] CSV output out of availibility Reports

Hi there, I remember a patch for the avail.cgi with which you could generate a CSV output out of each avail.cgi report. I already did some research in this list and with google, but I can't find the patch any more. Can anyone please give me a hint where I can download this patch? Has anyone tested this patch with the new version of the avail.cgi in Nagios 2.1?

avail.cgi has _always_ done CSV output (as long as you choose all hosts or all services). It's up to you to filter the CSV by any means you care to choose. If you want some help, try
- either putting all the CSV records in a DB and using SQL (if you are serious about reporting you prob want to do this),
- or Nagios::Report (munges and filters the CSV from avail.cgi).

Question for Ethan: why isn't this patch integrated into the avail.cgi by default? Is this planned for a future release?

Patches welcome. BTW, there are bugs in avail.cgi relating to scheduled down time that are probably more important than this.

Thanks in advance!

I probably didn't do your letter justice. It seems on reflection that you meant you want the patch for avail.cgi that generates CSV from a specific availability report, eg a new link in the report for a host or service that does CSV. However, I think it has been made pretty clear from the Nag roadmap that the CGIs have been end of lifed and will be replaced by PHP. I think that is a much better use of scarce developer resources than trying to fix difficult and fragile code. As Radia Perlman said about sub-optimal routing, 'people should be grateful that their data is delivered at all'.

Yours sincerely.
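The 'put all the CSV records in a DB and use SQL' suggestion in the message above can be sketched roughly as follows. This is an illustration in Python rather than the Perl discussed on the list. The sample data, the table name avail and the lower-case column names are assumptions (the column names are borrowed from the SQL example elsewhere in this archive, not taken from the real avail.cgi header), so adjust them to whatever your report actually contains.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the all-hosts CSV from avail.cgi.
# The header names are an assumption, not the real report header.
SAMPLE_CSV = """host_name,total_time_down,time_down_scheduled,time_down_unscheduled
alpha_router,96712,0,96712
beta_switch,0,0,0
gamma_switch,7200,7200,0
"""


def load_availability(csv_text):
    """Load the CSV text into an in-memory SQLite table called avail."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE avail (host_name TEXT, total_time_down INTEGER, "
        "time_down_scheduled INTEGER, time_down_unscheduled INTEGER)"
    )
    # Named placeholders are filled from each CSV row (a dict).
    db.executemany(
        "INSERT INTO avail VALUES (:host_name, :total_time_down, "
        ":time_down_scheduled, :time_down_unscheduled)",
        rows,
    )
    return db


def hosts_with_unscheduled_downtime(db):
    """Filter with plain SQL instead of hand-written parsing code."""
    cur = db.execute(
        "SELECT host_name, time_down_unscheduled FROM avail "
        "WHERE time_down_unscheduled > 0 "
        "ORDER BY time_down_unscheduled DESC"
    )
    return cur.fetchall()
```

Once the rows are in a table, any further filtering is a matter of changing the SELECT, which is the point of the suggestion.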
[Nagios-users] Ideas sought - reporting on monitored sites in TZs != Nagios TZ.
Dear Folks, I am writing to invite ideas about how to report on availability when a single Nagios instance monitors hosts located in other time zones (TZ).

In many cases, people want reports of availability in 'business hours' and Nagios does that beautifully by (avail.cgi) accepting the time_period to compute availability over. However, 'business hours' at the Nagios site is not the same as that in the other sites, so outages that should be excluded are not and outages that should be included are excluded.

The best I can think of is defining a 'business hours' time_period that spans business hours in all the time zones. This has the advantage of not excluding any outage but the drawback of including outages that fall inside the left hand side of the time period (morning outages at remote sites that are way too early to be included. They are included because it is say 8 am at the site where Nagios is even though it may be 5 am at the remote site).

Another solution that doesn't cut it for us is multiple copies of Nagios, each reporting on 'business hours' in the TZ where each copy is located.

A real nasty hack would be to wrap the host_check plugin with logic that
1 determines where the host is
2 decides whether or not to pass the down back to Nag depending on the location's TZ
Obviously this is useless since it suppresses outages you may want to report on in a 24x7 view.

The only other idea I have is
1 abandon Nagios reporting
2 have a global host event handler log outages in a DB
3 report by accumulating outages (depending on TZ) from the DB.

None of these are attractive. Any other suggestions are welcome. Thank you, Yours sincerely.
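The last idea in the message above (log outages, then accumulate them per time zone) might look something like this sketch. It is written in Python for illustration. The function name, the Monday to Friday 08:00 to 18:00 window and the use of fixed UTC offsets are all assumptions; a real site would probably want proper zone names so that daylight saving is handled.

```python
from datetime import datetime, time, timedelta, timezone


def business_seconds(down_ts, up_ts, site_tz,
                     open_at=time(8, 0), close_at=time(18, 0)):
    """Seconds of an outage that fall inside Mon-Fri business hours,
    judged in the monitored site's own time zone.

    down_ts and up_ts are epoch seconds; site_tz is a tzinfo object.
    """
    down = datetime.fromtimestamp(down_ts, site_tz)
    up = datetime.fromtimestamp(up_ts, site_tz)
    total = 0.0
    day = down.date()
    while day <= up.date():
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            window_start = datetime.combine(day, open_at, tzinfo=site_tz)
            window_end = datetime.combine(day, close_at, tzinfo=site_tz)
            overlap_start = max(down, window_start)
            overlap_end = min(up, window_end)
            if overlap_end > overlap_start:
                total += (overlap_end - overlap_start).total_seconds()
        day += timedelta(days=1)
    return int(total)
```

The same two epoch values give different answers for different sites, which is exactly what a single 'business hours' time_period at the Nagios host cannot do.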
[Nagios-users] RE: Nagios::Report (end of dev cycle).
Dear Sir, I am writing to thank you for your letter and say,

-Original Message-

Hmm, it would be a pity not to develop this thing further. Like you say it provides the kind of reports management likes and up until now it is the only way I have found to create biased reporting. What do I mean ? I mean I don't care about how many hours application or server xyz was down during the weekend. Those hours are of no importance to me in terms of management reports. The information and especially metrics outside of these hours need to be taken for administration purposes of course. But the weekend is used here for maintenance as is the case in many places. For my SLA's these hours should not be counted. Up until now your tool is the only tool I have found that can ignore data based on time-tables in its reports. So as far as I am concerned please keep up the good work.

The reason for saying that devel has prob ceased is that I have run out of ideas for it and think it may have reached the limits imposed by some of the implementation choices. OTOH, if there are things that need fixing or facilities I think can be added, please let me know.

The other thing is that the module's reason for existence is to provide an API for the only current source of availability data, namely avail.cgi. avail.cgi is complicated and may not be maintained as responsively as people may wish - I can't imagine submitting patches for it without a great deal of hard labour. If on the other hand, somehow (as you say with the NEBs) outage details can be stuffed by the core into an RDBMS, then people can use all sorts of wonderful software from the DB/report world to generate very sophisticated reports.

In fact, when I heard about DBD::AnyData I thought this would blow the original API (of Nagios::Report) away because people would SELECT to their heart's content. As it turned out, the SQL implemented by the AnyData module is not as powerful as one would like and in fact is good for only basic filtering. So my feeling is that Nagios::Report is a stop gap for me until something better comes along.

Another technique that I find useful is to have a Nag event handler insert outage details into a table. This in principle allows one to combine outages (with manual updates) with causes and commentary.

Question though .. DB NEB modules ?

Thanks for your encouraging words.

Cheers, Hans

Yours sincerely.
[Nagios-users] Nagios-Report-0.002 on CPAN/NagiosExchange.
Dear Folks, I am writing to say that Nagios::Report 0.002 has been 'released' and is available at the usual places. This release fixes a bug, and adds limited charting capability and a weaker alternate interface (provided by the Perl DBD::AnyData module) that allows client code to select the report data with SQL (the small subset that AnyData accepts).

0.002 Fri Mar 17 14:44:36 EST 2006
- fix bug in mkreport() processing of MUNGE_CALLBACK (would not change report values).
  *** This entailed a _non_ backward compatible change in the MUNGE_CALLBACK interface.
  *** Client code that calls the alter->() callback _requires_ changing.
  *** The alter callback is now called with one parm, a ref to a hash of the field values
  *** indexed by field name. See examples/ for scripts that have been changed.
- added to_dbh() method to allow DBD::AnyData provided use of SQL (simple) on report data
- added primitive support for chart templates to excel_dump. The workbook written by Spreadsheet::WriteExcel can contain _one_ (1) chart of the availability data.

This project does not scale very well. It provides a limited capability to provide a Data source for processing by Reporting tools such as Excel. This module has probably reached the end of development (some may say it would better have not started) apart from bug fixes. If you are serious about reporting look at the DB NEB modules or Steve Shipway's stuff on NagiosExchange. This module provides however, a limited capacity to provide reports in the format beloved by PHBs.

Yours sincerely.
[Nagios-users] Potential bug in avail.cgi/Nagios 2.0.
Dear Folks, I am writing to report a peculiar behaviour of the availability CGI with Nagios 2.0.

Firstly, I think the avail.cgi is a wonderful beast that turns the Nagios logs into a very useful and desirable data source. My reporting requirement is down time minus any scheduled down time. I guess that total_time_down is in fact the sum of time_down_scheduled and time_down_unscheduled. But for some of my availability data (last month), I see (with an unpublished SQL interface on top):

  DB<29> p $SQL
SELECT host_name, total_time_down, time_down_scheduled, time_down_unscheduled FROM tab_24x7 where total_time_down > 0 and time_down_unscheduled > 1
  DB<30> $s = $d->prepare($SQL)
  DB<31> $s->execute
  DB<32> $s->dump_results
'Lismore_Optus_router_PE_interface', '96712', '0', '96712'
'MTASW203', '800', '1', '4294958096'
'TODSW210', '429', '6771', '4294960954'
'TRASW202', '200', '7000', '4294960496'
'TRASW203', '1392', '5808', '4294962880'
'TRASW204', '1092', '6108', '4294962280'
6 rows

The problem is the too large values of time_down_unscheduled and the fact that the total_time_down is not the sum of sched and unsched downtime. In this case, downtime was scheduled for the TOD\w+ and TRA\w+ hosts (the Lismore entry is correct BTW).

What can I provide that may help the investigation progress ?

Yours sincerely.
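One observation that may help narrow this down: four of the five odd rows are numerically what you would get if time_down_unscheduled were computed as total_time_down minus time_down_scheduled in an unsigned 32-bit variable and the result went negative. That is only an inference from the arithmetic on the figures posted above, not something confirmed against the avail.cgi source. A small Python check (the helper names are made up for this sketch):

```python
UINT32 = 2 ** 32


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - UINT32 if value >= 2 ** 31 else value


# (host, total_time_down, time_down_scheduled, time_down_unscheduled),
# copied from the dump_results output in the message above.
ROWS = [
    ("MTASW203", 800, 1, 4294958096),
    ("TODSW210", 429, 6771, 4294960954),
    ("TRASW202", 200, 7000, 4294960496),
    ("TRASW203", 1392, 5808, 4294962880),
    ("TRASW204", 1092, 6108, 4294962280),
]


def wrapped_rows(rows):
    """Hosts where unscheduled equals (total - scheduled) modulo 2**32."""
    return [
        host
        for host, total, sched, unsched in rows
        if (total - sched) % UINT32 == unsched
    ]
```

The MTASW203 row does not fit the pattern with the values as printed (its unscheduled figure corresponds to minus 9200), so either that row was mangled in transit or something else is going on there.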
[Nagios-users] RE: Nagios-users digest, Vol 1 #3076 - 9 msgs
Dear Folks,

Message: 8
To: Nagios-users@lists.sourceforge.net
From: John R. Daily [EMAIL PROTECTED]
Date: Sun, 5 Mar 2006 19:51:35 -0500
Subject: [Nagios-users] ePN: notification script

I've been trying to get a Perl script to work as part of the notify-by-email command as defined in minimal.cfg, and it's finally dawned on me that the RPM package I'm using has the embedded Perl interpreter compiled in. (Is there a replacement for nagios -m in v2.0? The documentation still refers to it, but it doesn't seem to work.) Anyway, I'm not thrilled about having to deal with ePN for simple Perl utilities that aren't plugins, but I figured I could get it to work anyway. However, now I'm less confident.

Give this a try with your Perl alternative to printf.
1 cd into /usr/bin (if RHEL; Nag bin path otherwise).
2 run new_mini_epn from there (the path to p1.pl should be set in the new_mini_epn binary but is not).
(new_mini_epn has readline support so command line history and edit work ok.)

eg

[EMAIL PROTECTED] bin]$ ./new_mini_epn
plugin command line: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.168
embedded perl plugin return code and output was: 0 Ok. No topology change: root port of 10.0.254.168 has not changed from that expected: 1. See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree graph</a>.
plugin command line: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.168
embedded perl plugin return code and output was: 0 Ok. No topology change: root port of 10.0.254.168 has not changed from that expected: 1. See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree graph</a>.
plugin command line: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.168
embedded perl plugin return code and output was: 0 Ok. No topology change: root port of 10.0.254.168 has not changed from that expected: 1. See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree graph</a>.
plugin command line: /usr/lib/nagios/plugins/check_rootport -H 10.0.254.168
embedded perl plugin return code and output was: 0 Ok. No topology change: root port of 10.0.254.168 has not changed from that expected: 1. See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree graph</a>.
plugin command line: q
embedded perl compiled plugin q with error: **ePN failed to open q: No such file or directory at p1.pl line 168. - skipping plugin
plugin command line:
That's all folks.
[EMAIL PROTECTED] bin]$

This is repeatedly running a Perl plugin check_rootport (simulating reuse by ePN). It will display probs either
1 at compile time (ePN has some funny limits. See the mod_perl docco for more info).
2 at run time.

Good luck. Yours sincerely.
[Nagios-users] RE: Reporting bits and pieces.
Dear Folks,

<Mainly off topic - related to Reporting>

Firstly some observations and corrections to my recent letters about Reporting.

1 Use of DBD::RAM to at one fell swoop download the Nagios all hosts/services report and stash it in an in-core table (prior to filtering with SELECT and saving in an RDBMS). DBD::RAM no longer builds. It has been replaced by DBD::AnyData. From the README of DBD::AnyData:

HISTORICAL NOTE: this module was formerly called DBD::RAM. Its name was changed because many people were unaware that the module supports file operations in addition to in-memory operations. See the Changes file for a description of changes since the last release of DBD::RAM.

2 Another way of accounting for outages (apart from daily log file parsing or using avail.cgi directly) may be with an event handler that inserts rows into a table with these columns

HOST_NAME HOST_DOWN HOST_UP OUTAGE

Since there are a lot of good DB interfaces in various programming languages (Perl, Python, Ruby), this is pretty straightforward. I only recently became aware of the $HOSTDOWNTIME$ macro that allows one to filter outages that are in scheduled downtime. (At this low capability site, the event handler simply appends a row to a CSV file so that Excel can view/edit the data. While this is not great it allows manual update with COMMENT and CAUSE [via Excel] so it allows one in principle to combine problem analysis/resolution data with the outage).

3 Would anyone like to post their Reporting schemas ?

</Mainly off topic - related to Reporting>

and now back to our usual program ..

Yours sincerely.
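Point 2 of the message above (an event handler that inserts a row per outage) could be sketched like this. It uses Python and SQLite purely for illustration; the table name outage, the function names and the idea of passing the down and up times as epoch seconds are assumptions, and how your handler obtains those two times depends on the macros your Nagios version provides.

```python
import sqlite3


def open_outage_db(path=":memory:"):
    """Open (or create) the outage table with the four columns
    HOST_NAME, HOST_DOWN, HOST_UP and OUTAGE, plus a row id."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS outage ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "host_name TEXT NOT NULL, "
        "host_down INTEGER NOT NULL, "
        "host_up INTEGER NOT NULL, "
        "outage INTEGER NOT NULL)"
    )
    return db


def record_outage(db, host_name, host_down, host_up):
    """Insert one row when a host recovers.

    host_down and host_up are epoch seconds; the stored outage
    column is their difference in seconds.
    """
    duration = host_up - host_down
    if duration < 0:
        raise ValueError("host_up precedes host_down")
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO outage (host_name, host_down, host_up, outage) "
            "VALUES (?, ?, ?, ?)",
            (host_name, host_down, host_up, duration),
        )
    return duration
```

An event handler script would call record_outage() once, on the recovery event, with the values Nagios passed on its command line.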
[Nagios-users] RE: Reporting and misc rave.
Dear Folks, Firstly thanks to all that answered either on the list or privately. I will now attempt to emulate a journalling file system by summarising the responses.

1 Having Nagios availability in a DB is a good thing.
Doing so reduces the cost of reporting since there are fewer data representation/conversion problems and the extraction can be done with SQL thereby minimising the script-hell problem.

2 Availability data capture
Mr Shipway's approach is to process the Nag logs periodically with private/in-house (AFAIK) code to extract the entries of interest and insert them as rows in a table(s). (Incidentally, this sounds very enterprising since the extraction code has to deal with all the cases handled by avail.cgi. The difficulty of extracting outages from the logs is why I chose to use avail.cgi as a source of availability data). Other approaches include event handlers that insert a row at the end of an outage. This is easy to code but unfortch since, AFAIK, there is no macro that indicates if scheduled down time was prevailing, it may require manual post processing to update the column 'IN_SCHED_DOWNTIME'.

3 Reporting
From Mr Shipway, Rouillard. There are at least two DBs with ODBC connectors (SQLite and MySQL) available. This is very important since the availability of ODBC connectors makes available the wealth of MS applications for
3.1 client programs eg update your DB with Excel
3.2 reporting - use Excel charts for example

4 Re-use
Any site worth its salt will ultimately recognise the need for various registries/directories that reduce the cost of client coding. Such registries/directories include
4.1 Provider circuit IDs
4.2 Addressing/subnets/VLANs etc etc
4.3 Managed nodes
It would be helpful if the Nagios config data could also be made available as a DB. Personally I think it would be a bad thing if Nagios lost its template/text driven config but the config data should be made available to other applications so that there is not the endless client code churn of mapping names between applications. One approach would be to use Al Tobey's Nagios::Config to load a config DB. Why is this useful ? At least one application is mapping structured node names to those used in Reports. What exec understands benrt200 ? What about Bendigo ?

Thanks for your time. Yours sincerely.
[Nagios-users] RE: Reporting and misc rave.
Dear Folks, I am writing with mainly a rant about Graphing and Reporting.

<Mainly OT rant>

1 About graphing with Nagios
Why would one bother when
1.1 Cacti does such a good job
1.2 Nagios could check the Cacti RRDs with either check_rrd, or by an outboard (Cron scheduled) RRD poller that submits passive service check results
1.3 the graphs can be associated with Nag service checks by either
- explicit URL of the Cacti graph in the service check output
- for the adventurous, a Wiki front end that displays some of the Nag CGI service status and a link to the Cacti graph.

As a footnote, since Cacti supports RRD 1.2 with built-in supported Holt Winters forecasting RRAs, the poller could be smart and simply check the exception Data store to see if the current rate is in fact outside the normal seasonal variation (computed by the Holt Winters algorithm inside the RRD). Of course this would require the modification of the RRDs that Cacti produces to add the HW RRAs (this doesn't require that the RRD content be unloaded and reloaded IIRC).

2 About reporting
After writing a lot of code in Nagios::Report to extract and report on Nagios availability data it occurs to me that a better way of doing Reporting is to
2.1 put the availability data in a DB table (prob with an auto-incremented index)
2.2 use either
2.2.1 ad-hoc SQL queries, or
2.2.2 the reporting package of your choice (eg iReport)

I hope that Nagios::Report will be enhanced to take advantage of DBD::RAM, a Perl module that very easily gets a CSV file with LWP and sticks it in an in core DB that can almost as easily be used as a Data source to insert rows into the DB of your choice (MySQL, or whatever).

</Mainly OT rant>

and now back to our normal program.

Yours sincerely.
[Nagios-users] 2.0rc2 avail.cgi - possible bug in reporting of HOST DOWNTIME START/END EVENTS.
Dear Folks, I am writing to report a possible anomaly/bug in avail.cgi for Nag 2.0 rc2 (RPM based on Dag Wieers for RHEL3).

The problem is that when a host has exited a period of scheduled downtime, the 'Host log entries' shown by avail.cgi look like

Event Start Time     Event End Time       Event Duration   Event/State Type     Event/State Information
01-02-2006 00:00:00  01-02-2006 14:27:54  0d 14h 27m 54s   HOST UP (HARD)       PING OK - Packet loss = 0%, RTA = 0.82 ms
02-02-2006 20:59:44  02-02-2006 20:59:44  0d 0h 0m 0s      HOST DOWN (HARD)     CRITICAL - Plugin timed out after 10 seconds
02-02-2006 20:59:44  02-02-2006 21:06:53  0d 0h 7m 9s      HOST DOWNTIME START  Start of scheduled downtime
02-02-2006 21:06:53  02-02-2006 22:59:44  0d 1h 52m 51s    HOST UP (HARD)       PING OK - Packet loss = 0%, RTA = 0.71 ms
02-02-2006 22:59:44  03-02-2006 11:35:39  0d 12h 35m 55s+  HOST DOWNTIME END    End of scheduled downtime

and then the next time the Report is run the last line shows again how long it was since the host exited downtime (ie now minus the downtime end). eg

Event Start Time     Event End Time       Event Duration   Event/State Type     Event/State Information
01-02-2006 00:00:00  01-02-2006 14:27:54  0d 14h 27m 54s   HOST UP (HARD)       PING OK - Packet loss = 0%, RTA = 0.82 ms
02-02-2006 20:59:44  02-02-2006 20:59:44  0d 0h 0m 0s      HOST DOWN (HARD)     CRITICAL - Plugin timed out after 10 seconds
02-02-2006 20:59:44  02-02-2006 21:06:53  0d 0h 7m 9s      HOST DOWNTIME START  Start of scheduled downtime
02-02-2006 21:06:53  02-02-2006 22:59:44  0d 1h 52m 51s    HOST UP (HARD)       PING OK - Packet loss = 0%, RTA = 0.71 ms
02-02-2006 22:59:44  03-02-2006 11:41:51  0d 12h 42m 7s+   HOST DOWNTIME END    End of scheduled downtime

This looks a little peculiar to me. It's not a bug but unfortunately violates the principle of least surprise (don't know what I was expecting but ..) and for those of us who mine the host log entries it means some code modification. The behaviour of the CGI seems Ok - the event duration is simply the time to the last event - and seems reasonable.

Thanks for your time. Yours sincerely.
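For anyone mining the host log entries, the 'code modification' mentioned above probably amounts to noticing the trailing '+' on the last duration and treating that entry as still open. A rough Python sketch, assuming the duration strings always look like the 'Nd Nh Nm Ns' values shown in the report above (the function name is made up for this example):

```python
import re

# Matches durations such as "0d 1h 52m 51s" and "0d 12h 35m 55s+".
DURATION_RE = re.compile(r"^(\d+)d\s+(\d+)h\s+(\d+)m\s+(\d+)s(\+?)$")


def parse_duration(text):
    """Return (seconds, still_open) for an Event Duration string.

    still_open is True when the value carries a trailing '+', ie the
    entry runs up to the time the report was generated.
    """
    match = DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    days, hours, minutes, seconds = (int(g) for g in match.groups()[:4])
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total, match.group(5) == "+"
```

Code that sums durations can then skip, or handle separately, any entry where still_open is true, since its value changes every time the report is run.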
[Nagios-users] Re: Nagios ignoring Perl Shebang
Dear Folks,

-Original Message-
From: [EMAIL PROTECTED]
Message: 1
Date: Thu, 26 Jan 2006 23:12:03 +0100
From: Arno Lehmann [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios ignoring Perl Shebang - Was: Notification script problems.. How do I debug

You've probably got a Nagios with embedded Perl running. There's a section in the manual with some hints how to write your Perl scripts in that case.

Arno

I don't think so. I haven't been following this thread so I don't have too much helpful to say but embedded Perl doesn't care about the shebang. If the plugin text contains the string '/bin/perl' - usually in the shebang line - then the plugin is assumed to be Perl and is compiled by the Perl compiler (called by eval { }) once, and thereafter the in core op-codes are executed without recompilation. In any case, all my plugins and all the standard plugins have a standard shebang line that works fine with embedded Perl.

The usual way to deal with _any_ misbehaving program that Nagios runs - plugins, event handlers, the whole shebang - is to wrap the offender in a reliable script that captures the argv it was called with, invokes the program with those args and then logs args and stdout, stderr to somewhere convenient. I know that this matter of wrappers has been discussed on this list before (my name and Andreas Ericsson).

Yours sincerely.
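The wrapper idea in the message above can be sketched in a few lines. This version is in Python as an illustration only; the function name and the log format are inventions for this example, not anything Nagios ships.

```python
import subprocess
import sys
import time


def run_and_log(argv, log_path):
    """Run the real program with the given argv, append argv, exit
    status, stdout and stderr to log_path, then pass the output and
    the exit status back unchanged to the caller."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write("%s argv=%r rc=%d\n" % (time.ctime(), argv, proc.returncode))
        log.write("  stdout: %r\n" % proc.stdout)
        log.write("  stderr: %r\n" % proc.stderr)
    # Nagios still needs to see what the wrapped program printed.
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc.returncode
```

To use it, a small script would call sys.exit(run_and_log(sys.argv[1:], some_log_file)) and the Nagios command definition would be changed to run that script with the original command line as its arguments. The log then shows exactly what Nagios passed, which is usually where the surprise is.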
[Nagios-users] RE: iReport Was: RE: [Announce] Nagios_Simple_Report on
Dear Folks, I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Subject: RE: [Nagios-users] [Announce] Nagios_Simple_Report on NagiosExchange/CPAN.
From: Mels Kooijman [EMAIL PROTECTED]

Hi Hans,

I use iReport, a good reporting tool http://ireport.sourceforge.net/index.php

Mels

iReport looks amazingly wonderful. However, it appears to be a DB reporting tool. Would you care to amplify on how you use it ? Is it simply
1 Periodic script to get the Nagios availability report (all hosts/services) and load into DB
2 Ad-hoc reports or canned iReport programs that report against the DB ?
Is the wizard up to it or are some Java skills needed ?

This is a _hot_ topic with most Nag users so your comments are welcome.

Yours sincerely.
[Nagios-users] [Announce] Nagios_Simple_Report on NagiosExchange/CPAN.
Dear Folks, I am writing to announce that Nagios::Report, a Perl module to munge data from the Nagios all hosts/services availability report, is on CPAN and NagiosExchange (where it is called Nagios_Simple_Report).

The class treats the CSV data from the availability report as a flat file database and allows
1 multiple reports corresponding to multiple time periods (eg 24x7 and Business hours)
2 selection of rows - availability records - based on field values
3 specification of columns to appear in the reports
4 transformation of rows by adding columns computed from other col values
5 sorting of the rows by col vals or functions of the column values

The report data can be output as
1 data on stdout
2 csv data (for use by an SQL processor such as DBD::CSV or simply to load a database)
3 Excel spreadsheets (using the CPAN module Spreadsheet::WriteExcel which you must install).

The module provides the capability to use ad-hoc or canned scripts (some examples of which are included) to produce periodic reports; these scripts contain user specified callbacks to do the munging. This site uses such scripts to generate monthly exception reports of all the hosts that reported outages (as a spreadsheet), and in conjunction with other tools such as Al Tobey's Nagios::Config module, an aggregate report of availability per site (again as a spreadsheet with a bar chart) where the aggregation is done over the (dependent) nodes at a site and the site names are extracted from the 'alias' attribute of the host configuration.

It is also useful for such things as

[EMAIL PROTECTED] Nagios-Report-0.001_REL-DIST]$ host_down_report -h '(?i)bendigo_optus' -t last9days 24x7
HOST_NAME                  DOWN                 UP                   OUTAGE
Bendigo_Optus_router_PE_i  11-01-2006 14:20:49  11-01-2006 14:25:59  5m 10s
Bendigo_Optus_router_PE_i  11-01-2006 15:06:22  11-01-2006 15:12:52  6m 30s
Bendigo_Optus_router_PE_i  11-01-2006 17:39:48  11-01-2006 17:48:02  8m 14s
Bendigo_Optus_router_PE_i  11-01-2006 17:55:22  11-01-2006 18:02:02  6m 40s
Bendigo_Optus_router_PE_i  11-01-2006 19:28:00  11-01-2006 19:33:10  5m 10s
Bendigo_Optus_router_PE_i  12-01-2006 11:40:23  12-01-2006 11:47:23  7m 0s
Bendigo_Optus_router_PE_i  12-01-2006 13:44:30  12-01-2006 14:04:40  20m 10s
[EMAIL PROTECTED] Nagios-Report-0.001_REL-DIST]$

In future, we will probably use this tool to load a database with the monthly availability data.

4 accessors that make the raw or munged data available to other programs

This module does _NOT_
1 give you an SQL interface to the availability data
2 generate charts as such - at this stage it only generates workbooks or flat file data. Charts can be generated - relatively simply - using Spreadsheet::WriteExcel (see http://groups.google.com/group/spreadsheet-writeexcel/browse_thread/thread/7bc303cb793ffebd/47de1b364366cf23?q=chart&rnum=9#47de1b364366cf23 ) by
- manually producing an Excel workbook with a chart linked to worksheets containing data
- extracting the binary part of the workbook containing the chart macro
- generating with Spreadsheet::WriteExcel a new workbook that includes the chart data from the last step and fills in the worksheet data that is linked to the charts.
However this requires standalone code
3 give you a 'single system view' or 'business view' or any other buzz word (unless your Nag monitoring provides that data)

Concluding notes.

This module is useful for me and may be for others. Nagios probably needs to have its availability data in a DB since DBs have a huge range of reporting tools, DBs have standard syntax to extract and munge data, and the data conversion/parsing effort is less with DBs. That said, this module can provide what management want and maybe what they think they want, with less effort than doing it all from scratch.

The module _is_ on NagiosExchange but I made the fatal mistake of uploading a file with a high version to CPAN so it will probably take longer or have a different version number (like .015).

Yours sincerely,
S Hopcroft
Data Communications
Dept of Education, Science and Training
Level 1, 240 City Walk
Canberra City ACT 2601
+61 2 6211 6110 Fax: +61 2 6123 6262
0412 766 832
[EMAIL PROTECTED]
[Nagios-users] Reporting ideas sought.
Dear Folks, I am writing to welcome clues about providing an itemised list of outages and their causes from, 'in some way', Nagios.

The Nagios availability report does indeed provide a useful list of outages that can be wrapped and processed to one's heart's content (eg

HOST_NAME                  DOWN                 UP                   OUTAGE
Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s

) but Nagios has, AFAIK, no means of capturing event related data and associating it with an outage event to produce something like

HOST_NAME                  DOWN                 UP                   OUTAGE      CAUSE  COMMENT
Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR - down, provider
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR - down, provider
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      router restart by power-on
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      power failure
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR - down, provider
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR - down, provider
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s      5      dismiss
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s      5      dismiss

In this case, cause is a coded value that classifies the fault and the comment is free form text.

The best I can think of to create something like this is to
1 Append the outages to a file - possibly by having an event handler run the code that extracts the outage from the availability CGI for the host or service (or better still, all the data for an outage is prob provided by macros) and appending that to a file.
2 Have an admin edit the file and add the values when they become known.

The guts of the problem is that Nagios does the right thing by automatically changing the state of the monitored entity; there is no opportunity to 'officially' close the 'fault' by collecting user-input and associating it with an outage. Looked at another way, outages don't really exist as first class objects (with their own methods and data).

All comments are very welcome, Yours sincerely.
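The two-step approach in the message above (append outages to a file, let an admin fill in cause and comment later) might be sketched as below. This is a Python illustration only. The column names follow the example table in the message, while the function names and the rule that an empty CAUSE means 'not yet explained' are assumptions made for the sketch.

```python
import csv
import io

FIELDS = ["HOST_NAME", "DOWN", "UP", "OUTAGE", "CAUSE", "COMMENT"]


def append_outage(stream, host_name, down, up, outage):
    """Append one outage row, leaving CAUSE and COMMENT blank so an
    admin can fill them in once the reason is known."""
    csv.writer(stream).writerow([host_name, down, up, outage, "", ""])


def unexplained(stream):
    """Return the rows whose CAUSE column is still empty."""
    stream.seek(0)
    reader = csv.DictReader(stream, fieldnames=FIELDS)
    return [row for row in reader if not row["CAUSE"]]
```

Here stream is any seekable text file object opened for reading and appending. In practice the event handler would call append_outage() and a periodic job could mail the output of unexplained() to whoever owns the outage list, which goes some way towards an 'official' close of each fault.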