[Nagios-users] OT - fault tolerant default router for Nagios host. [SEC=UNCLASSIFIED]

2008-02-22 Thread Stanley.HOPCROFT

Dear Folks, 

I am writing to request comments on a proposal to reduce the risk of
lost network visibility, spurious alerts and so on when the Nagios
host's default gateway fails.

Even when the Nagios host is connected via multiple links, it is still
necessary to ensure either that data flows over both links or that
traffic is somehow diverted to the surviving link.

Solutions I have rejected include 

1 Link teaming/bonding - immature in Linux

2 HSRP/VRRP - otherwise a fine solution, but I don't want to change the
network structure to suit Nagios and I can't afford fibre links from Nag
to a core switch in the 'other' data centre.

3 Load sharing - half the traffic will be dropped if a link fails

Here is what I think is the best fit: an application layer (non kernel)
fault tolerant router. 

This could be implemented by 

1 a Nag service check of the reachability of the default router 

2 an event handler (run by sudo) that replaces the default router if the
check returns CRITICAL HARD. 
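
The two steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested failover tool: the function names, the gateway addresses and the use of the iproute2 command "ip route replace default via ..." are my assumptions, and the handler is assumed to already be running with the necessary privileges (e.g. via sudo).

```python
import subprocess


def choose_gateway(state, state_type, current, candidates):
    """Return the gateway to switch to, or None to leave routing alone.

    Only a CRITICAL result in a HARD state triggers a change, mirroring
    the 'CRITICAL HARD' condition in the proposal.
    """
    if state != "CRITICAL" or state_type != "HARD":
        return None
    for gateway in candidates:
        if gateway != current:
            return gateway
    return None  # no alternative gateway is configured


def route_command(gateway):
    # Assumption: iproute2 is installed on the Nagios host.
    return ["ip", "route", "replace", "default", "via", gateway]


def handle(state, state_type, current, candidates, dry_run=True):
    """Build (and optionally run) the command that replaces the default route."""
    gateway = choose_gateway(state, state_type, current, candidates)
    if gateway is None:
        return None
    command = route_command(gateway)
    if not dry_run:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range 192.0.2.0/24.
    print(handle("CRITICAL", "HARD", "192.0.2.1", ["192.0.2.1", "192.0.2.254"]))
```

With dry_run left at its default the sketch only returns the command it would run, which makes it easy to try from the command line before wiring it to a real event handler definition.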

Your comments are very welcome. 

Thank you, 

Yours sincerely. 




Classification: UNCLASSIFIED


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] RFC Possible bug in 3.0 alpha event handlers/macros ... [SEC=UNCLASSIFIED]

2008-02-11 Thread Stanley.HOPCROFT
Dear Folks,

Initial indications from RC2 are that event handlers are called with the
_correct_ values of the macros.

(This is a simulation: ie disable host/service checks and then submit
passive host check results to take a host DOWN and then UP)

Tue Feb 12 07:47:01 2008 PASSIVE HOST CHECK: wtmrt200;1;Test of event
handler/macros - bad val of LASTHOSTDOWN
Tue Feb 12 07:47:01 2008 HOST ALERT: wtmrt200;DOWN;HARD;1;Test of event
handler/macros - bad val of LASTHOSTDOWN
Tue Feb 12 07:47:01 2008 GLOBAL HOST EVENT HANDLER:
wtmrt200;(null);(null);(null);global_host_event_handler
Tue Feb 12 07:49:43 2008 EXTERNAL COMMAND:
DISABLE_HOST_SVC_CHECKS;wtmrt200
Tue Feb 12 07:49:43 2008 EXTERNAL COMMAND: DISABLE_HOST_CHECK;wtmrt200
Tue Feb 12 08:05:41 2008 EXTERNAL COMMAND:
PROCESS_HOST_CHECK_RESULT;wtmrt200;0;Test of event handler/macros - bad
val of LASTHOSTDOWN|
Tue Feb 12 08:05:51 2008 PASSIVE HOST CHECK: wtmrt200;0;Test of event
handler/macros - bad val of LASTHOSTDOWN
Tue Feb 12 08:05:51 2008 HOST ALERT: wtmrt200;UP;HARD;1;Test of event
handler/macros - bad val of LASTHOSTDOWN
Tue Feb 12 08:05:51 2008 GLOBAL HOST EVENT HANDLER:
wtmrt200;(null);(null);(null);global_host_event_handler

And from the event handler log (that appends its args to a file)

Tue Feb 12 07:47:01 2008 : wtmrt200 DOWN HARD 1202762761 1202762821 0 0.
Tue Feb 12 08:05:51 2008 : wtmrt200 UP HARD 1202763951 1202762821 0 0.

$ perl -le 'print join " ", map { scalar localtime($_) } qw(1202763951 1202762821)'
Tue Feb 12 08:05:51 2008 Tue Feb 12 07:47:01 2008
$

The first two time_t args in the call to the global event handler are
$LASTHOSTUP$ and $LASTHOSTDOWN$.

In this case the values passed to the event handler correspond to the
times the host went down and up, so on the basis of this test case the
values of the macros are being passed correctly to the event handler in
Nagios 3.0 rc2.
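
The same check can be scripted instead of done by eye. The sketch below parses a line of the handler log shown above and compares the relevant macro with the time stamp on the line. The fixed UTC+11 offset is an assumption inferred from the log times in this message, and the function names are mine.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the site clock was at UTC+11 when these entries were logged.
SITE_TZ = timezone(timedelta(hours=11))


def fmt(epoch):
    """Render a time_t the way the handler log does."""
    return datetime.fromtimestamp(epoch, SITE_TZ).strftime("%a %b %d %H:%M:%S %Y")


def parse_handler_line(line):
    """Split 'STAMP : host state type lastup lastdown ...' into a dict."""
    stamp, _, args = line.partition(" : ")
    fields = args.rstrip(".").split()
    return {
        "stamp": stamp,
        "host": fields[0],
        "state": fields[1],
        "state_type": fields[2],
        "last_up": int(fields[3]),
        "last_down": int(fields[4]),
    }


def macro_matches_stamp(record):
    """True if the macro for the new state equals the time the handler ran."""
    key = "last_up" if record["state"] == "UP" else "last_down"
    return fmt(record[key]) == record["stamp"]


if __name__ == "__main__":
    for line in (
        "Tue Feb 12 07:47:01 2008 : wtmrt200 DOWN HARD 1202762761 1202762821 0 0.",
        "Tue Feb 12 08:05:51 2008 : wtmrt200 UP HARD 1202763951 1202762821 0 0.",
    ):
        record = parse_handler_line(line)
        print(record["state"], macro_matches_stamp(record))
```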

Bravo Nagios !






[Nagios-users] RFC Possible bug in 3.0 alpha event handlers/macros ... [SEC=UNCLASSIFIED]

2008-02-10 Thread Stanley.HOPCROFT
Dear Hugo,

I am writing to thank you for your letter and say,
 
 [EMAIL PROTECTED] wrote:
 
 | If you want to use alpha/rc1, 2, 3 .. nagios, don't whine about it on
 | Nag users.
 
 The point is that doing a bug report on 3.0alphaX when at least 2
 release candidates have followed is not useful.
 
 If the problem still exists in the latest release then it makes 
 sense to report it as such. But for any software it is not useful to 
 use older versions to send in a bug report.
 
 So my recommendation still stands. Upgrade to the latest 3.0 release 
 candidate and retest. Any other 3.0 version of nagios should be 
 considered obsolete and a bug report against those versions is 
 pointless.
 

You are right.

I beg your pardon.

Yours sincerely.




[Nagios-users] RFC Possible bug in 3.0 alpha event handlers/macros ... [SEC=UNCLASSIFIED]

2008-02-09 Thread Stanley.HOPCROFT
Dear Hugo, 

I am writing to thank you for your letter and say, 


 Message: 8 
 Date: Sat, 09 Feb 2008 09:23:29 +0100 
 From: Hugo van der Kooij [EMAIL PROTECTED] 

 Have you considered that the changelog might not be complete? 

Of course ! 

But don't you think a _major_ change in behaviour should be documented, 
or a serious bug - think about it, blowing third party software out of
the water - 
acknowledged ? 

 I strongly recommend you to DO upgrade first before you even think of
sending in a bug report. 

Is it a bug ? 

If so, is it fixed in rc2 ? 

If it hasn't been fixed in rc2, will it be fixed in the release ? 

 If you cannot do so as soon as you have a couple of minutes then you
should not be running 3.0 alpha to begin with. 

Hey man ! I spend time fulfilling _my_ responsibility by reporting a
potential problem, being perfectly willing to be corrected, and you say
I should not test new software and identify bugs - unless I am willing
to do things you obviously are not - so that when it is released,
others are saved from those bugs ! 

I could have diffed rc1, rc2 and alpha for an undocumented change; 
I could have identified the code (maybe) at fault, and just maybe
submitted a patch; and yes, I was hoping someone else might do so for
me because it is not my code, I am not familiar with it, and I lack the
talent to do it quickly, if at all. 

In other words, that's why I am asking for help, having done as much as
I could. 

Tell me I should upgrade to rc2 and the problem will go away because of
this evidence (such as was sent), 
and I will gladly upgrade (since I was hoping to go to the release
without every step, because for me, upgrade 
means package build, test, install and possibly rollback) and report the
result. 

Otherwise, your message is clearly 

If you want to use alpha/rc1, 2, 3 .. nagios, don't whine about it on
Nag users. 

 And please do not reply to a message if you want to create a new
thread. It makes a mess of any threading system (like the archives)

 and in this case even the subject is rather uninformative. 

I beg your pardon (my employer's domain has changed, so the mail with
the correct subject and content bounced and, as you can see from my
tone, it is starting to become too hard). 

 Hugo. 



Re: [Nagios-users] Nagios-users Digest, Vol 21, Issue 5 [SEC=UNCLASSIFIED]

2008-02-08 Thread Stanley.HOPCROFT

Dear Folks,

Please would someone help me out with what may be a bug in global event
handlers in 3.0 alpha (not
rc1 or 2 since there is nothing in the Changelog that seems to warrant
upgrade) ?

I have a (global host) event handler called like so

command_line    $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$

I expect $LASTHOSTUP$, when $HOSTSTATE$ eq UP and $HOSTSTATETYPE$ eq
HARD, to be the time the handler was called, and $LASTHOSTDOWN$ to
contain, generally, the time that the host was detected in a (hard)
down state.

Right so far ?

The Nagios logs show records like so

[EMAIL PROTECTED] nagios]$ tail -500 nagios.log | perl -lne 'print if /Hobart/ && /EVENT|HARD/ && !/SERV/' | ./ns-time_t2localtime
Sat Feb  9 03:13:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:13:59 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:15:19 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:16:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:17:59 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:19:19 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:20:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:21:59 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:23:19 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 03:24:39 2008 HOST ALERT: Hobart;DOWN;HARD;10;CRITICAL - Plugin timed out after 10 seconds
Sat Feb  9 03:24:39 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler
Sat Feb  9 06:48:49 2008 HOST ALERT: Hobart;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 32.19 ms
Sat Feb  9 06:48:49 2008 GLOBAL HOST EVENT HANDLER: Hobart;(null);(null);(null);global_host_event_handler


but in 3.0alpha, the event handlers arguments have these values (dumped
by the handler when it's called)

Sat Feb  9 06:48:49 2008 : Hobart UP HARD 1202500129 1202499949
1200659464 0

or in localtime format,

[EMAIL PROTECTED] nagios]$ perl -le 'print join " ", map { localtime($_) . "" } qw(1202500129 1202499949 1200659464)'
Sat Feb  9 06:48:49 2008 Sat Feb  9 06:45:49 2008 Fri Jan 18 23:31:04 2008

ie the $LASTHOSTDOWN$ is 06:45:49 instead of 03:13:39 !!

The event handler is perhaps foolish to rely on the macros, but what is
wrong here ?

Is it the macro value ?
Is it the event handler call from Nagios ?
Is it something that needs fixing before a 3.0 release ?

I am sure this behaviour is different to that in 2.9 since I was using
this event handler with only minor changes throughout the 2.x series and
producing reports from that data each month (for about 18 months).

The docco for 3.x LASTHOST macros is, as far as I can tell, exactly the
same as for 2.x, so this appears to be an undocumented (and unwelcome)
change.

Any comments or suggestions are welcome.

From my point of view, I will have to rewrite an event handler that
worked fine with 2.9 since this stuff is VITAL to my availability
reporting.
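
To put a number on why this matters for availability reporting, here is a small sketch comparing the outage length implied by the macros with the one implied by the HOST ALERT lines in the log above. The function names are mine, and taking the HARD DOWN alert as the start of the outage is an assumption about how the report should count.

```python
from datetime import datetime


def parse_log_time(text):
    """Parse a ctime-style stamp such as 'Sat Feb  9 03:24:39 2008'."""
    # Collapse the double space that ctime puts before single-digit days.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def outage_from_macros(last_up, last_down):
    """Outage in seconds as the handler would compute it from the macros."""
    return last_up - last_down


def outage_from_alerts(down_alert, up_alert):
    """Outage in seconds between the HARD DOWN and HARD UP alerts."""
    delta = parse_log_time(up_alert) - parse_log_time(down_alert)
    return int(delta.total_seconds())


if __name__ == "__main__":
    # $LASTHOSTUP$ and $LASTHOSTDOWN$ as dumped by the handler.
    print(outage_from_macros(1202500129, 1202499949))
    # Time stamps of the two HOST ALERT lines for Hobart.
    print(outage_from_alerts("Sat Feb  9 03:24:39 2008", "Sat Feb  9 06:48:49 2008"))
```

On the figures quoted in this message the macros give 180 seconds while the alerts give 12250 seconds (3 h 24 min 10 s), which is the discrepancy I am asking about.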

Thank you,

Yours sincerely.



[Nagios-users] Global event handler problem in 3.0 ? [SEC=UNCLASSIFIED]

2008-01-05 Thread Stanley.Hopcroft

Dear Folks,

In sehandlers.c I see

   if(log_event_handlers==TRUE)
        logit(NSLOG_EVENT_HANDLER,FALSE,
              "GLOBAL HOST EVENT HANDLER: %s;%s;%s;%s;%s\n",
              hst->name,macro_x[MACRO_HOSTSTATE],
              macro_x[MACRO_HOSTSTATETYPE],macro_x[MACRO_HOSTATTEMPT],
              global_host_event_handler);

which suggests that if the macro_x[FOO] vals are NULL, I will see what I
do in the nagios.log

[1199423092] GLOBAL HOST EVENT HANDLER:
Sydney-backup;(null);(null);(null);global_host_event_handler

Why on earth should the macros be undefined ?

The DEBUG statements look good but how do I enable debug for event
handlers ?

Yours sincerely.


[Nagios-users] Debugging global host event handler in Nag 3 [SEC=UNCLASSIFIED]

2008-01-05 Thread Stanley.Hopcroft
Dear Folks,

The debug options in Nag 3.x are wonderful, especially for embedded
Perl: it is no longer necessary to enable debugging in p1.pl. This is a
MASSIVE, simply MASSIVE, improvement. Thank you.

(FWIW the debug options for event handlers in nagios.cfg are

debug_level=16



# DEBUG VERBOSITY
# This option determines how verbose the debug log out will be.
# Values: 0 = Brief output
# 1 = More detailed
# 2 = Very detailed

debug_verbosity=2

)
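
Since debug_level is a bit mask, a small decoder can help when reading the bracketed level (e.g. [016.1]) at the start of each debug entry. The table below is how I recall the values from the comments in the sample nagios.cfg for 3.x; treat it as an assumption and check it against your own copy. The function name is mine.

```python
# Assumed from the sample nagios.cfg shipped with Nagios 3.x; verify locally.
DEBUG_FLAGS = {
    1: "function enter/exit",
    2: "config",
    4: "process",
    8: "scheduled events",
    16: "host/service checks",
    32: "notifications",
    64: "event broker",
    128: "external commands",
    256: "commands",
    512: "downtime",
    1024: "comments",
    2048: "macros",
}


def decode_debug_level(level):
    """List the debug categories selected by a debug_level bit mask."""
    if level == -1:  # -1 is documented as 'log everything'
        return list(DEBUG_FLAGS.values())
    return [name for bit, name in sorted(DEBUG_FLAGS.items()) if level & bit]


if __name__ == "__main__":
    print(decode_debug_level(16))    # the setting used here
    print(decode_debug_level(2320))  # a level seen on some debug entries
```

Under that table 2320 is 16 + 256 + 2048, so an entry tagged [2320.2] would be shown with debug_level=16 because it shares the checks bit.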

The debug file then shows the event handler being called with all the
args

[1199526258.177826] [016.1] [pid=12436] Propagating checks to immediate non-UNREACHABLE child hosts...
[1199526258.177833] [016.1] [pid=12436] Pre-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=1
[1199526258.177900] [016.1] [pid=12436] Running global event handler for host 'acisp014'..
[1199526258.177919] [2320.2] [pid=12436] Raw Command Input: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
[1199526258.177928] [2320.2] [pid=12436] Expanded Command Output: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
[1199526258.177935] [016.2] [pid=12436] Raw global host event handler command line: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
[1199526258.177958] [016.2] [pid=12436] Processed global host event handler command line: /usr/lib/nagios/plugins/eventhandlers/global_host_event_handler acisp014 DOWN HARD 1199526081 1199526258 0 0
[1199526258.213601] [016.1] [pid=12436] Post-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=1

even though it is logged like so in nagios.log

[1199526258] HOST ALERT: acisp014;DOWN;HARD;1;DOWN BABY DOWN.
[1199526258] GLOBAL HOST EVENT HANDLER:
acisp014;(null);(null);(null);global_host_event_handler

Later,

[1199527058.256653] [016.1] [pid=12436] HOST: acisp014, ATTEMPT=1/10, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=1, NEW STATE=0
[1199527058.256661] [016.1] [pid=12436] Host was DOWN/UNREACHABLE.
[1199527058.256667] [016.1] [pid=12436] Host experienced a HARD recovery (it's now UP).
[1199527058.256673] [016.1] [pid=12436] Propagating checks to parent host(s)...
[1199527058.256679] [016.1] [pid=12436] Propagating checks to child host(s)...
[1199527058.256685] [016.1] [pid=12436] Pre-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=0
[1199527058.256750] [016.1] [pid=12436] Running global event handler for host 'acisp014'..
[1199527058.256767] [2320.2] [pid=12436] Raw Command Input: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
[1199527058.256776] [2320.2] [pid=12436] Expanded Command Output: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
[1199527058.256783] [016.2] [pid=12436] Raw global host event handler command line: $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
[1199527058.256806] [016.2] [pid=12436] Processed global host event handler command line: /usr/lib/nagios/plugins/eventhandlers/global_host_event_handler acisp014 UP HARD 1199527058 1199526258 0 0
[1199527058.385280] [016.1] [pid=12436] Post-handle_host_state() Host: acisp014, Attempt=1/10, Type=HARD, Final State=0

So the event handler is called correctly; there is ZERO likelihood of
the failure to get the handler results being the fault of Nagios 3 (it
is the fault of MY event handler).
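
For anyone wanting to do the same comparison without reading the debug file by eye, here is a rough sketch that pulls the arguments out of the 'Processed global host event handler command line' entries. It flattens whitespace first, so it should cope with entries that have been wrapped by a mailer. The argument names follow the order of macros in my command definition; the function name is mine.

```python
MARKER = "Processed global host event handler command line:"

# Order of the macros in the command_line quoted in the debug output.
ARG_NAMES = (
    "host", "state", "state_type",
    "last_up", "last_down", "last_unreachable", "downtime",
)


def handler_calls(debug_text):
    """Return one dict of handler arguments per processed command line."""
    flat = " ".join(debug_text.split())
    calls = []
    for chunk in flat.split(MARKER)[1:]:
        tokens = chunk.split()
        # tokens[0] is the handler path; the arguments follow it.
        args = tokens[1:1 + len(ARG_NAMES)]
        calls.append(dict(zip(ARG_NAMES, args)))
    return calls


if __name__ == "__main__":
    sample = """[1199526258.177958] [016.2] [pid=12436] Processed global
host event handler command line:
/usr/lib/nagios/plugins/eventhandlers/global_host_event_handler acisp014
DOWN HARD 1199526081 1199526258 0 0 [1199526258.213601] [016.1]"""
    print(handler_calls(sample))
```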

On the other hand

1 why does the log entry not show the event handler args apart from
$HOSTNAME$

2 why does debug not show the return code of the handler

3 why does the debug not show the ePN processing (as it does very well
with service checks)
  of the event handler 

4 why does the debug output showing the 'Expanded Command Output:' not
show the macro values 

?

Your comments are very welcome.

Thank you,

Yours sincerely.







[Nagios-users] Trouble with Global event handlers in Nag 3 (b7). [SEC=UNCLASSIFIED]

2008-01-04 Thread Stanley.Hopcroft
Dear Folks,

I am writing to ask for help with global event handlers in Nagios 3.0
(b7).

The handler that worked Ok with Nag 2.9 seems to work erratically (ie
only some of the time; more often than not it doesn't do anything) with
3.0.

Nag would log (in nagios.log) this message with 2.9

[1186030923] GLOBAL HOST EVENT HANDLER:
Wollongong;DOWN;HARD;10;global_host_event_handler
[1186031313] GLOBAL HOST EVENT HANDLER:
Wollongong;UP;HARD;1;global_host_event_handler

but with 3.0b7 these,

[1199420252] GLOBAL HOST EVENT HANDLER:
Sydney-backup;(null);(null);(null);global_host_event_handler
[1199423092] GLOBAL HOST EVENT HANDLER:
Sydney-backup;(null);(null);(null);global_host_event_handler

but the definition of the handler command has not changed,

command_name    global_host_event_handler
command_line    $USER2$/global_host_event_handler $HOSTNAME$ $HOSTSTATE$ $HOSTSTATETYPE$ $LASTHOSTUP$ $LASTHOSTDOWN$ $LASTHOSTUNREACHABLE$ $HOSTDOWNTIME$
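
A stripped-down handler that only records what it was called with is a handy way to see what Nagios actually passes, independent of what nagios.log says. The following is a sketch of such a stand-in, not my production handler; the log path and function name are made up for the example.

```python
import sys
import time


def log_call(args, path, now=None):
    """Append 'TIMESTAMP : arg1 arg2 ...' to the file at path."""
    stamp = time.ctime(now)  # current local time when now is None
    line = "{} : {}\n".format(stamp, " ".join(args))
    with open(path, "a") as handle:
        handle.write(line)
    return line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical log location; pick one the nagios user can write to.
    log_call(sys.argv[1:], "/tmp/global_host_event_handler.log")
```

Pointing command_line at a script like this for a short while shows whether the macro values arrive intact even when the log entry prints (null).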

It is wonderful to see that the 3.x series has debug level and type
settings in nagios.cfg, but I can't see which of them would be useful
for this problem. 

I think the event broker should be what I want even though I have no
event broker module in use.
When I try it I see a lot of messages about callbacks (which I didn't
know I had).

Any advice will be very welcome. Please point me to the FM if this is a
change from 2.x that I haven't noticed.

Thank you,

Yours sincerely.










[Nagios-users] Experience with 3.0b6 ... most good but a few tiny probs [SEC=UNCLASSIFIED]

2007-11-08 Thread Stanley.Hopcroft
Dear Folks,

I am writing to present my observations on 3.0b6.

In summary: excellent, but there are a few non show-stopping
problems.

Firstly, the performance of host detection is simply excellent and those
contemplating mega installations should take great heart. As soon as the
service checks complete their max_checks, the host down notifications
are launched.

The developer/s have done a great job (as usual).

For our site, it used to take nearly 2 minutes after service checks
failed to detect down hosts.

Problems

1 ARG macros with ePN (may be without)

$ARG$s appear to be instantiated differently to 2.x, and in such a way
as to cause Perl plugins using Getopt (ie expecting args) to barf if
called without args.

Workaround: command_name!  in services.cfg

2 Aberrant intermittent ePN behaviour

Fri Nov  9 14:30:59 2007 Warning:  Check of service 'Redundant link is
operational' on host 'TRASW210' did not exit properly!

This happens occasionally. Restart sorts it. Plugins _known_ to be good
with ePN in 2.x/passed by new_mini_epn (which prob needs revising).

3 global_event_handler strangeness

Not sure if this is bad, but Nag log shows 

Thu Nov  8 21:04:28 2007 GLOBAL HOST EVENT HANDLER:
NDCSW209;(null);(null);(null);global_host_event_handler

when global event handler runs.

Event handler still called with the same args and seems to do the 'right
thing'.

From my point of view, problem 2 is a concern (a PITA, to use the
vernacular). Looks like I will be trawling checks.c for the origin of
this message.


Thank you,

Yours sincerely.


Re: [Nagios-users] configuration directory and file directive ...perplexity [SEC=UNCLASSIFIED]

2007-10-25 Thread Stanley.Hopcroft
Dear Folks,

I am writing to express my gratitude for all the valuable (and good
natured) contributions about this matter.

All the suggestions were valuable and helpful.

Thank you very much,

Yours sincerely.



[Nagios-users] configuration directory and file directives ... perplexity (long and boring). [SEC=UNCLASSIFIED]

2007-10-24 Thread Stanley.Hopcroft
Dear Folks, 

Has anyone used the nagios configuration directive cfg_dir to point to
an SMB (Windows) share ? 

The interest in doing this is that my colleagues hate vi and 'nix; they
are qualified Cisco/Windows admins who respect Nagios but have no
sympathy with anachronistic editors. They would be much happier using
notepad/wordpad to edit the object configuration files. 

When I tried it for myself (Nagios 2.9, removing all the cfg_file
directives from nagios.cfg and adding cfg_dir 
to point to the Windows share), Nagios complained about the main
configuration file directive in cgi.cfg. 

When I changed cfg_dir to point back to the (untouched) Unix path,
nagios -v nagios.cfg still complained. 

I had to 

1 remove the cfg_dir directive 
2 replace the cfg_file directives 

before it would stop whining. 

Thanks for any helpful comments. 

Yours sincerely. 

Stanley Hopcroft 

Data Communications 

02 6211 6110 
0412 766 832 
  



Re: [Nagios-users] configuration directory and file directives ... perplexity (long and boring). [SEC=UNCLASSIFIED]

2007-10-24 Thread Stanley.Hopcroft
Dear Tim,

(Yes, I am the nitwit). 

 -Original Message-
 Mr Hopcroft,
 
 My first reaction was an unqualified yuk!, what nitwit would even 
 consider this, then I noticed it was you, and having seen your ever 
 useful posts since the Netsaint 0.0.7 days, I relented. Although 
 hearing vi called anachronistic ruffles a couple of feathers.

(OT. Have a look at what Rob Pike has been saying about Unix for some
years. To an MCSE, vi is .. well, I am happy with my original choice of
words. I am not a Windows admin and am perfectly happy with vi to do my
Nagios configuration [or my own home brew semi-automation] but not
everyone who likes Nagios likes vi).

 Notepad isn't? 
 No accounting for taste...

(OT. For unambitious text mangling it's Ok. How many people depend on vi
macros or even conditional substitution ?)

 
 I can't actually speak to your specific question, but it just seems 
 like a scary thought. Better to run samba on the Nagios machine and 
 let them mount it, and/or SVN. And then there's the GUI method, of 
 course.

Good thought but why should they change to suit one application ?

At this site there are no Unix Sys admin skills (apart from me) and
everyone likes Windows. Having the configs on Win means management is
happy they are adequately backed up.

Does cfg_dir=/Some/Path actually work ? and if so, would anyone be so
kind as to paste a few lines containing these directives from their
nagios.cfg ?

Here is the problem, adding a cfg_dir to point to a _Unix_ directory
like so

***
*** 78,83 
--- 78,87 
  # extension) in a particular directory by using the cfg_dir
  # directive as shown below:
  
+ cfg_dir=/etc/nagios
+ 
+ # cfg_dir=/mnt/dest_smb/coms/NMS/nagios
+ 
  #cfg_dir=/etc/nagios/servers
  #cfg_dir=/etc/nagios/printers
  #cfg_dir=/etc/nagios/switches
[EMAIL PROTECTED] nagios]# 

causes 

[EMAIL PROTECTED] nagios]# nagios -v nagios.cfg 

Nagios 2.9
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-10-2007
License: GPL

Reading configuration data...

Error: Unexpected token or statement in file '/etc/nagios/cgi.cfg' on line 23.

*** One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

[EMAIL PROTECTED] nagios]# 

Take out cfg_dir and all is well.
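
One possible reading, offered tentatively: the documentation says cfg_dir processes every file with a .cfg extension in the directory (and its subdirectories) as an object configuration file, so pointing it at /etc/nagios would also sweep up files such as cgi.cfg that are not object definitions. The sketch below lists what a given cfg_dir would pick up and flags files whose first meaningful line is not a 'define' statement. The heuristic and the function names are mine, not anything Nagios provides.

```python
import os


def cfg_files_under(cfg_dir):
    """All *.cfg files below cfg_dir, searched recursively."""
    found = []
    for root, _dirs, files in os.walk(cfg_dir):
        for name in files:
            if name.endswith(".cfg"):
                found.append(os.path.join(root, name))
    return sorted(found)


def looks_like_object_config(path):
    """Heuristic: the first non-comment line starts with 'define'."""
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue
            return line.startswith("define")
    return True  # empty or comment-only files are harmless


def suspects(cfg_dir):
    """Files cfg_dir would read that do not look like object definitions."""
    return [p for p in cfg_files_under(cfg_dir) if not looks_like_object_config(p)]


if __name__ == "__main__":
    for path in suspects("/etc/nagios"):
        print("not an object config:", path)
```

If that reading is right, keeping the object files in their own subdirectory and pointing cfg_dir there would avoid the clash.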

 
 good luck!
 
 tim

Thank you.

Yours sincerely.




[Nagios-users] ePN patch - testers wanted. [SEC=UNCLASSIFIED]

2007-10-22 Thread Stanley.Hopcroft
Dear Folks,

I am writing to invite testing of a small patch for the embedded Perl
Nagios feature.

Currently (2.10/3.x), when a plugin is modified without a restart, ePN
refuses to run the modified plugin: compilation fails when Perl attempts
to redefine the plugin's subroutines in the package corresponding to
that plugin. The only workaround is to restart Nagios.

The patch deletes the Perl package (and therefore all the subroutines it
contains) before the modified plugin is recompiled, so the plugin is
compiled into a namespace that no longer exists, ie a fresh one.
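
The idea of discarding the old namespace before recompiling can be illustrated outside Perl. The following is only an analogy in Python, not the ePN patch itself: it shows that recompiling changed source into a reused namespace leaves stale definitions behind, while compiling into a fresh one does not. All names are invented for the example.

```python
def compile_plugin(source, namespace=None):
    """Compile plugin source into the given namespace, or a fresh one."""
    target = {} if namespace is None else namespace
    exec(compile(source, "<plugin>", "exec"), target)
    return target


# Version 1 of a hypothetical plugin defines a helper; version 2 drops it.
V1 = "def helper():\n    return 'old'\n\ndef check():\n    return helper()\n"
V2 = "def check():\n    return 'new'\n"

if __name__ == "__main__":
    reused = compile_plugin(V2, compile_plugin(V1))  # old namespace kept
    fresh = compile_plugin(V2)                       # old namespace dropped
    print("helper" in reused, "helper" in fresh)
```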

I would prefer to send the patch only to people who 

1 are familiar with the ePN tradeoffs (the memory leak)

2 are convinced that the ePN tradeoffs are outweighed by the benefits

3 have some experience with the memory footprint of the ePN Nagios and

4 do not object to testing a potentially unstable release of Nagios

I have been using the patch on my production system (195 hosts, 356
service checks) without any problem that is obvious to me (custom Perl
plugins for SNMP checks of routers, spanning tree etc).

I am particularly keen on knowing whether the memory leak is worse.

Please let me know privately if you are interested.

Yours sincerely.





[Nagios-users] Backing up Cisco router configs. Was: Nagios backup [SEC=UNCLASSIFIED]

2007-10-19 Thread Stanley.Hopcroft
Dear Folks,

 -Original Message-
 From: Cook, Garry [EMAIL PROTECTED]
 Subject: Re: [Nagios-users] Nagios backup
 
 You can use the 'archive' commands in recent IOS to have your config 
 backed up to a TFTP server anytime the config is written to NVRAM, as 
 well as at specified time intervals.
 
 Thanks,
 Garry
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Mike 
 Hawley
 Sent: Thursday, October 18, 2007 1:16 PM
 To: [EMAIL PROTECTED]; nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Nagios backup
 
 Thanks Roger, I was looking at something that would perform backups 
 when the config changes automatically.
 
 Ta
 
 Mike
 

You mean you want your availability monitor to back up your router
configs ?

You could write a custom plugin to check for differences between the
running and startup configs and then do something with the running
config; you could pay for CiscoWorks and sacrifice small furry animals
until it runs; or you could try RANCID in conjunction with viewcvs (a
web application to publish the CVS repository).
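The custom plugin idea might look something like the sketch below. It is
only an illustration in Python: it assumes the two configs have already
been fetched as text by some other means (SNMP, TFTP, whatever), and the
function name compare_configs is made up for this example.

```python
import difflib

OK, WARNING = 0, 1  # standard Nagios plugin exit codes


def compare_configs(running: str, startup: str):
    """Return (exit_code, message) in the usual Nagios plugin style.

    Assumption: both configs are passed in as plain text; fetching them
    from the router is outside the scope of this sketch.
    """
    changed = [
        line
        for line in difflib.ndiff(startup.splitlines(), running.splitlines())
        if line[:2] in ("+ ", "- ")  # keep only added or removed lines
    ]
    if changed:
        return WARNING, (
            f"WARNING: running and startup configs differ in "
            f"{len(changed)} line(s)"
        )
    return OK, "OK: running config matches startup config"
```

A WARNING here could then drive an event handler that saves the running
config somewhere safe.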

Have fun.





Classification: UNCLASSIFIED



Re: [Nagios-users] Nagios-users Digest, Vol 17, Issue 35 [SEC=UNCLASSIFIED]

2007-10-17 Thread Stanley.Hopcroft
Dear Larry,

I am writing to thank you for your letter and say, 


 [mailto:[EMAIL PROTECTED] On Behalf Of Larry

 Low
 Sent: Friday, October 12, 2007 10:13 AM
 Subject: Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching 
 [SEC=UNCLASSIFIED]
 
 Thanks Stanley,
 
 Using my check_ifoperstatus script.  Available from 
 http://www.nagiosexchange.org/Networking.53.0.html?tx_netnagext_pi1[p_view]=1099
 

(FYI, you may want to check it with new_mini_epn [ distributed with
Nagios ] to see if ePN is going to complain about it).

 I've done a few minutes of debugging and the first problem I see is 
 the MTIME is not being populated.  Here is my epn_leave-msgs.log.  I 
 added
 print LH "$filename - $mtime = "
     .$Cache{$filename}[MTIME]."\n";
 while (my ($key,$value) = each %Cache) {
     foreach (@$value) {
         print LH "$key - $_\n";
     }
 }
 right before it compares mtime and you will see below that MTIME is 
 not populated.
 
 I also added a couple logs where MTIME is supposed to be set.
 print LH $mtime ;
 $Cache{$filename}[MTIME]= $mtime
 
 unless $delete ;
 print LH $Cache{$filename}[MTIME]."\n";
 
 You will see below that $mtime is fine but $Cache{$filename}[MTIME] is

 not.
 
 I changed
 $Cache{$filename}[MTIME]= $mtime
 
 unless $delete ;
 to
 $Cache{$filename}[MTIME]= $mtime;
 and the problem goes away.
 
 I tested for $delete and it is being set to 1 every time.  
 What is calling
 eval_file?  Is this from the nagios core?
 

Yep, base/checks.c, IIRC. checks.c also sets the value for $delete that
is passed to eval_file (see
http://nagios.cvs.sourceforge.net/nagios/nagios/base/checks.c?view=markup
use the source).

The value that becomes $delete is set in checks.c as the value of
DO_CLEAN.

I think this value is set by configure.

You may want to remove the config.cache (or whatever it is) and
reconfigure with the appropriate settings.

(On an unrelated matter, thank you for this thread since I think a
simple mod to p1.pl will deal with the recompilation problem of putting
symbols in the same stash and thereby raising the 'subroutine already
exists' exception).

Yours sincerely.



Classification: UNCLASSIFIED



Re: [Nagios-users] nagios 2.9 ePN INC line [SEC=UNCLASSIFIED]

2007-10-14 Thread Stanley.Hopcroft
Dear John,

 From: john [EMAIL PROTECTED]
 Subject: Re: [Nagios-users] nagios 2.9 ePN INC line
 
 I guess that is an option, but I'd prefer not to have to do that for 
 all the additional modules/plugins that I end up with. Does version 3 
 behave better as I may be able to hold off for that before changing 
 the main monitoring node.

There is no difference in ePN between v2 and v3 (or very little).

 
 Thanks,
 
 john
 
 On Fri, 12 Oct 2007, David Fulton wrote:
 
  Symbolic link the NET::DNS plugin to one of those directories (like
  /usr/lib/perl5/site_perl/5.8.8) and it should find the module after 
  that. I gave up on trying to change how ePN looks for
 modules when it
  couldn't find  utils.pm in my plugin directory. Since all
 the default
  PERL nagios plugins need that I just made a symlink. Works
 smooth as
  silk.

I haven't been following this thread but, with respect, you may be
mistaken in blaming ePN for INC problems.

ePN is Perl, no ifs or buts. If Perl can find the path to the plugin, and
Perl has not been changed since ePN was built, they should have the same
view of INC.

The standard plugins put utils.pm in a non standard Perl path, so most
of my plugins have an added

use lib '/usr/lib/nagios/plugins' ;  # RHEL3

From the OP's point of view, if you have upgraded Perl since building
ePN, recompile ePN/Nagios so it can get the version (5.8.8) dependent
paths.

Yours sincerely.


Classification: UNCLASSIFIED



Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]

2007-10-11 Thread Stanley.Hopcroft
Dear Larry,

I am writing to thank you for your letter and say,

 -Original Message-


 Message: 11
 Date: Thu, 11 Oct 2007 09:21:35 -0700
 From: Larry Low [EMAIL PROTECTED]
 
 This is without making changes to the script.
 
 Scenario:
 
 1) ./configure --prefix=/opt/Nagios --enable-event-broker 
 --with-embedded-perl (I have tried --without-perlcache as well and 
 have not had time to sift through code to see if this is the actual
 problem)
 2) Have an ePN script with sub print_help
 3) Execute 1st check of ePN script, returns OK, no problem
 4) Execute 2nd check of ePN script and ePN compile reports print_help 
 function is redeclared


Thank you for the very clear synopsis of the problem.

I have not had that experience with 3.0b4 (a few funny behaviours, but
by and large, like 2.9).
 
 If I compile without embedded-perl the problem does not exist.
 
 I should probably post this to the devel list.
 

You have got someone's attention here.

FYI (and also to save me a reply to Andreas) the ePN stuff mainly
happens in p1.pl.

This code 

1 manages a cache of compiled plugins (ie checks if the mtime of the
plugin is different to the cached value and recompiles if it is)

2 transforms the plugin to a Perl subroutine in a package named (with a
mangled name) like the plugin's file name

3 calls the subroutine if the mtime has not changed (and the compilation
is clean).
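The three steps above can be modelled roughly as follows. This is a toy
Python model of an mtime-keyed compile cache, not the p1.pl code; the
class name PluginCache and the convention that each plugin file defines
a check() function are assumptions made for the illustration.

```python
import os


class PluginCache:
    """Toy model of an mtime-keyed compile cache (not the p1.pl code)."""

    def __init__(self):
        self._cache = {}        # path -> (mtime, compiled code object)
        self.compilations = 0   # how many times a plugin was (re)compiled

    def run(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        # Step 1: recompile only if the file is new or its mtime changed.
        if cached is None or cached[0] != mtime:
            with open(path) as fh:
                code = compile(fh.read(), path, "exec")
            self._cache[path] = (mtime, code)
            self.compilations += 1
        # Step 2: load the plugin into its own, fresh namespace.
        namespace = {}
        exec(self._cache[path][1], namespace)
        # Step 3: call the entry point (hypothetical convention: check()).
        return namespace["check"]()
```

Because each run gets a fresh namespace, a changed plugin never collides
with definitions left over from the previous compilation, which is the
property at issue in this thread.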

What you describe should not happen, and moreover the new stuff in 3.0
has not changed (at least as far as I can see) the interface (from that
in 2.x) to Perl in checks.c and utils.c. 

Would you send me the plugin privately so I can inspect it ?

 
 Larry Low
 


Classification: UNCLASSIFIED



Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]

2007-10-11 Thread Stanley.Hopcroft
Dear Larry, 

There are debugging hooks in p1.pl that would be useful to enable. 

If you are interested in helping deal with this problem please would you


1 Back up your original copy of p1.pl (path is specified in nagios.cfg
IIRC) 

2 Change the DEBUG_LEVEL to 

use constant  DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP ; 

3 Change the DEBUG_LOG_PATH to something appropriate for your system eg 

use constant  DEBUG_LOG_PATH => '/tmp/' ; 

4 Make sure p1.pl still compiles (perl -c p1.pl should be nag free; $? == 0) 

5 Restart Nagios 
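For anyone unsure what the DEBUG_LEVEL expression in step 2 does: the
debug constants are bit flags combined with a bitwise OR. A small Python
illustration follows. The values 2 and 4 are the ones p1.pl gives
CACHE_DUMP and PLUGIN_DUMP (see the diff further down); the value 1 for
LEAVE_MSG is an assumption, as it is not shown in this message.

```python
from enum import IntFlag


class Debug(IntFlag):
    LEAVE_MSG = 1     # assumed value; not shown in the diff in this thread
    CACHE_DUMP = 2    # value as in p1.pl
    PLUGIN_DUMP = 4   # value as in p1.pl


# Equivalent of LEAVE_MSG | CACHE_DUMP in step 2.
level = Debug.LEAVE_MSG | Debug.CACHE_DUMP

# Each kind of logging is enabled only if its bit is set.
leave_msgs_enabled = bool(level & Debug.LEAVE_MSG)
plugin_dump_enabled = bool(level & Debug.PLUGIN_DUMP)
```

With this setting the leave messages and the cache dump are written, and
the plugin dump is not.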

(IIRC, all this is documented in POD format in p1.pl, so perldoc p1.pl
should show 

... blah blah 

  Extra logging is given by setting DEBUG_LEVEL to include 

   LEAVE_MSG 

   1 opens an extra output stream in the path given by the value of 
   DEBUG_LOG_PATH 

   2 logs messages describing the success or otherwise of the plugin
   compilation and the result of the plugin run. 

   An example of such messages are 

Fri Apr 22 11:54:21 2005 eval_file: successfully compiled
/usr/local/nagios/libexec/check_bass . 
Fri Apr 22 11:54:21 2005 run_package:
/usr/local/nagios/libexec/check_bass  returning (0, BASS
Transaction completed Ok.

). 
Fri Apr 22 11:55:02 2005 eval_file: successfully compiled
/usr/local/nagios/libexec/check_ad -D production.prod -S.

Fri Apr 22 11:55:02 2005 run_package:
/usr/local/nagios/libexec/check_ad -D foo.dom -S returning (0, Ok.
Expected 2 domain controllers [foo1 foo2] for foo.dom.prod domain from
1.1.2.3 DNS, found 8 [foo1 foo2 ..]

). 

.. blah blah 
) 

In my case I see 


[EMAIL PROTECTED] bin]# perl -c p1.pl 
p1.pl syntax OK 
[EMAIL PROTECTED] bin]# diff -c p1.pl.orig p1.pl 
*** p1.pl.orig  2007-10-12 14:09:24.0 +1000 
--- p1.pl   2007-10-12 14:09:56.0 +1000 
***************
*** 10,22 ****
  use constant  CACHE_DUMP  => 2 ;
  use constant  PLUGIN_DUMP => 4 ;
  
! use constant  DEBUG_LEVEL => 0 ;
  # use constant  DEBUG_LEVEL => CACHE_DUMP ;
  # use constant  DEBUG_LEVEL => LEAVE_MSG ;
! # use constant  DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP ;
  # use constant  DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP | PLUGIN_DUMP ;
  
! use constant  DEBUG_LOG_PATH  => '/usr/local/nagios/var/' ;
  # use constant  DEBUG_LOG_PATH  => './' ;
  use constant  LEAVE_MSG_STREAM  => DEBUG_LOG_PATH . 'epn_leave-msgs.log' ;
  use constant  CACHE_DUMP_STREAM => DEBUG_LOG_PATH . 'epn_cache-dump.log' ;
--- 10,22 ----
  use constant  CACHE_DUMP  => 2 ;
  use constant  PLUGIN_DUMP => 4 ;
  
! # use constant  DEBUG_LEVEL => 0 ;
  # use constant  DEBUG_LEVEL => CACHE_DUMP ;
  # use constant  DEBUG_LEVEL => LEAVE_MSG ;
! use constant  DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP ;
  # use constant  DEBUG_LEVEL => LEAVE_MSG | CACHE_DUMP | PLUGIN_DUMP ;
  
! use constant  DEBUG_LOG_PATH  => '/tmp/' ;
  # use constant  DEBUG_LOG_PATH  => './' ;
  use constant  LEAVE_MSG_STREAM  => DEBUG_LOG_PATH . 'epn_leave-msgs.log' ;
  use constant  CACHE_DUMP_STREAM => DEBUG_LOG_PATH . 'epn_cache-dump.log' ;

and 

[EMAIL PROTECTED] nagios]# more /tmp/epn_leave-msgs.log 
Fri Oct 12 14:17:08 2007 eval_file: successfully compiled /usr/lib/nagios/plugins/check_sysUpTime -R 10.208.1.254.
Fri Oct 12 14:17:08 2007 run_package: /usr/lib/nagios/plugins/check_sysUpTime -R 10.208.1.254 returning (0, sysUpTime of router 10.208.1.254 is 231 days, 18:14:31.55).
Fri Oct 12 14:17:17 2007 eval_file: /usr/lib/nagios/plugins/check_sysUpTime already successfully compiled and file has not changed; skipping compilation.
Fri Oct 12 14:17:17 2007 run_package: /usr/lib/nagios/plugins/check_sysUpTime -R 10.36.103.254 returning (0, sysUpTime of router 10.36.103.254 is 269 days, 00:03:26.48).
Fri Oct 12 14:17:22 2007 eval_file: successfully compiled /usr/lib/nagios/plugins/check_backuplinks -N BRUSW200.
Fri Oct 12 14:17:22 2007 run_package: /usr/lib/nagios/plugins/check_backuplinks -N BRUSW200 returning (0, Ok. All links from brusw200/10.0.254.167 to mtasw200 via Etherchannel  _are_ in up operational status. Redundant topology Ok.).
[EMAIL PROTECTED] nagios]# 

Unfortch, although the log stream should be unbuffered, it wasn't being
flushed while Nag was running. I had to restart Nag again to get the
messages flushed (when I changed the path for the log messages).

You prob should ensure that the problem plugin is scheduled frequently
(eg every 5 mins) and let it run for about 5 check periods.

Please post the results to the list. 

Thank you, 

Yours sincerely. 



Stanley Hopcroft 

Data Communications 

02 6211 6110 
0412 766 832 
  

Classification: UNCLASSIFIED



Re: [Nagios-users] Nagios 3.0b5 - ePN and perl caching [SEC=UNCLASSIFIED]

2007-10-10 Thread Stanley.Hopcroft
Dear Larry,

I am writing to thank you for your letter and say, 

 -Original Message-

 Message: 9
 Date: Tue, 9 Oct 2007 17:01:41 -0700
 From: Larry Low [EMAIL PROTECTED]
 Subject: [Nagios-users] Nagios 3.0b5 - ePN and perl caching
 
 The problem I am having is with subroutines inside of perl scripts ran

 under the ePN.  If a subroutine is defined the next time the script or

 similar script is executed the ePN compile fails with a duplicate 
 subroutine reported as the problem.
 
 I believe this problem does not exist under the 2.x code.

Do you mean that, after you change a plugin that Nagios is running, ePN
fails to compile the modified plugin with this (spurious) error code ?

If so, I am running 2.9 and have this problem.

Reluctantly I have found that the only way to deal with this is to
restart Nagios after a plugin mod. 

It is a PITA. 

Patches welcome.

The 

 
 
 Larry Low


Classification: UNCLASSIFIED



Re: [Nagios-users] Nagios 3.0b1 and perl check plugins [SEC=UNCLASSIFIED]

2007-09-07 Thread Stanley.Hopcroft
Dear Folks,

In the last 4 - 6 weeks there were reports of failures of embedded Perl
in Nagios 3.0 betas.

I am running 3.0b3 with the event broker and ePN with a full complement
of non standard Perl plugins known to work with ePN in 2.9 (my employer's
production system).

This site is RHEL 3 + Perl 5.8.0 (EL 3 RPM).

There are a few strange behaviours (eg a warning apparently from a
standard Perl module, Getopt::Long), but after about 20 mins running it
appears not too bad (ie the Perl plugins do not all go belly up).

Can those who are interested in helping sort out ePN with 3.0 contact me
(or mail the list) with bug reports or, better still, patches ?

Thank you,

Yours sincerely.

Classification: UNCLASSIFIED



Re: [Nagios-users] Perl checks in 3.0 [SEC=UNCLASSIFIED]

2007-09-07 Thread Stanley.Hopcroft
Dear Folks, 

There is a comment in base/checks.c that may be relevant to the probs
with embedded Perl in 3.0 betas. 

There was apparently a long-standing bug in which the memory associated
with the Perl plugin output was freed _before_ it was copied to Nagios. 

The comment says the bug was corrected by a patch sent by Hendrik B. in
July this year. 

Nice one Hendrik. 

Yours sincerely. 



Classification: UNCLASSIFIED



Re: [Nagios-users] SLA reports [SEC=UNCLASSIFIED]

2007-09-03 Thread Stanley.Hopcroft
Dear Matthew,

I am writing to thank you for your letter and say, 

 
 Hi there,
  
 I wish (well, have been told to..) to produce SLA type reports of our 
 IT systems for management. At the moment the requirements are rather 
 vague...
  
 As we are currently using NDO I am hoping that Jasper Reports may be 
 used to pull reports directly from the database.
 Poking around I can find no reference to people having done so.


I haven't heard of too many people taking the next big leap with Nagios,
namely, using the NDO infrastructure as the basis of availability
reporting. OTOH there are some who are doing what you are proposing, one
at least with the NDO outage table.

At the moment, my employer has an event handler that stashes outage data
in a table and some home-brew (Perl/DBI/Spreadsheet::WriteExcel) to
generate some reports (including SLA reports) (ie this is NOT an NDO
application. However, obviously this is the way to go and once I get
enough time and energy, I would like to pursue this).

Doing an SLA report is basically filtering the outage times against the
SLA time period.
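As a rough illustration of that filtering, the Python sketch below clips
one outage against a daily SLA window and returns only the time that
counts. The 08:00 to 18:00 window and the function name
outage_within_sla are assumptions for the example; weekends and holidays
are ignored to keep it short.

```python
from datetime import datetime, time, timedelta


def outage_within_sla(start, end, sla_start=time(8, 0), sla_end=time(18, 0)):
    """Return the part of the outage [start, end) inside the daily SLA window.

    Assumption: the SLA applies every day between sla_start and sla_end.
    """
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        window_start = datetime.combine(day, sla_start)
        window_end = datetime.combine(day, sla_end)
        # Overlap of the outage with this day's SLA window, if any.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total
```

An outage from 17:00 one day to 09:00 the next would count as two hours
against this SLA rather than sixteen.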

Amazingly enough, Nagios already does a lot of this sort of filtering
when it determines on the basis of time-periods whether or not to notify
contacts.

It may therefore be possible that the Nagios core could provide more SLA
support than it does by only actioning outages that occur within the
SLA. However, irrespective of future core support, you could achieve
something like the same result by only running checks for the time
period corresponding to your SLA, and therefore you would only get
outages within the SLA.

If on the other hand you want to filter the outages in the NDO tables,
there is a Nagios::SLA module that is used here, but since I have no
idea what Jasper Reports is/does, you may not need this.

(if you are interested in Nagios::SLA let me know privately. It is not
published and may not be for quite a while since I am busy trying to
pass 642-901). 
  
 Any advice while I am still at the stage of working out what 
 management want?

Yep. Write the all singing all dancing Nag availability reporting
package and earn everlasting fame. For bonus marks, donate it to the
project (or maintain it).

Yours sincerely.


Classification: UNCLASSIFIED



Re: [Nagios-users] Log monitoring with SEC and Nagios. [SEC=UNCLASSIFIED]

2007-08-29 Thread Stanley.Hopcroft
Dear Risto,

(Thank you very much for SEC, the king of event correlators).


 Message: 19
 From: Risto Vaarandi [EMAIL PROTECTED]
 Subject: [Nagios-users] Log monitoring with Nagios - recommendations?
 hi all,
 
 few weeks ago I posted a question to this list about passive service 
 checks - I was actually experimenting with Nagios as an event log 
 monitoring GUI. I am tracking event logs with SEC and also 
 sending out 
 alerts with it, but I would still like to see correlated log 
 messages in 
 Nagios web interface as well.


I used to use (and enjoy) SEC to inject passive service check results
to Nagios.

Is that an option in this case ?
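For reference, a passive service check result is a single line written
to the Nagios external command file (the path set by command_file in
nagios.cfg). The Python sketch below only formats that line; the
function name and the sample host and service are made up, and how SEC
would invoke it is left open.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows plugin conventions: 0 OK, 1 WARNING, 2 CRITICAL,
    3 UNKNOWN. The 'now' parameter exists only to make testing easy.
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )
```

The service must be defined in Nagios with passive checks enabled for
the result to be accepted.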

Yours sincerely.

Classification: UNCLASSIFIED



[Nagios-users] Monitoring multicast - any ideas [SEC=UNCLASSIFIED]

2007-08-19 Thread Stanley.Hopcroft
Dear Folks,

Does anyone have any wisdom to offer about monitoring multicast
applications ?

The context is Cisco PIM, so the Cisco Mroute MIB is an obvious place to
start. What makes it a bit harder is that the application of interest
(TV broadcasting) uses an MS Media server that simply joins any old
group when it starts.

In other words, the clients learn from the Web page what group they
should join, and since the mroute table is indexed by group, the group
needs to be known or the whole table is checked for the presence of the
server.
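The 'check the whole table' option could be sketched as below, in
Python. It assumes the mroute table has already been walked and reduced
to (group, source) pairs; the SNMP walk itself, the function names and
the sample addresses are not from any real plugin.

```python
def groups_for_source(mroute_entries, server_ip):
    """Return the groups for which server_ip appears as a source.

    mroute_entries: iterable of (group, source) address pairs, assumed
    to come from a walk of the router's multicast routing table.
    """
    return sorted(
        group for group, source in mroute_entries if source == server_ip
    )


def check_server_is_sending(mroute_entries, server_ip):
    """Nagios-style result: OK if the server is a source for any group."""
    groups = groups_for_source(mroute_entries, server_ip)
    if groups:
        return 0, f"OK: {server_ip} is a source for {', '.join(groups)}"
    return 2, f"CRITICAL: {server_ip} not found as a source in the mroute table"
```

This avoids needing to know the group in advance, at the cost of
walking the whole table on every check.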

Yours sincerely.
Classification: UNCLASSIFIED



Re: [Nagios-users] Nagios 3.0b1 and perl check plugins [SEC=UNCLASSIFIED]

2007-08-16 Thread Stanley.Hopcroft
Dear Folks,

The reports about this matter are a bit perplexing in that

1 if the Nag internals have changed so that the ePN functions do not
have their return values processed correctly, or the ePN functions are
not called in the same way as 2.x, then the SEGV is expected with ePN.

2 However, if Nag is recompiled without ePN, the plugins are run by the
shell in exactly the same way that other plugins are, and therefore if
the Perl plugin works from the command line, it should work with Nagios
3.x

IIRC, the ePN log shows that the Perl harness is doing the right thing
and so there is something in Nag's internals that needs investigating for
ePN to work properly. So 1 is just a bug and should get fixed when I can
focus some time and energy.

OTOH, 2 is inexplicable.

Are all Perl plugins failing to run under 3.x Nagios _without_ ePN ?

Please would you list some of the plugins that work under 2.x but fail
under 3.x ?

It would probably be good to check out the appropriate DEBUG setting
(DEBUG_3) and copy the list with the result.

Yours sincerely.





Classification: UNCLASSIFIED



Re: [Nagios-users] check_tacacs_plus.pl [SEC=UNCLASSIFIED]

2007-05-24 Thread Stanley.Hopcroft
Dear Folks,
 
 Message: 8
 Date: Tue, 22 May 2007 18:47:21 -0700
 From: Daniel Lacey [EMAIL PROTECTED]
 Subject: Re: [Nagios-users] Any experience with check_tacacs_plus.pl

 
 I don't know this platform, but
 
 A TACACS+ server's password database should be invisible to a 
 TACACS client.
 The server's purpose is to authenticate in a way that makes 
 such details 
 irrelevant.
 
 I would create a separate user for this with little to no 
 authorization... You just need to test the authentication server.
 The user and password will be stored somewhere in plain text 
 so that the 
 script using Authen::TACACSPlus will know how to connect to 
 the server.



There are source RPMs for Authen::TACACSPlus so the overhead of
this Perl plugin is not too bad.

check_tacacs_plus works nicely with the Cisco Secure ACS after 

1 the ACS is configured to recognise the Nagios hosts (ie names +
addresses of all interfaces)

2 a user is created on the ACS that the plugin will use to check that
the user's password is validated.

A less attractive aspect of this plugin is that the TACACS+ secret key
needs to be known to the Nagios host. Having a separate (from
production) key seems like a good idea, but since the plugin accepts
username and pw as options, they are visible to other users on the
Nagios host (unless you use ePN or hack the plugin).

I am grateful to the plugin's authors (P Farmer et al) for this. Nice
job.

Thank you,

Yours sincerely.


Classification: UNCLASSIFIED



[Nagios-users] Any experience with check_tacacs_plus.pl (NagiosExchange) or Authen::TACACSPlus [SEC=UNCLASSIFIED]

2007-05-22 Thread Stanley.Hopcroft
Dear Folks,

Please would you let me or the list know of experience checking the
TACACS+ server implemented by Cisco in their 'Secure ACS for Windows
3.3' product ?

Nagios Exchange has a plugin named check_tacacs_plus.pl that makes use
of the Authen::TACACSPlus module from CPAN.

I am not sure these will be helpful in checking a Secure ACS that uses
Windows/AD authentication. That said, since I am very ignorant about
TACACS+ I am probably wrong in thinking that ASCII, CHAP or MS-CHAP (the
alternatives supported by Authen::TACACSPlus) passwords don't sound
right for Windows/AD authentication.

check_tcp on port 49 is a useful standby but hopefully there are other,
non SNMP, alternatives.
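For comparison, the port 49 standby amounts to no more than a TCP
connect. The Python sketch below is not the check_tcp plugin itself,
just an illustration of the idea; the function name and messages are
made up. It shows why this check says nothing about whether
authentication actually works.

```python
import socket


def check_tcp(host, port=49, timeout=5.0):
    """Return a Nagios-style (exit_code, message) for a plain TCP connect.

    Port 49 is the TACACS+ port. Success only proves that something is
    listening, not that logins are being validated.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0, f"TCP OK: {host}:{port} accepted the connection"
    except OSError as exc:
        return 2, f"TCP CRITICAL: {host}:{port}: {exc}"
```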

Thank you,

Yours sincerely.

Classification: UNCLASSIFIED



[Nagios-users] Potential BUG report Was: Nagios 3.x + ePN = Garbage data in status [SEC=UNCLASSIFIED]

2007-05-07 Thread Stanley.Hopcroft
Dear Folks,

I think there is a bug in the Nag 3.x processing of the plugin output
returned by an ePN check. 

 
 Message: 7
 Date: Mon, 7 May 2007 11:29:11 -0400
 From: James Whittington [EMAIL PROTECTED]
 Subject: Re: [Nagios-users] Nagios 3.x + ePN = Garbage data in status
   ..[SEC=UNCLASSIFIED]
 
 I turned on logging in the p1.pl and here's what I got.
 I have cut and pasted a couple of plugins as seen in the epn logfile 
 and through the webui of nagios.

 I think epn is getting the correct response from the plugin, and the 
 performance information looks good in the user interface, but the 
 status field in the user interface has garbage data mixed in with 
 valid data.
 
 This is nagios-3.0a2 by the way.
   
 From epn_leave-msgs.log :
 Mon May  7 10:49:08 2007 run_package: 
 /usr/lib/nagios/plugins/check_rfinput -H10.0.5.26 -Cpublic -Onagios 
 returning (0, -34 dBm|rf-input=34;58;60;22;80).

 
 From Nagios Web UI:
 Current Status:   OK  
  (for 6d 4h 51m 46s)
 Status Information: ?FdBm
 Performance Data: rf-input=34;58;60;22;80 Current Attempt: 
 1/1  (HARD state) Last Check Time: 05-07-2007 10:49:08 Check
 Type: ACTIVE Check Latency / Duration: 0.287 / 1.486 seconds


Nagios seems to be discarding the plugin output (that is normally put in
the Status Information field of the UI) but retaining the performance
data.
 
 
 
 From epn_leave-msgs.log :
 Mon May  7 10:58:03 2007 run_package: 
 /usr/lib/nagios/plugins/check_radio_status -H10.0.7.46 -Cpublic 
 returning (0, Status: No Alarms Uptime: 297 Days
 UAS: 0 SES: 0).
 
 From Nagios Web UI:
 Current Status:   OK  
  (for 24d 8h 39m 16s)
 Status Information: (No output returned from plugin)

This plugin returns no Perf data so unfortch you get yada in both of the
extended information panel fields.

 Performance Data:  
 Current Attempt: 1/1  (HARD state)
 Last Check Time: 05-07-2007 10:58:03
 Check Type: ACTIVE
 Check Latency / Duration: 156.153 / 3.484 seconds
 
 
 Please let me know if I need to try anything else.
 

One last matter: do you get beeped ?

Is it only the UI that is wrong or is Nagios also treating the plugin
response as a failure ?

If you are getting beeped (ie HARD error from ePN plugin) the fault is
prob in checks.c otherwise the CGIs may be the culprits.

I agree with your conclusion: this looks to me like a bug in the Nag
handling of the data returned by ePN.

The problem does _not_ appear to be ePN's return of the data since if
that were the case, there would be no Performance Data (which is data
appended to the plugin output following a pipe symbol): some plugin
output is getting back to Nagios so it must be Nagios incorrectly
processing the plugin output.
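To make the pipe convention concrete, here is a small Python sketch of
the split Nagios is expected to perform on a single line of plugin
output. It is an illustration only, not the code in checks.c, and the
function name is made up; the sample strings are the ones from the ePN
log quoted above.

```python
def split_plugin_output(raw):
    """Split one line of plugin output into (status_text, perfdata).

    Everything before the first pipe symbol is the status text; anything
    after it is performance data (empty if the plugin returns none).
    """
    text, _sep, perfdata = raw.partition("|")
    return text.strip(), perfdata.strip()
```

For check_rfinput this should give '-34 dBm' as the Status Information
and 'rf-input=34;58;60;22;80' as the Performance Data, which is not what
the web UI showed.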

I think there's nothing more to be said apart from finding the bug,
probably in checks.c (although it could be in other code that also
processes data returned by Perl; event handlers for example).

I am hoping that the Nag developers will see this and comment.

 Thanks,
 
 James Whittington
 [EMAIL PROTECTED]


Classification: UNCLASSIFIED



Re: [Nagios-users] Does anyone use and love NDOUtils for availability reporting .. ? [SEC=UNCLASSIFIED]

2007-04-06 Thread Stanley.Hopcroft
Dear Folks,

I am writing to thank you for your letter and say,

 -Original Message-
 Message: 4
 Date: Fri, 6 Apr 2007 12:24:47 +0100
 From: Rob Blake [EMAIL PROTECTED]

 
  Would anyone like to comment on the use of NDOUtils (Nagios 2.x or
  later) for availability reporting ?
 
 
 NDOUtils will take the majority of the data associated with your 
 Nagios installation and send it to a database for you (currently only 
 mysql is supported). You can store information about your current 
 setup, notifications, current host/service status, the results of 
 checks etc. With this information in a database you are free to do 
 what you want with it. I believe the current plan is to leverage the 
 data that is stored in the database to facilitate a complete overhaul 
 of the current Nagios frontend.
 
 There is absolutely nothing stopping you from putting together your 
 own application that makes custom graphs, custom reports based around 
 the data available to you. You can use whatever language you like, 
 through whatever presentation medium you like. You are simply limited 
 by the connection to the database, and as I assume you will be 
 managing the database, this shouldn't be a problem.


This is excellent.

While the availability CGIs are excellent they do not

1 facilitate arbitrary presentation of the data (without say importing
the CSV output into a DB)

2 allow the combination of the outage data with other information such
as links to 'trouble ticket/service desk (for the ITIL inclined)'
systems for combining the outage data with other views of the 'incident'
(such as WTF caused it).

Having the outage data in tables should lead to an explosion of third
party/community developed reports and presentation frameworks in the
same way that the very clean architectural divisions in Cacti have led
to that product's extensibility and popularity.
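
As an illustration of the kind of DIY report that outage rows in a table
make possible, here is a minimal sketch. The table name, columns and data
are hypothetical (this is NOT the NDOUtils schema), and SQLite stands in
for MySQL only so that the example runs anywhere.

```python
import sqlite3

# Hypothetical schema for illustration only; this is NOT the NDOUtils schema.
SCHEMA = """
CREATE TABLE outage (
    host_name  TEXT    NOT NULL,
    down_epoch INTEGER NOT NULL,  -- start of outage, seconds since the epoch
    up_epoch   INTEGER NOT NULL   -- end of outage, seconds since the epoch
)
"""

def availability(conn, period_start, period_end):
    """Return {host_name: percent available} for hosts with outages in the period."""
    period = period_end - period_start
    rows = conn.execute(
        """
        SELECT host_name,
               SUM(MIN(up_epoch, :p_end) - MAX(down_epoch, :p_start)) AS down_secs
        FROM outage
        WHERE up_epoch > :p_start AND down_epoch < :p_end
        GROUP BY host_name
        """,
        {"p_start": period_start, "p_end": period_end},
    )
    return {host: 100.0 * (period - down) / period for host, down in rows}

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO outage VALUES (?, ?, ?)",
    [("MTASW210", 1000, 1600), ("MTASW210", 5000, 5300), ("BRUSW200", 0, 90)],
)
report = availability(conn, 0, 10000)
```

Outages are clipped to the reporting period before being summed, so an
outage that straddles the start or end of the period is only counted for
the part that falls inside it.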

Thank you.
 
 Rob

Yours sincerely.

Classification: UNCLASSIFIED



[Nagios-users] Does anyone use and love NDOUtils for availability reporting ? [SEC=UNCLASSIFIED]

2007-04-05 Thread Stanley.Hopcroft
Dear Folks,

Would anyone like to comment on the use of NDOUtils (Nagios 2.x or
later) for availability reporting ?

I believe that NDOUtils inserts rows representing down times in a MySQL
table, making it much easier for DIY reporters to produce reports.

I am currently using an event handler for adding outage records to a
table but I am not happy with this method.

Thank you,

Yours sincerely.

Classification: UNCLASSIFIED



[Nagios-users] Nagios memory Leaks

2007-01-23 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your valuable letter and say, 

 
 From: Tobias Klausmann [EMAIL PROTECTED]
 Subject: [Nagios-users] Memory leaks
 
 Hi! 
 
 (First off: if this should also go to nagios-devel, just yell at
  me.)


I don't think so because it deals with the aspects of the implementation
that are visible (and in fact, the letter doesn't propose detailed
solutions).
 
 Nagios 2.6 and 2.5 have memory leaks. They are not that big that
 within hours your machine will be swapping, but they degrade
 performance in other ways.
 
 First off, their approximate extent.
 
 2.5 and 2.6 without perl cache have the smallest memory leaks. A
 fairly busy Nagios server (hardware quoted below) with about 3000
 services on about 330 hosts will degrade from 330M used (that's
 *not* Nagios alone) to 368M used in about 16 hours. Or about 2.4
 MB per hour. The very same machine behaves neutral if Nagios is
 not running, so it's definitely Nagios itself.
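
The arithmetic behind the quoted rate can be checked with a few lines.
This is only a sketch using the figures above; the function name and the
1 GB projection are illustrative additions, not part of the original
measurements.

```python
def leak_rate_mb_per_hour(start_mb, end_mb, hours):
    """Average growth in used memory, in MB per hour."""
    return (end_mb - start_mb) / hours

# Figures quoted above: 330M used growing to 368M used in about 16 hours.
rate = leak_rate_mb_per_hour(330, 368, 16)

# Illustrative projection: days until 1 GB has been lost at that rate,
# assuming the loss stays linear.
days_to_leak_1gb = 1024 / rate / 24
```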

Do you mean: 2.5 and 2.6 Nagios with embedded Perl but without the
Perl plugin cache option ?

If so, the fault is not Nagios, but the embedded Perl implementation and
or Perl.

Your next paragraph suggests that this is plain vanilla Nagios without
any Perl options to configure.

Is that correct ?

 
 Activating the embedded Perl interpreter and -cache will increase
 the amount of lost memory to about 5-6M per hour. In this case,
 however, sometimes the memory usage snaps back, i.e. some of the
 lost memory is collected. I've not yet found out what triggers
 the reclaim. Still, over the course of hours, more and more
 memory is lost. Still, it's roughly linear memory loss.
 

I have never witnessed memory being reclaimed after ePN leaks it.

I can't conceive of the process memory size being reduced while the
process is running (free() and friends only return the memory to the
process heap).

I think the leak is caused by the ePN implementation. I am hoping to
try some measurements with several pilot implementations to see which is
the most promising way of doing this.

... (snip)

Yep. I agree. The leak is bad.


 The question that remains is, if this can (and will) be tackled
 before 3.0 is released. A related question is if Nagios 3 will be
 prone to the same problem.
 

Certainly it will if the current ePN implementation remains.

If (pretty big if) I can provide you stuff to try, are you willing
to repeat your measurements on candidate implementations (wrt the 2.5 or
2.6 code base) ?

I am not sure of my willingness/energy quotient but if they look Ok,
I may not have anything to show until March this year.

 Any thoughts, ideas etc. are appreciated.
 
 Regards,
 Tobias

Yours sincerely.



Re: [Nagios-users] ePN Was Performance issues.

2007-01-03 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say, 

 -Original Message-
 From: Robert Hajime Lanning [EMAIL PROTECTED]
 Subject: Re: [Nagios-users] Performance issues, too1


  .. skipped helpful remarks.

 
 The perl code path that runs in the master Nagios process (after
 all plugins have been compiled successfully and you remove argument
 caching) is:
 
 sub eval_file {
     my ($filename, $delete, undef, $plugin_args) = @_ ;
     my $mtime = -M $filename ;
     if ( exists($Cache{$filename}) && $Cache{$filename}[MTIME] &&
          ($Cache{$filename}[MTIME] <= $mtime) ) {
         if ( $Cache{$filename}[PLUGIN_ERROR] ) {
             ...
         } else {
             return $Cache{$filename}[PLUGIN_HNDLR];
         };
     };
 };
 
 I am not sure where the leak is, unless it is in the interpreter
 itself.
 

It probably is, since most of the published documents (eg perlembed,
'Extending and Embedding Perl') emphasise the _big_ tradeoffs with
embedding Perl.

Thank you for repeating the code in your letter as I was trying to
remember how it works and grappling with the fact that once the plugins
are converted into Perl subroutines and compiled, the C caller (in
checks.c) should simply be able to load the Perl stack with the plugin
arguments - as is done in checks.c - and then call Perl_call_sv() with
the subroutine reference returned by eval_file (the content of
return $Cache{$filename}[PLUGIN_HNDLR]).
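
The 'compile once, hand back the cached handler unless the file has
changed' idea in eval_file can be sketched outside Perl. What follows is
an analogy in Python, not a translation of p1.pl: it compares
modification times for equality rather than using Perl's -M file age, and
it caches a code object rather than a subroutine reference.

```python
import os
import tempfile

# filename -> (mtime at compile time, compiled code object)
_cache = {}

def eval_file(filename):
    """Return compiled code for filename, recompiling only if the file changed."""
    mtime = os.path.getmtime(filename)
    cached = _cache.get(filename)
    if cached is not None and cached[0] == mtime:
        return cached[1]              # unchanged since last compile: reuse
    with open(filename) as handle:
        code = compile(handle.read(), filename, "exec")
    _cache[filename] = (mtime, code)
    return code

# Usage: compile a throwaway "plugin" twice; the second call hits the cache.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("status = 0\n")
first = eval_file(tmp.name)
second = eval_file(tmp.name)
os.utime(tmp.name, (1, 1))            # pretend the plugin was edited
third = eval_file(tmp.name)
os.remove(tmp.name)
```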

What happens is more complicated than this and I can't see why at the
moment.

(Part of the complexity is that the C args must be converted to Perl,
and I think I preferred a second call to Perl to do this [after which
Perl calls the subroutine itself] rather than converting the arguments -
which is tricky - in C and then calling the subroutine from C).

1 refactor the Perl/C interface with a view to improving
efficiency/readability/comprehensibility
(I thought I understood ..)

2 consider a different approach with PPerl.

BTW, for those wishing to play with this, contrib/new_mini_epn.c, which
has most of the guts of the C interface (and uses the same Perl driver,
p1.pl), may be the easiest way to start. (The main difference between
this and the Nag code is that the Nag code forks for each plugin).

Thank you,

Yours sincerely.



Re: [Nagios-users] Performance issues, too1

2007-01-02 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say, 

 -Original Message-

 --
 
 Message: 2
 Date: Tue, 2 Jan 2007 08:26:34 +0100 (CET)
 From: Daniel Meyer [EMAIL PROTECTED]
 Subject: Re: [Nagios-users] Performance issues, too
 
 Hi there, and happy new year :-)
 
 Program Running Time: 10d 21h 22m 42s
 
 So, for almost eleven days nagios runs smoothly now, no more 
 latency problems. I'll try it again with EPN (but still 
 without perlcache) now.


Context is massive memory leak with ePN. Leak goes when ePN is removed.

Firstly, look at the caveats for ePN at
http://nagios.sourceforge.net/docs/2_0/embeddedperl.html

Another major caveat should be added to this: depending on your plugins
you may have a bigger or smaller leak, however leak it will.

For me, I wouldn't consider Nagios without ePN since I code most of my
plugins in Perl and the advantages for me (and this installation)
outweigh the leak.

Finally about the meaning of the configure switches for ePN.

1 --enable-embedded-perl

This builds Perl into the Nagios executable and at the least means that
your system does __not__ fork and exec a separate Perl interpreter to
run each Perl plugin. Instead, the plugin is parsed and run by the
interpreter linked into the Nagios binary.

So, setting this switch saves the start-up cost of a Perl interpreter
for each check.

2 --with-perlcache

If, in addition, this switch is set, the Perl plugin is compiled only
once (otherwise, each time Nagios goes to run a Perl plugin, it
recompiles it). The resultant Perl op code tree remains in memory.

Unfortunately, for reasons that are not clear to me, this is the source
of the leak.



 
 Danny
 -- 

Yours sincerely.



Re: [Nagios-users] Nagios-users Digest, Vol 7, Issue 32

2006-12-20 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say, 

 -Original Message-
 
 Message: 1
 Date: Tue, 19 Dec 2006 15:31:25 -0600
 From: Craig Van Tassle [EMAIL PROTECTED]
 Subject: [Nagios-users] Getting pie charts in host's history
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 My boss wants to be able to look at pie charts whenever we are looking
 at history for various services. So far I have only been able to get a
 bar graph that is showing the uptime and down time, but I have not been
 able to get a pie chart to display.
 
 I am using ubuntu 6.10, with nagios 2.4
 
 Any help or thoughts would be appreciated.


Thoughts only.

Pie chart of what - a histogram of hosts in availability ranges (eg
0-90%, 90.1-98.5, 98.6 - 99.95, > 99.95%) ?

If you want to do it with the standard tools you will have some work
counting numbers in each category and then charting.

The approach might be

1 extract the availability data as CSV
2 import into the charting tool of your choice (eg OpenOffice)
3 chart
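
The counting step can be sketched as follows. The range boundaries are
taken from the example ranges above (treated as contiguous); the host
names and percentages are made up, and reading the availability CSV is
left out because its column layout depends on your Nagios version.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds (percent) of all but the last range, and the range labels.
UPPER_BOUNDS = [90.0, 98.5, 99.95]
LABELS = ["0-90%", "90-98.5%", "98.5-99.95%", ">99.95%"]

def bucket(availability_pct):
    """Label of the range that an availability percentage falls into."""
    return LABELS[bisect_left(UPPER_BOUNDS, availability_pct)]

def histogram(availability_by_host):
    """Count hosts per availability range."""
    return Counter(bucket(pct) for pct in availability_by_host.values())

# Made-up sample data for illustration.
sample = {"hostA": 100.0, "hostB": 99.97, "hostC": 99.5, "hostD": 97.0, "hostE": 42.0}
counts = histogram(sample)
```

A value that sits exactly on a boundary (eg 99.95) is counted in the
lower range; the counts per label are then what you would hand to the
charting tool.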

We 

1 have an event handler that inserts a record into a DB when a host
comes up (yes, this is pretty wonky)

2 have SQL that does the histogram stuff

3 use Perl Spreadsheet::WriteExcel to do the charting.

This isn't particularly good but it provides the charts that managers
insist on (rightly).

HTH

 
 Craig



Re: [Nagios-users] Nagios-users Digest, Vol 2, Issue 8

2006-07-09 Thread Stanley.Hopcroft
Dear Folks,

For the benefit of the archives, 

Dear Folks,

Is it possible to use the standard plugin distro check_ping to
distinguish a reachability failure brought about by sluggish transport
from one caused by a routing failure ?


I think the best way to do this is with a plugin that returns either
OK or CRITICAL depending on whether the host is contactable (CRITICAL
also if the plugin times out).

This means that there is never any ambiguity between a congested
unresponsive link and a host on an unreachable network.

One way of doing this is with an SNMP ping (eg check sysUpTime) to
a router on the subnet of interest.
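
A minimal sketch of such a two-state check follows. The probe commands
here are stand-ins (small Python one-liners) so that the example runs
anywhere; in practice probe_cmd would be whatever SNMP query you use, and
the function name is hypothetical.

```python
import subprocess
import sys

OK, CRITICAL = 0, 2   # standard Nagios plugin return codes

def reachability(probe_cmd, timeout_s):
    """Run probe_cmd; return (OK|CRITICAL, message). A time out is CRITICAL."""
    try:
        done = subprocess.run(probe_cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return CRITICAL, "CRITICAL: probe timed out"
    except OSError as err:
        return CRITICAL, f"CRITICAL: probe could not be run: {err}"
    if done.returncode == 0:
        return OK, "OK: host answered"
    return CRITICAL, "CRITICAL: host did not answer"

# Stand-in probes so the sketch runs anywhere; a real probe_cmd would be the
# SNMP query of your choice (for example a get of sysUpTime).
answers = [sys.executable, "-c", "raise SystemExit(0)"]
refuses = [sys.executable, "-c", "raise SystemExit(1)"]
hangs = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Because there is no WARNING state and no latency threshold, a slow but
working path still returns OK, and only a missing answer or a time out
returns CRITICAL.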

(The actual application is determining if user subnets are reachable
when there is a routing protocol on each of the leaf [user subnet]
routers tunnelled through an MPLS transport. 

All sorts of weird stuff can happen.
We use a routing protocol even on the single exit leaf nodes
because we cannot trust the provider [eg the only routing protocol they
provide is RIP ...]).

Yours sincerely.




[Nagios-users] Really dumb question - how do folks distinguish unreachable from over threshold ?

2006-07-05 Thread Stanley.Hopcroft
Dear Folks,

Is it possible to use the standard plugin distro check_ping to
distinguish a reachability failure brought about by sluggish transport
from one caused by a routing failure ?

What occurs to me at the moment is to either

1 not use check_ping in cases of volatile routing (examine routing
with SNMP or CLI)

2 have a service event handler that reacts to 'plugin time out' (by
ultimately generating a HOST_DOWN passive service check result if the
routing has failed).

Thank you,

Yours sincerely,


S Hopcroft

Data Communications
Dept of Education, Science and Training
Level 1, 240 City Walk
Canberra City  ACT  2601

 +61 2 6211 6110
Fax: +61 2 6123 6262

0412 766 832

[EMAIL PROTECTED]






[Nagios-users] Monitoring Cisco 3750 stacks - OIDs or traps ?

2006-06-25 Thread Stanley.Hopcroft
Dear Folks,

We, like many others, have happily deployed the Cisco 3750 stackable
switch/routers in stacked configurations.

Please would anyone with experience of monitoring these units with
Nagios comment on how best to monitor the performance of the internals.

I am particularly interested in checking that all the switches in the
stack are present and correct.

There appear to be 3 ways of checking this

1 Net::Telnet and parsing the output of 'show inventory'

2 Some OID with check_snmp (possibly from the CISCO-STACK-MIB)

3 Traps from the stack manager
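
For the first method, the parsing half might look like the sketch below.
The sample text only imitates the general shape of 'show inventory'
output; it was not captured from a real stack and the exact format is an
assumption that may differ between IOS versions. Fetching the text (with
Net::Telnet or otherwise) is left out, and the function names are
hypothetical.

```python
import re

# Illustrative text in the general shape of 'show inventory' output; it was
# NOT captured from a real stack, and real output may differ by IOS version.
SAMPLE = '''
NAME: "1", DESCR: "WS-C3750-24TS"
PID: WS-C3750-24TS-S   , VID: V05  , SN: XXXXXXXXXX1

NAME: "2", DESCR: "WS-C3750-24TS"
PID: WS-C3750-24TS-S   , VID: V05  , SN: XXXXXXXXXX2
'''

MEMBER = re.compile(r'^NAME:\s*"(\d+)",\s*DESCR:\s*"([^"]*)"', re.MULTILINE)

def stack_members(inventory_text):
    """Return {switch number: description} for entries named by a bare number."""
    return {int(num): descr for num, descr in MEMBER.findall(inventory_text)}

def check_stack(inventory_text, expected_count):
    """Nagios-style (return code, message): 0 = OK, 2 = CRITICAL."""
    found = stack_members(inventory_text)
    if len(found) == expected_count:
        return 0, f"OK: {len(found)} stack members present"
    return 2, f"CRITICAL: {len(found)} of {expected_count} stack members present"
```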

What experience have people had with these methods ?

Thank you,

Yours sincerely.



Re: [Nagios-users] Nagios-users Digest, Vol 1, Issue 3212

2006-06-07 Thread Stanley.Hopcroft
Dear Folks,

I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Date: Wed, 7 Jun 2006 14:27:36 +0200
From: Rene Fertig [EMAIL PROTECTED]
Subject: Re: [Nagios-users] How to monitor complex websites?
To: nagios-users@lists.sourceforge.net
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain;  charset=iso-8859-1

check_http version 1.89 (which comes with nagios-plugins 1.4.3) can
set a User-Agent-String:

 -A, --useragent=STRING
   String to be sent in http header as User Agent


  ... snip

But probably you should make your own plugin if you need special cookie
support.

bye, Rene


You may want to revisit writing your own, since there's a new
CPAN module FEAR::API for fearless programming of web clients.

From http://www.perl.com/lpt/a/2006/06/01/fear-api.html

'
FEAR::API's documentation says:

FEAR::API is a tool that helps reduce your time creating site scraping
scripts and helps you do it in an much more elegant way. FEAR::API
combines many strong and powerful features from various CPAN modules,
such as LWP::UserAgent, WWW::Mechanize, Template::Extract, Encode,
HTML::Parser, etc., and digests them into a deeper Zen.

(Here's an example that

 Fetch CPAN's homepage. 
 Extract data with a template. 
 Process links using a control structure. 
 Print fetched content to STDOUT. 
 Dump links in the page. 
 Use YAML to print extract results
)

It might be best to introduce FEAR::API by rewriting the previous
example:

    use FEAR::API -base;
    url("search.cpan.org");
    fetch >> [
      qr(foo) => _feedback,
      qr(bar) => \my @link,
      qr()    => sub { 'do something here' }
    ];
    fetch while has_more_links;
    extmethod('Template::Extract');
    extract($template);
    print Dumper extresult;
    print document->as_string;
    print Dumper [EMAIL PROTECTED];
    invoke_handler('YAML');
'

The article compares FEAR::API with the former standard, WWW::Mechanize.

Even if you decide against FEAR::API, the standard Perl HTTP modules

handle cookies
parse HTML - in particular, extract links
fill out forms
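
To show how little code the link extraction step needs, here is an
analogous sketch using only a standard library HTML parser. It is written
in Python rather than with the Perl modules named above, so it
illustrates the technique and not their API; the sample page is made up.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def extract_links(html_text):
    """Return the href values of all links in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links

# Made-up page fragment for illustration.
page = '<p><a href="/login">Log in</a> or <a href="/help">get help</a>.</p>'
links = extract_links(page)
```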

HTH,

Yours sincerely.




[Nagios-users] RE: Nagios config parser in Perl

2006-05-30 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say   

Message: 13
Date: Tue, 30 May 2006 16:47:02 +0200
From: Marc Haber [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Parsing a Nagios 2 configuration file from perl

Hi,

I'd like to have a list of all host_name directives in host 
definitions in a nagios 2 template style configuration in a 
perl script.

Did anybody already write a nagios 2 configuration file parser in perl?


Yep. There's a (probably, since I haven't used it) good one on CPAN in
the Nagios name space (a search for Nagios should find it) written by a
Perl luminary, Al Tobey.

You need the Build module (Module::Build) to get it installed.

There is also a rough-as-guts one that I use that is sometimes useful.
If you want to be the guinea pig ..


Here's an example of it in 'action'.


[EMAIL PROTECTED] sh1517]$ perl -MNagios::Config -e '$x=Nagios::Config->new("/etc/nagios/nagios.cfg");
@x=$x->grep("hosts", q[$host_name =~ /mt[ab]sw21/i]); $x->pprint("hosts", [EMAIL PROTECTED])'
define host{
   host_name            MTBSW210
   address              10.0.254.149
   alias                14 MORT BUILDING
   contact_groups       datacomms-admins,premier_support_group
   notification_period  24x7
   parents              MTASW200,BRUSW200
   use                  generic-host
   }

define host{
   host_name            MTASW210
   address              10.0.254.169
   alias                16 MORT BUILDING
   contact_groups       datacomms-admins,premier_support_group
   notification_period  24x7
   parents              MTASW200,BRUSW200
   use                  generic-host
   }

extracting stuff.

I actually use it (for some definition of 'use') for batch adds. But it
is rough ...
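
For the narrow task in the question (listing every host_name in host
definitions), a rough sketch that needs no module at all is shown below.
It works on the text of a single object file; it does not follow
cfg_file/cfg_dir entries in nagios.cfg or resolve template inheritance,
and the function name and sample text are hypothetical.

```python
import re

HOST_BLOCK = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)

def host_names(config_text):
    """Return the host_name of every 'define host' block, in file order."""
    names = []
    for body in HOST_BLOCK.findall(config_text):
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()     # drop trailing comments
            if line.startswith("#"):
                continue
            fields = line.split(None, 1)
            if len(fields) == 2 and fields[0] == "host_name":
                names.append(fields[1].strip())
    return names

# Made-up object definitions for illustration.
sample = """
define host{
   host_name   MTBSW210
   address     10.0.254.149
   use         generic-host
   }

define service{
   host_name            MTBSW210
   service_description  PING
   }

define host{
   name        generic-host   ; template, no host_name
   register    0
   }
"""
```

Service definitions and host templates are skipped because only blocks
that open with 'define host' and carry a host_name directive are
reported.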

Greetings
Marc

caveat computer.

Yours sincerely.




[Nagios-users] RE: Jabber Custom Notification Script Not Working (Frederik Vanhee)

2006-05-24 Thread Stanley.Hopcroft
Dear Folks,

I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Date: Wed, 24 May 2006 06:52:22 +0200
From: Frederik Vanhee [EMAIL PROTECTED]
To: Norman Harebottle [EMAIL PROTECTED]
CC: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Jabber Custom Notification Script 
Not Working

Norman Harebottle wrote:

 Hello Everyone,
  
 I am wondering if anyone has had success getting the Jabber
 Notification script written by David Cox to work.
  
  However, when running the 
 Perl script in the Embedded Perl environment, something 
happens (which 
 is not logged) which causes the script to not execute as desired.
  

Embedded Perl Nagios, like mod_perl, is more sensitive to coding issues
than the 'fork a new Perl interpreter for each run' model.

There is every chance that a plugin that works fine from the command
line won't work under ePN.
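
One common class of such problems in any persistent interpreter is state
left over from a previous run. Whether that is what bites the Jabber
script is not known; this is an assumption for illustration only. The
contrived sketch below runs the same sloppy 'plugin' in a fresh namespace
each time and then in one that survives between runs.

```python
# A contrived "plugin" that assumes 'count' starts from nothing on every run.
PLUGIN = compile("count = globals().get('count', 0) + 1", "<plugin>", "exec")

def run_in_fresh_interpreter():
    """Model of 'a new interpreter per run': state is thrown away each time."""
    namespace = {}
    exec(PLUGIN, namespace)
    return namespace["count"]

persistent_namespace = {}          # lives as long as the daemon does

def run_in_persistent_interpreter():
    """Model of an embedded interpreter: globals survive between runs."""
    exec(PLUGIN, persistent_namespace)
    return persistent_namespace["count"]

fresh = [run_in_fresh_interpreter() for _ in range(3)]
persistent = [run_in_persistent_interpreter() for _ in range(3)]
```

The fresh runs always see the same starting point, while the persistent
runs see whatever the previous run left behind, which is why code that
works from the command line can misbehave when embedded.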

You can turn ePN logging on by changing the DEBUG_LEVEL in p1.pl. Set it
to LEAVE_MSG (or whatever is mentioned in perldoc p1.pl). You need to
make sure the log path and name suit you also.

In a former job, I found jabber/XMPP was very sensitive to the Perl
modules and configuration of the Jabber server. I think we had an event
handler that would publish stuff to jabber consoles with not much more
than one of the example usages from the Perl jabber client.
Unfortunately my records are elsewhere.

No experience, unfortunately, with the jabber plugins.

Yours sincerely.




RE: [Nagios-users] Undefined subroutine Embed::Persistent::eval_file called. in check_disk_smb

2006-05-23 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say,

-Original Message-

Message: 15
Date: Tue, 23 May 2006 14:10:03 +0200 (CEST)
From: =?iso-8859-1?q?jacobo=20garc=EDa?= [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Undefined subroutine 
Embed::Persistent::eval_file called.  in check_disk_smb

Under Nagios i had this problem when running
check_disk_smb

Undefined subroutine Embed::Persistent::eval_file
called. 


It sounds like you have a Nagios built with embedded Perl.

(you can check by running strings /Path/to/nagios | grep -i perl | head -5.
If you see
libperl.so
Perl_croak
Perl_markstack_grow
Perl_croak_nocontext
Perl_save_int

then you have a Perl interpreter built into Nagios).

If this is not the case, I can't help.

Unfortunately if your Nagios does have Perl in it, the Perl driver
(p1.pl) seems not to be where Nagios expects it.

Look at your nagios.cfg

(eg
[EMAIL PROTECTED] Dhcp]$ grep -i P1 /etc/nagios/nagios.cfg 
# P1.PL FILE LOCATION
# This value determines where the p1.pl perl script (used by the
p1_file=/usr/bin/p1.pl
[EMAIL PROTECTED] Dhcp]$ 
)

and check if p1.pl is in the location nagios.cfg says it should be.
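
The same check can be scripted. This is a minimal sketch that works on
the text of nagios.cfg; the function names and the sample path are
hypothetical, and it only tests that the file exists, not that it is the
right version of p1.pl.

```python
import os

def p1_file_setting(config_text):
    """Return the value of the last p1_file= line in nagios.cfg text, or None."""
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "p1_file":
            value = val.strip()
    return value

def p1_problem(config_text):
    """Return a description of the problem, or None if p1.pl looks present."""
    path = p1_file_setting(config_text)
    if path is None:
        return "no p1_file setting found"
    if not os.path.isfile(path):
        return f"p1_file points at {path}, which does not exist"
    return None

# Made-up configuration text for illustration.
sample_cfg = """
# P1.PL FILE LOCATION
p1_file=/nonexistent/dir/p1.pl
"""
```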

If it is, I don't know what is happening.

If not, try and locate it - you can get it from the Nagios CVS or from
the corresponding dist tarball or the RPM/package - and put it there.
You could also check its not hiding somewhere else in the file system. 

If you can get a copy of p1.pl then either relocate it or put it in the
path specified by nagios.cfg (p1.pl is used only by Nagios so moving it
won't break anything). You may have to restart Nagios for the change to
take effect.

The file is plain text, pure Perl. p1.pl defines the subroutine that
Nagios is trying to have Perl call.

Finally, if you don't have a good reason to use embedded Perl, you
probably shouldn't be using it. If you don't want to use embedded Perl,
you can't turn it off at run time. The only option is replacing the
Nagios binary with one compiled without embedded Perl. The Dag Wieers
Redhat RPMs for Nagios all build with embedded Perl so if you use RPMs,
you would need to choose another RPM or hack the Dag SPEC file to not
build embedded Perl.



when i run from command line i response on 2 lines,
but it seems to be ok.

i dont know what to do.



Yours sincerely.




RE: [Nagios-users] How many parent hosts in a parent directives ?

2006-05-21 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say,

-Original Message-

I usually define HSRP as a child of the two parent routers 
that are participating. Both routers have to be down before 
the HSRP is marked unreachable. Devices behind those routers 
can then use the HSRP object as their parent. In cases where 
the HSRP address is closer to the Nagios server, obviously 
you'd flip the parent/child around


 Nagios
   |
   {Internet}
   |
 RtrA--+--RtrB
   \   /
\_/
   |
  RtrAB-HSRP
   |
 {other devices}

  



Firstly, thank you very much to all those who answered both on the list
and 
privately.

For the benefit of the archives, the consensus of the replies is that
hosts behind multiple routers should enumerate each of the routers in
the parents directive. (The canonical example is a subnet with two
routers sharing the host gateway address with HSRP/VRRP, although a more
common example may be a single gateway that has multiple paths back to
the monitoring host. In this case, the gateway will have one or more
parents corresponding to routers in each of the paths.)

Nagios checks each of the routers marked as a parent and if _any_ of
them are up, then the host is marked as DOWN (probably a SOFT state);
otherwise - if all the parents are unreachable - the host is marked as
UNREACHABLE.
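
The rule above can be written down as a small decision function. This is
a sketch of the logic as described in this letter, not a reading of the
Nagios source; the function name is hypothetical, and SOFT/HARD state
handling is left out.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"

def host_state(host_answers, parent_states):
    """State of a host given its own check result and its parents' states."""
    if host_answers:
        return UP
    if not parent_states:
        return DOWN                # no parents: nothing upstream to blame
    if any(state == UP for state in parent_states):
        return DOWN                # a path exists, so the host itself is down
    return UNREACHABLE             # every parent is DOWN or UNREACHABLE
```

With two HSRP routers as parents, host_state(False, [UP, DOWN]) gives
DOWN and host_state(False, [DOWN, DOWN]) gives UNREACHABLE, which is the
behaviour the replies describe.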

I think that Mr Eng's advice is consistent with this: the difference
being that the HSRP is visible as a host and a service in the Nagios
configuration and that Nagios will check the HSRP service by checking
the reachability of the gateway address (as well as separately checking
the standby addresses).

Thank you,

Yours sincerely.




[Nagios-users] Re: Nagios 2.3.1, problems with perl plugins

2006-05-17 Thread Stanley.Hopcroft
Dear Folks,

I am writing to thank you for your letters and say,

-Original Message-

Message: 17
From: Michael =?iso-8859-1?q?H=FCttig?= 
[EMAIL PROTECTED]
Organization: MSP
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] RE: Nagios 2.3.1, problems with 
perl plugins
Date: Wed, 17 May 2006 15:40:22 +0200
Cc: [EMAIL PROTECTED]

Hi Stanley,
i´ve posted on monday the following message. I think there was a
problem with embedded perl. Which perl-plugins do you use?

I just upgraded my Nagios-2.0 to nagios-2.3. Both Versions were
compiled with epn-support with --enable-embedded-perl --with-perlcache

Running nagios-2.0 all checks using perl-plugins (check_smart,
check_cisco_env, check_ifoperstatus and others) were doing fine.

Using nagios-2.3 i got the following errors:
check_ciscoenv.pl
;Cisco environmental health;UNKNOWN;HARD;3;UNKNOWN: Unable to resolve destination address '-c'

check_load.pl
;Load;UNKNOWN;SOFT;2;**ePN /usr/local/nagios/libexec/check_load.pl: Argument  isn't numeric in numeric lt () at (eval 12) line 61,.

check_ifoperstatus.pl
;UNKNOWN;notify-by-email;**ePN /usr/local/nagios/libexec/check_ifoperstatus: Option d requires an argument.

check_smart.pl
;S.M.A.R.T-Status;UNKNOWN;SOFT;1;**ePN /usr/local/nagios/libexec/check_smart.pl: Can't exec sudo: No such file or directory at (eval 15) line 119,.

check_traffic.pl
;Traffic ISDN-Interface;CRITICAL;notify-by-email;CRITICAL: Could not match ISDN Basic Rate Interface (S0)


Firstly thanks to Frederick and Michael for the notification about this
serious problem.

Unfortunately the situation as I see it is,

1 I am running 2.3 not 2.3.1 and so my limited Perl plugins (home-brew)
may not be picking up the problem. Also I lack a Nagios work bench at
the moment so it's going to be slow if heavy lifting is involved, as it
seems.

2 A quick glance at the CVS does not seem to show any relevant changes

2.1 there appear to be no changes in checks.c near the embedded Perl
code
2.2 the change to p1.pl was only to allow plugins to return more than
one line of output (the nagios-snmp plugins do this I think).

3 It would help if Michael or Frederick would enable the LOGGING options
in the copy of p1.pl they use for new_mini_epn (perldoc p1.pl should
help).

IIRC you want to change this p1.pl to have

use constant  DEBUG_LEVEL => LEAVE_MSG ;

and make sure the plugin log path looks Ok.

This will leave messages like

Mon Mar  6 15:43:39 2006 run_package:
/usr/lib/nagios/plugins/check_rootport -H 10.0.254.167 -N BRUSW200
returning (0, Ok. No topology change: root port of
10.0.254.167/BRUSW200 has not changed from that expected: 513. See <a
href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree
graph</a>.
).
Mon Mar  6 15:43:39 2006 eval_file:
/usr/lib/nagios/plugins/check_rootport already successfully compiled and
file has not changed; skipping compilation.
Mon Mar  6 15:43:39 2006 run_package:
/usr/lib/nagios/plugins/check_rootport -H 10.0.254.170 -N MTASW200
returning (0, Ok. No topology change: root port of
10.0.254.170/MTASW200 has not changed from that expected: 0. See <a
href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree
graph</a>.
).
Mon Mar  6 15:43:39 2006 run_package:
/usr/lib/nagios/plugins/check_rootport -H 10.0.254.168 -N MTASW207
returning (0, Ok. No topology change: root port of
10.0.254.168/MTASW207 has not changed from that expected: 1. See <a
href=http://nms/cgi-bin/display_spanning_tree>Current spanning tree
graph</a>.
).

in the log file (also named in p1.pl).

This should provide some clues.

The only quick workaround that may be worth a _try_ is to replace p1.pl
with an older one from CVS (say 1.7).
However, I am not confident.

Perhaps my installation is a bit atypical: every one of the 2.0 series
(inc betas) has been in prod use with either heavy or light embedded
Perl without a hitch.

The last random thought is, could you have changed Perl or
Text::ParseWords around about the time the problem started ? This module
is responsible for argument processing and this appears to be breaking.
OTOH, if it was the culprit, all versions would be b0rked.

Good luck,

Yours sincerely.




[Nagios-users] RE: CSV output out of availibility Reports

2006-04-06 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say,

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of 
[EMAIL PROTECTED]
Sent: Friday, 7 April 2006 13:16
To: nagios-users@lists.sourceforge.net
Subject: Nagios-users digest, Vol 1 #3123 - 37 msgs

 ... 

Message: 5
Date: Thu, 6 Apr 2006 13:16:06 +0200
From: Sand Philipp [EMAIL PROTECTED]
Subject: [Nagios-users] CSV output out of availibility Reports

Hi there,

I remember a patch for the avail.cgi, with which you could 
generate a csv output out of each avail.cgi report. I 
already did some research in this list and with google, but I 
can't find the patch any more. Can anyone please give me a 
hint, where I can download this patch? Has anyone tested this 
patch with the new version of the avail.cgi in Nagios 2.1?

avail.cgi has _always_ done CSV output (as long as you choose 
all hosts or all services).

It's up to you to filter the CSV by any means you care to choose.

If you want some help, try 

- either putting all the CSV records in a DB and use SQL

  (if you are serious about reporting you prob want to do this).

- Nagios::Report

  (munges and filters the CSV from avail.cgi).


Question for Ethan: why isn't this patch integrated into the 
avail.cgi by default? Is this planned for a future release?

Patches welcome.

BTW, there are bugs in avail.cgi relating to scheduled down time
that are probably more important than this.


Thanks in advance!

I probably didn't do your letter justice. It seems on reflection that 
you meant you want the patch for avail.cgi that generates CSV from
a specific availability report eg a new link in the report for 
a host or service that does CSV.

However, I think it has been made pretty clear from the Nag roadmap that
the CGIs have been end-of-lifed and will be replaced by PHP. I think
that is a much better use of scarce developer resources than trying
to fix difficult and fragile code.

As Radia Perlman said about sub-optimal routing, 'people should be
grateful that their data is delivered at all'.

Yours sincerely.




[Nagios-users] Ideas sought - reporting on monitored sites in TZs != Nagios TZ.

2006-04-03 Thread Stanley.Hopcroft
Dear Folks,

I am writing to invite ideas about how to report on availability in
cases where a single Nagios monitors hosts located in other time zones
(TZ).

In many cases, people want reports of availability in 'business hours'
and Nagios does that 
beautifully by (avail.cgi) accepting the time_period to compute
availability over.

However, 'business hours' at the Nagios site is not the same as that in
the other sites, so
outages that should be excluded are not and outages that should be
included are excluded.

The best I can think of is defining a 'business hours' time_period that
spans business hours in all the time zones. This has the advantage of
not excluding any outage but the drawback of including outages that fall
inside the left hand side of the time period (morning outages at remote
sites that are way too early to be included; they are included because
it is, say, 8 am at the site where Nagios is even though it may be 5 am
at the remote site).

Another solution that doesn't cut it for us is multiple copies of
Nagios, each reporting on
'business hours' in the TZ where each copy is located.

A real nasty hack would be to wrap the host_check plugin with logic that

1 determines where the host is

2 decides whether or not to pass the down back to Nag depending on the
location's TZ

Obviously this is useless since it suppresses outages you may want to
report on in a 24x7
view.

The only other idea I have is

1 abandon Nagios reporting

2 have a global host event handler log outages in a DB

3 report by accumulating outages (depending on TZ) from the DB.
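
For what it is worth, step 3 of that last idea might be sketched as
below. This is only an illustration in Python (any language with a DB
interface would do). It assumes each outage is read back from the DB as
a UTC start and end time together with the time zone of the monitored
site; the function name, the 08:00 to 17:00 Monday to Friday window and
the fixed UTC offsets are all assumptions made for the example.

```python
from datetime import datetime, time, timedelta, timezone


def business_seconds(start, end, site_tz,
                     open_at=time(8, 0), close_at=time(17, 0)):
    """Seconds of the outage [start, end) that fall inside Mon-Fri
    business hours as seen on the wall clock at the monitored site."""
    start = start.astimezone(site_tz)
    end = end.astimezone(site_tz)
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            opens = datetime.combine(day, open_at, tzinfo=site_tz)
            closes = datetime.combine(day, close_at, tzinfo=site_tz)
            overlap = min(end, closes) - max(start, opens)
            if overlap > timedelta(0):
                total += overlap.total_seconds()
        day += timedelta(days=1)
    return total


# Fixed UTC offsets keep the sketch self-contained; a DST-aware tzinfo
# (for example from the zoneinfo module) could be passed instead.
nagios_site = timezone(timedelta(hours=10))
remote_site = timezone(timedelta(hours=8))

# One outage as it might be read back from the DB, stored in UTC.
down = datetime(2006, 4, 3, 22, 0, tzinfo=timezone.utc)
up = datetime(2006, 4, 4, 1, 0, tzinfo=timezone.utc)

print(business_seconds(down, up, remote_site))  # counted at the remote site
print(business_seconds(down, up, nagios_site))  # counted at the Nagios site
```

The same outage yields a different business hours figure depending on
which site's clock is used, which is the discrepancy described above.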

None of these are attractive.

Any other suggestions are welcome.

Thank you,

Yours sincerely.




[Nagios-users] RE: Nagios::Report (end of dev cycle).

2006-03-20 Thread Stanley.Hopcroft
Dear Sir,

I am writing to thank you for your letter and say,

-Original Message-
Content-Disposition: inline

Hmm,

it would be a pity not to develop this thing further. Like 
you say it provides the kind of reports management likes and 
up until now it is the only way I have found to create biased 
reporting.

What do I mean ? I mean I don't care about how many hours 
application or server xyz was down during the weekend. Those 
hours are of no importance to me in terms of management 
reports. The information and especially metrics outside of 
these hours need to be taken for administration purposes of 
course. But the weekend is used here for maintenance as is the 
case in many places. For my SLA's these hours should not be 
counted. Up until now your tool is the only tool I have found 
that can ignore data based on time-tables in its reports.

So as far as I am concerned please keep up the good work.


The reason for saying that devel has prob ceased is that
I have run out of ideas for it and think it may have reached the 
limits imposed by some of the implementation choices.

OTOH, if there are things that need fixing or facilities I think can be
added, please let me know.

The other thing is that the module's reason for existence is to provide
an API for the only current source of availability data, namely
avail.cgi.

avail.cgi is complicated and may not be maintained as responsively as
people may wish - I can't imagine submitting patches for it without a
great deal of hard labour.

If on the other hand, somehow (as you say with the NEBs) outage details
can be stuffed by the core into an RDBMS, then people can use all sorts
of wonderful software from the DB/report world to generate very
sophisticated reports.

In fact, when I heard about DBD::AnyData I thought this would blow the
original API (of Nagios::Report) away because people would SELECT to
their hearts' content. As it turned out, the SQL implemented by the
AnyData module is not as powerful as one would like and in fact is good
for only basic filtering.

So my feeling is that Nagios::Report is a stop gap for me until
something better comes along.

Another technique that I find useful is to have a Nag event handler
insert outage details into a table. This in principle allows one to
combine outages (with manual updates) with causes and commentary.

Question though .. DB NEB modules ?


Thanks for your encouraging words.


Cheers,
Hans

Yours sincerely.




[Nagios-users] Nagios-Report-0.002 on CPAN/NagiosExchange.

2006-03-16 Thread Stanley.Hopcroft
Dear Folks,

I am writing to say that Nagios::Report 0.002 has been 'released' and is
available at the usual places.

This release fixes a bug, and adds limited charting capability and a
weaker alternate interface (provided by the Perl DBD::AnyData module)
that allows client code to select the report data with SQL (the small
subset that AnyData accepts).

0.002 Fri Mar 17 14:44:36 EST 2006
- fix bug in mkreport() processing of MUNGE_CALLBACK (would not change
report values).

*** This entailed a _non_ backward compatible change in the
*** MUNGE_CALLBACK interface.
*** Client code that calls the alter->() callback _requires_ changing.
*** The alter callback is now called with one parm, a ref to a hash of
*** the field values indexed by field name. See examples/ for scripts
*** that have been changed.

- added to_dbh() method to allow DBD::AnyData provided use of SQL
(simple) on report data
- added primitive support for chart templates to excel_dump. The
workbook written by Spreadsheet::WriteExcel can contain _one_ (1) chart
of the availability data.

This project does not scale very well. It provides a limited capability
to provide a Data source for processing by Reporting tools such as
Excel. 

This module has probably reached the end of development (some may say it
would better have not started) apart from bug fixes.

If you are serious about reporting look at the DB NEB modules or Steve
Shipway's stuff on NagiosExchange. This module does, however, provide a
limited capacity to produce reports in the format beloved by PHBs.

Yours sincerely.




[Nagios-users] Potential bug in avail.cgi/Nagios 2.0.

2006-03-06 Thread Stanley.Hopcroft
Dear Folks,

I am writing to report a peculiar behaviour of the availability CGI with
Nagios 2.0

Firstly, I think the avail.cgi is a wonderful beast that turns the
Nagios logs into a very useful and desirable data source.

My reporting requirements are down times minus any down time scheduled.

I guess that total_time_down is in fact the sum of time_down_scheduled
and time_down_unscheduled.

But for some of my availability data (last month), I see (with an
unpublished SQL interface on top).

  DB<29> p $SQL
SELECT host_name, total_time_down, time_down_scheduled,
time_down_unscheduled FROM tab_24x7 where total_time_down > 0 and
time_down_unscheduled > 1
  DB<30> $s = $d->prepare($SQL)

  DB<31> $s->execute

  DB<32> $s->dump_results
'Lismore_Optus_router_PE_interface', '96712', '0', '96712'
'MTASW203', '800', '1', '4294958096'
'TODSW210', '429', '6771', '4294960954'
'TRASW202', '200', '7000', '4294960496'
'TRASW203', '1392', '5808', '4294962880'
'TRASW204', '1092', '6108', '4294962280'
6 rows

The problem is the too large values of time_down_unscheduled and the
fact that the
total_time_down is not the sum of sched and unsched downtime.

In this case, downtime was scheduled for the TOD\w+ and TRA\w+ hosts
(the Lismore entry is correct BTW).
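
One reading of these figures, offered as arithmetic only and not as a
diagnosis of avail.cgi: most of the oversized values equal
total_time_down minus time_down_scheduled, wrapped around at 2**32 as
an unsigned 32 bit subtraction would be. The Python below (chosen purely
for illustration) checks each row of the output above against that
guess. The helper names are made up for the example.

```python
UINT32 = 2 ** 32

# (host_name, total_time_down, time_down_scheduled, time_down_unscheduled)
# copied from the dump_results output above.
rows = [
    ("Lismore_Optus_router_PE_interface", 96712, 0, 96712),
    ("MTASW203", 800, 1, 4294958096),
    ("TODSW210", 429, 6771, 4294960954),
    ("TRASW202", 200, 7000, 4294960496),
    ("TRASW203", 1392, 5808, 4294962880),
    ("TRASW204", 1092, 6108, 4294962280),
]


def wrapped_difference(total, scheduled):
    """total - scheduled as an unsigned 32 bit counter would hold it."""
    return (total - scheduled) % UINT32


def as_signed32(value):
    """Reinterpret an unsigned 32 bit value as a signed one."""
    return value - UINT32 if value >= 2 ** 31 else value


# True where the reported unscheduled time equals the wrapped difference.
matches = {host: wrapped_difference(total, sched) == unsched
           for host, total, sched, unsched in rows}

for host, total, sched, unsched in rows:
    print(host, as_signed32(unsched), matches[host])
```

On the figures as printed, the TOD and TRA rows and the Lismore row fit
this pattern; the MTASW203 row does not (its value is 2**32 - 9200), so
the guess does not explain everything.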

What can I provide that may help the investigation progress ?

Yours sincerely.




[Nagios-users] RE: Nagios-users digest, Vol 1 #3076 - 9 msgs

2006-03-05 Thread Stanley.Hopcroft
Dear Folks,


Message: 8
To: Nagios-users@lists.sourceforge.net
From: John R. Daily [EMAIL PROTECTED]
Date: Sun, 5 Mar 2006 19:51:35 -0500
Subject: [Nagios-users] ePN: notification script

I've been trying to get a Perl script to work as part of the notify- 
by-email command as defined in minimal.cfg, and it's finally dawned  
on me that the RPM package I'm using has the embedded Perl  
interpreter compiled in.

(Is there a replacement for nagios -m in v2.0?  The documentation  
still refers to it, but it doesn't seem to work.)

Anyway, I'm not thrilled about having to deal with ePN for simple  
Perl utilities that aren't plugins, but I figured I could get it to  
work anyway.  However, now I'm less confident.


Give this a try with your Perl alternative to printf.

1 cd into /usr/bin (if RHEL; Nag bin path otherwise).

2 run new_mini_epn from there (the path to p1.pl should be set in the
new_mini_epn binary but is not).

(new_mini_epn has readline support so command line history and edit work
ok.)

eg

[EMAIL PROTECTED] bin]$ ./new_mini_epn 
plugin command line: /usr/lib/nagios/plugins/check_rootport -H
10.0.254.168
embedded perl plugin return code and output was: 0  Ok. No topology
change: root port of 10.0.254.168 has not changed from that expected: 1.
See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning
tree graph</a>.

plugin command line: /usr/lib/nagios/plugins/check_rootport -H
10.0.254.168
embedded perl plugin return code and output was: 0  Ok. No topology
change: root port of 10.0.254.168 has not changed from that expected: 1.
See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning
tree graph</a>.

plugin command line: /usr/lib/nagios/plugins/check_rootport -H
10.0.254.168
embedded perl plugin return code and output was: 0  Ok. No topology
change: root port of 10.0.254.168 has not changed from that expected: 1.
See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning
tree graph</a>.

plugin command line: /usr/lib/nagios/plugins/check_rootport -H
10.0.254.168
embedded perl plugin return code and output was: 0  Ok. No topology
change: root port of 10.0.254.168 has not changed from that expected: 1.
See <a href=http://nms/cgi-bin/display_spanning_tree>Current spanning
tree graph</a>.

plugin command line: q
embedded perl compiled plugin q with error: **ePN failed to open q:
No such file or directory at p1.pl line 168.
 - skipping plugin
plugin command line: 
That's all folks.
[EMAIL PROTECTED] bin]$ 

This is repeatedly running a Perl plugin check_rootport (simulating
reuse by ePN). It will display probs either

1 at compile time (ePN has some funny limits. See the mod_perl docco for
more info).

2 at run time.

Good luck.

Yours sincerely.




[Nagios-users] RE: Reporting bits and pieces.

2006-02-07 Thread Stanley.Hopcroft
Dear Folks,

Mainly off topic - related to Reporting

Firstly some observations and corrections to my recent letters about
Reporting.

1 Use of DBD::RAM to download, at one fell swoop, the Nagios all
hosts/services report and stash it in an in-core table (prior to
filtering with SELECT and saving in an RDBMS).

DBD::RAM no longer builds. It has been replaced by DBD::AnyData.

From the README of DBD::AnyData:

HISTORICAL NOTE: this module was formerly called DBD::RAM. Its name
was changed because many people were unaware that the module supports
file operations in addition to in-memory operations. See the Changes
file for a description of changes since the last release of DBD::RAM.

2 Another way of accounting for outages (apart from daily log file
parsing or using avail.cgi directly) may be with an event handler that
inserts rows into a table with these columns 

HOST_NAME
HOST_DOWN
HOST_UP
OUTAGE

Since there are a lot of good DB interfaces in various programming
languages (Perl, Python, Ruby), this is pretty straightforward.

I only recently became aware of the $HOSTDOWNTIME$ macro that allows one
to filter outages that are in scheduled downtime.

(At this low capability site, the event handler simply appends a row to 
a CSV file so that Excel can view/edit the data. While this is not great
it allows manual update with COMMENT and CAUSE [via Excel] so it allows
one in principle to combine problem analysis/resolution data with the
outage).
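
A minimal sketch of such an event handler, in Python with SQLite, might
look like the following. It is an illustration, not the handler in use
here: the table name, the function name and the choice of epoch seconds
are assumptions. The four columns are the ones listed above, and the
arguments are meant to correspond to the $HOSTNAME$, $HOSTSTATE$,
$HOSTSTATETYPE$ and $HOSTDOWNTIME$ macros passed on the handler's
command line.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS outages (
    HOST_NAME TEXT NOT NULL,
    HOST_DOWN INTEGER NOT NULL,   -- epoch seconds when the host went down
    HOST_UP   INTEGER,            -- epoch seconds when it recovered
    OUTAGE    INTEGER             -- outage length in seconds
)
"""


def handle_host_event(conn, host, state, state_type, downtime_depth, now):
    """Record HARD host state changes as outage rows.

    A DOWN that arrives while scheduled downtime is in effect
    (downtime_depth > 0) is not recorded at all.
    """
    if state_type != "HARD":
        return
    if state == "DOWN":
        if int(downtime_depth) > 0:
            return
        already_open = conn.execute(
            "SELECT 1 FROM outages WHERE HOST_NAME = ? AND HOST_UP IS NULL",
            (host,)).fetchone()
        if already_open is None:
            conn.execute(
                "INSERT INTO outages (HOST_NAME, HOST_DOWN) VALUES (?, ?)",
                (host, now))
    elif state == "UP":
        # Close any open outage for this host and store its length.
        conn.execute(
            "UPDATE outages SET HOST_UP = ?, OUTAGE = ? - HOST_DOWN "
            "WHERE HOST_NAME = ? AND HOST_UP IS NULL",
            (now, now, host))
    conn.commit()
```

A real handler would open the database file, create the table with
SCHEMA, and call handle_host_event() with its command line arguments
and the current time.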

3 Would anyone like to post their Reporting schemas ?

/Mainly off topic - related to Reporting

and now back to our usual program ..

Yours sincerely.




[Nagios-users] RE: Reporting and misc rave.

2006-02-05 Thread Stanley.Hopcroft
Dear Folks,

Firstly thanks to all that answered either on the list or privately.

I will now attempt to emulate a journalling file system by summarising
the responses.

1 Having Nagios availability in a DB is a good thing.

Doing so reduces the cost of reporting since there are no data
representation/conversion problems and the extraction can be done with
SQL, thereby minimising the script-hell problem.


2 Availability data capture 

Mr Shipway's approach is to process the Nag logs periodically with
private/in-house (AFAIK) code to extract the entries of interest and
insert them as rows in a table(s).

(Incidentally, this sounds very enterprising since the extraction code
has to deal with all the cases handled by avail.cgi. The difficulty of
extracting outages from the logs is why I chose to use avail.cgi as a
source of availability data).

Other approaches include event handlers that insert a row at the end of
an outage. This is easy to code but unfortunately, since AFAIK there is
no macro that indicates if scheduled down time was prevailing, may
require manual post processing to update the column 'IN_SCHED_DOWNTIME'.

3 Reporting

From Mr Shipway, Rouillard.

There are at least two DBs with ODBC connectors (SQLite and MySQL)
available.

This is very important since the availability of ODBC connectors makes
available the wealth of MS applications for 

  3.1 client programs eg update your DB with Excel

  3.2 reporting - use Excel charts for example

4 Re-use

Any site worth its salt will ultimately recognise the need for various
registries/directories that
reduce the cost of client coding.

Such registries/directories include

  4.1 Provider circuit IDs

  4.2 Addressing/subnets/VLANs etc etc

  4.3 Managed nodes

It would be helpful if the Nagios config data could also be made
available as a DB.

Personally I think it would be a bad thing if Nagios lost its
template/text driven config but the config data should be made available
to other applications so that there is not the endless client code churn
of mapping names between applications.

One approach would be to use Al Tobey's Nagios::Config to load a config
DB.

Why is this useful ? At least one application is mapping structured node
names to those used in Reports. What exec understands benrt200 ? What
about Bendigo ?


Thanks for your time.

Yours sincerely.




[Nagios-users] RE: Reporting and misc rave.

2006-02-02 Thread Stanley.Hopcroft
Dear Folks,

I am writing with mainly a rant about Graphing and Reporting.

Mainly OT rant

1 About graphing with Nagios

Why would one bother when 

  1.1 Cacti does such a good job

  1.2 Nagios could check the Cacti RRDs with either check_rrd, or by an 
  outboard (Cron scheduled) RRD poller that submits passive service
  check results

  1.3 the graphs can be associated with Nag service checks by either

   - explicit URL of the Cacti graph in the service check output

   - for the adventurous, a Wiki front end that displays some of the 
     Nag CGI service status and a link to the Cacti graph. 

  As a footnote, since Cacti supports RRD 1.2 with built-in supported
  Holt Winters forecasting RRAs, the poller could be smart and simply
  check the exception Data store to see if the current rate is in fact
  outside the normal seasonal variation (computed by the Holt Winters
  algorithm inside the RRD).

  Of course this would require the modification of the RRDs that Cacti
  produces to add the HW RRAs (this doesn't require that the RRD content
  be unloaded and reloaded IIRC).

2 About reporting

After writing a lot of code in Nagios::Report to extract and report on
Nagios availability data it occurs to me that a better way of doing
Reporting is to

  2.1 put the availability data in a DB table (prob with an
      auto-incremented index)

  2.2 use either

    2.2.1 ad-hoc SQL queries, or

    2.2.2 the reporting package of your choice (eg iReport)

I hope that Nagios::Report will be enhanced to take advantage of
DBD::RAM, a Perl module that very easily gets a CSV file with LWP and
sticks it in an in core DB that can almost as easily be used as a Data
source to insert rows into the DB of your choice (MySQL, or whatever).
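
Steps 2.1 and 2.2.1 could be sketched roughly as below, using Python
with SQLite in place of Perl and DBD::RAM purely for illustration. The
table name, the column names and the three sample rows are assumptions
invented for the example; a real script would fetch the CSV from
avail.cgi rather than use an inline string.

```python
import csv
import io
import sqlite3

# Illustrative stand-in for the CSV a script would download.
SAMPLE_CSV = """host_name,total_time_down,time_down_scheduled,time_down_unscheduled
hostA,3600,0,3600
hostB,0,0,0
hostC,7200,7000,200
"""


def load_availability(conn, csv_text):
    """Load CSV availability records into a table with an
    auto-incremented index; return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS availability ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " host_name TEXT,"
        " total_time_down INTEGER,"
        " time_down_scheduled INTEGER,"
        " time_down_unscheduled INTEGER)")
    rows = [(r["host_name"],
             int(r["total_time_down"]),
             int(r["time_down_scheduled"]),
             int(r["time_down_unscheduled"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany(
        "INSERT INTO availability (host_name, total_time_down,"
        " time_down_scheduled, time_down_unscheduled)"
        " VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_availability(conn, SAMPLE_CSV)

# An ad-hoc query: hosts with unscheduled down time, worst first.
worst = conn.execute(
    "SELECT host_name, time_down_unscheduled FROM availability"
    " WHERE time_down_unscheduled > 0"
    " ORDER BY time_down_unscheduled DESC").fetchall()
print(loaded, worst)
```

Once the rows are in a table, the reporting package of choice can be
pointed at the same DB instead of the ad-hoc query.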

/Mainly OT rant

and now back to our normal program.

Yours sincerely.




[Nagios-users] 2.0rc2 avail.cgi - possible bug in reporting of HOST DOWNTIME START/END EVENTS.

2006-02-02 Thread Stanley.Hopcroft
Dear Folks,

I am writing to report a possible anomaly/bug in avail.cgi for Nag 2.0
rc2

(RPM based on Dag Wieers for RHEL3).

The problem is that when a host has exited a period of scheduled
downtime, the 'Host log
entries' shown by avail.cgi look like

Event Start Time     Event End Time       Event Duration    Event/State Type
    Event/State Information
01-02-2006 00:00:00  01-02-2006 14:27:54  0d 14h 27m 54s    HOST UP (HARD)
    PING OK - Packet loss = 0%, RTA = 0.82 ms
02-02-2006 20:59:44  02-02-2006 20:59:44  0d 0h 0m 0s       HOST DOWN (HARD)
    CRITICAL - Plugin timed out after 10 seconds
02-02-2006 20:59:44  02-02-2006 21:06:53  0d 0h 7m 9s       HOST DOWNTIME START
    Start of scheduled downtime
02-02-2006 21:06:53  02-02-2006 22:59:44  0d 1h 52m 51s     HOST UP (HARD)
    PING OK - Packet loss = 0%, RTA = 0.71 ms
02-02-2006 22:59:44  03-02-2006 11:35:39  0d 12h 35m 55s+   HOST DOWNTIME END
    End of scheduled downtime

and then the next time the Report is run the last line shows again how
long it was since the host
exited downtime (ie now minus the downtime end).

eg

Event Start Time     Event End Time       Event Duration    Event/State Type
    Event/State Information
01-02-2006 00:00:00  01-02-2006 14:27:54  0d 14h 27m 54s    HOST UP (HARD)
    PING OK - Packet loss = 0%, RTA = 0.82 ms
02-02-2006 20:59:44  02-02-2006 20:59:44  0d 0h 0m 0s       HOST DOWN (HARD)
    CRITICAL - Plugin timed out after 10 seconds
02-02-2006 20:59:44  02-02-2006 21:06:53  0d 0h 7m 9s       HOST DOWNTIME START
    Start of scheduled downtime
02-02-2006 21:06:53  02-02-2006 22:59:44  0d 1h 52m 51s     HOST UP (HARD)
    PING OK - Packet loss = 0%, RTA = 0.71 ms
02-02-2006 22:59:44  03-02-2006 11:41:51  0d 12h 42m 7s+    HOST DOWNTIME END
    End of scheduled downtime

This looks a little peculiar to me. It's not a bug but unfortunately
violates the principle of least
surprise (don't know what I was expecting but ..) and for those of us
who mine the host log entries
it means some code modification.

The behaviour of the CGI seems Ok - the event duration is simply the
time to the last event - and seems reasonable.
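
For those mining the host log entries, the code modification may amount
to no more than spotting the trailing '+' on the duration. The sketch
below (Python, for illustration only) parses the 'Event Duration' field
and flags such entries. Reading the '+' as 'still growing at report
time' is an assumption based on the two runs above, and the function
name is made up.

```python
import re

_DURATION = re.compile(r"^(\d+)d (\d+)h (\d+)m (\d+)s(\+?)$")


def parse_duration(text):
    """Return (seconds, open_ended) for a field like '0d 12h 35m 55s+'.

    open_ended is True when the field carries a trailing '+'.
    """
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    days, hours, minutes, seconds = (int(g) for g in match.groups()[:4])
    total = days * 86400 + hours * 3600 + minutes * 60 + seconds
    return total, match.group(5) == "+"


for field in ("0d 1h 52m 51s", "0d 12h 35m 55s+", "0d 12h 42m 7s+"):
    print(field, parse_duration(field))
```

A miner could then skip or special-case any entry whose second value is
true rather than add its duration to a total.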

Thanks for your time.

Yours sincerely.






[Nagios-users] Re: Nagios ignoring Perl Shebang

2006-01-26 Thread Stanley.Hopcroft
Dear Folks,

-Original Message-
From: [EMAIL PROTECTED] 
--__--__--

Message: 1
Date: Thu, 26 Jan 2006 23:12:03 +0100
From: Arno Lehmann [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Nagios ignoring Perl Shebang - 
Was: Notification  script problems.. How do I debug

You've probably got a Nagios with embedded Perl running. There's a 
section in the manual with some hints how to write your Perl 
scripts in 
that case.

Arno

I don't think so. 

I haven't been following this thread so I don't have too much helpful to
say but embedded Perl doesn't care about the shebang.

If the plugin text contains the string '/bin/perl' - usually in the
shebang line, 
then the plugin is assumed to be Perl and is compiled by the Perl
compiler (called
by eval { }) once, and thereafter the in core op-codes executed without
recompilation.

In any case, all my plugins and all the standard plugins have a standard
shebang line
that works fine with embedded Perl.

The usual way to deal with _any_ misbehaving program that Nagios runs -
plugins, 
event handlers, the whole shebang - is to wrap the offender in a
reliable script that
captures the argv it was called with, invokes the program with those
args and then
logs args and stdout, stderr to somewhere convenient.
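
Such a wrapper might be sketched as follows. Python is used here only
for illustration (a shell or Perl wrapper would do equally well), and
the function name and the log path are assumptions.

```python
import subprocess
import sys
import time


def run_and_log(argv, log_path):
    """Run argv, append its arguments, exit code, stdout and stderr to
    log_path, and return (exit_code, stdout) for the caller to pass on."""
    result = subprocess.run(argv, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write("%s argv=%r rc=%d\n"
                  % (time.ctime(), argv, result.returncode))
        log.write("stdout: %r\n" % result.stdout)
        log.write("stderr: %r\n" % result.stderr)
    return result.returncode, result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: wrapper.py /path/to/plugin arg1 arg2 ...
    code, out = run_and_log(sys.argv[1:], "/tmp/nagios_wrapper.log")
    sys.stdout.write(out)   # Nagios still sees the plugin's own output
    sys.exit(code)          # and its own return code
```

The command definition then names the wrapper followed by the original
command line, so Nagios sees the same output and return code while the
log shows exactly what was run and what came back.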

I know that this matter of wrappers has been discussed on this list
before
(my name and Andreas Ericsson).

Yours sincerely.




[Nagios-users] RE: iReport Was: RE: [Announce] Nagios_Simple_Report on

2006-01-23 Thread Stanley.Hopcroft
Dear Folks,

I am writing to thank you for your letter and say,

-Original Message-
Message: 1
Subject: RE: [Nagios-users] [Announce] Nagios_Simple_Report on 
NagiosExchange/CPAN.
From: Mels Kooijman [EMAIL PROTECTED]

Hi Hans,

I use iReport, a good reporting tool 
http://ireport.sourceforge.net/index.php

Mels



iReport looks amazingly wonderful.

However, it appears to be a DB reporting tool.

Would you care to expand on how you use it?

Is it simply

1 A periodic script that gets the Nagios availability report (all
hosts/services) and loads it into a DB

2 Ad-hoc reports or canned iReport programs that report against the DB?

Is the wizard up to it or are some Java skills needed?

This is a _hot_ topic with most Nag users so your comments are welcome.

Yours sincerely.




[Nagios-users] [Announce] Nagios_Simple_Report on NagiosExchange/CPAN.

2006-01-19 Thread Stanley.Hopcroft
Dear Folks,

I am writing to announce that Nagios::Report, a Perl module to munge
data from the Nagios all hosts/services availability report, is on CPAN
and NagiosExchange (where it is called Nagios_Simple_Report).

The class treats the CSV data from the availability report as a flat
file database and allows

1 multiple reports corresponding to multiple time periods (eg 24x7 and
Business hours)

2 selection of rows - availability records - based on field values

3 specification of columns to appear in the reports

4 transformation of rows by adding columns computed from other column
values

5 sorting of the rows by column values or functions of the column values
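
To illustrate the flat file database idea (row selection, computed
columns, sorting and column specification driven by user callbacks),
here is a rough sketch. It is Python rather than Perl, it is _not_ the
Nagios::Report interface, and the sample data, the column names and the
build_report function are all made up for the example.

```python
import csv
import io

# Made-up sample data; the real availability CSV has many more columns
# and different column names.
SAMPLE_CSV = """HOST_NAME,TIME_UP,TIME_DOWN
alpha_router,86000,400
beta_router,86400,0
gamma_switch,80000,6400
"""


def build_report(csv_text, keep, add_columns, sort_key, columns):
    """Select rows, add computed columns, sort, then keep only the chosen columns."""
    rows = [dict(r) for r in csv.DictReader(io.StringIO(csv_text))]
    rows = [r for r in rows if keep(r)]            # selection by field values
    for r in rows:
        for name, fn in add_columns.items():       # columns computed from others
            r[name] = fn(r)
    rows.sort(key=sort_key)                        # sort by a function of the columns
    return [[r[c] for c in columns] for r in rows]  # column specification


def percent_up(row):
    """Callback: percentage of the period that the host was up."""
    up, down = int(row["TIME_UP"]), int(row["TIME_DOWN"])
    return round(100.0 * up / (up + down), 2)


report = build_report(
    SAMPLE_CSV,
    keep=lambda r: int(r["TIME_DOWN"]) > 0,   # only hosts that reported an outage
    add_columns={"PERCENT_UP": percent_up},
    sort_key=lambda r: r["PERCENT_UP"],       # worst availability first
    columns=["HOST_NAME", "PERCENT_UP"],
)
for line in report:
    print(line)
```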

The report data can be output as

1 data on stdout

2 csv data (for use by an SQL processor such as DBD::CSV or simply to
load a database)

3 Excel spreadsheets (using the CPAN module Spreadsheet::WriteExcel
which you must install).

The module provides the capability to use ad-hoc or canned scripts (some
examples of which are included) to produce periodic reports; these 
scripts contain user specified callbacks to do the munging.

This site uses such scripts to generate monthly exception reports of all
the hosts that reported outages (as a spreadsheet) and, in conjunction
with other tools such as Al Tobey's Nagios::Config module, an aggregate
report of availability per site (again as a spreadsheet with a bar
chart), where the aggregation is done over the (dependent) nodes at a
site and the site names are extracted from the 'alias' attribute of the
host configuration.

It is also useful for such things as 

[EMAIL PROTECTED] Nagios-Report-0.001_REL-DIST]$ host_down_report -h '(?i)bendigo_optus' -t last9days

  24x7

HOST_NAME                  DOWN                 UP                   OUTAGE

Bendigo_Optus_router_PE_i  11-01-2006 14:20:49  11-01-2006 14:25:59  5m 10s
Bendigo_Optus_router_PE_i  11-01-2006 15:06:22  11-01-2006 15:12:52  6m 30s
Bendigo_Optus_router_PE_i  11-01-2006 17:39:48  11-01-2006 17:48:02  8m 14s
Bendigo_Optus_router_PE_i  11-01-2006 17:55:22  11-01-2006 18:02:02  6m 40s
Bendigo_Optus_router_PE_i  11-01-2006 19:28:00  11-01-2006 19:33:10  5m 10s
Bendigo_Optus_router_PE_i  12-01-2006 11:40:23  12-01-2006 11:47:23  7m 0s
Bendigo_Optus_router_PE_i  12-01-2006 13:44:30  12-01-2006 14:04:40  20m 10s

[EMAIL PROTECTED] Nagios-Report-0.001_REL-DIST]$ 

In future, we will probably use this tool to load a database with the
monthly availability data.

4 accessors that make the raw or munged data available to other programs

This module does _NOT_

1 give you an SQL interface to the availability data

2 generate charts as such - at this stage it only generates workbooks or
flat file data.

Charts can be generated - relatively simply - using
Spreadsheet::WriteExcel (see
http://groups.google.com/group/spreadsheet-writeexcel/browse_thread/thread/7bc303cb793ffebd/47de1b364366cf23?q=chartrnum=9#47de1b364366cf23 )
by

- manually producing an Excel workbook with a chart linked to
  worksheets containing data
- extracting the binary part of the workbook containing the chart macro
- generating with Spreadsheet::WriteExcel a new workbook that includes
  the chart data from the last step and fills in the worksheet data that
  is linked to the charts.

However, this requires standalone code.

3 give you a 'single system view' or 'business view' or any other
buzzword (unless your Nag monitoring provides that data)

Concluding notes.

This module is useful for me and may be for others. Nagios probably
needs to have its availability data in a DB since DBs have a huge range
of reporting tools, DBs have standard syntax to extract and munge data,
and the data conversion/parsing effort is less with DBs. That said, this
module can provide what management want, and maybe what they think they
want, with less effort than doing it all from scratch.

The module _is_ on NagiosExchange, but I made the fatal mistake of
uploading a file with a high version number to CPAN, so the CPAN release
will probably take longer to appear or have a different version number
(like .015).

Yours sincerely,


S Hopcroft

Data Communications
Dept of Education, Science and Training
Level 1, 240 City Walk
Canberra City  ACT  2601

 +61 2 6211 6110
Fax: +61 2 6123 6262

0412 766 832

[EMAIL PROTECTED]






[Nagios-users] Reporting ideas sought.

2005-12-05 Thread Stanley.Hopcroft
Dear Folks,

I am writing to welcome clues about providing an itemised list of
outages and their causes from, 'in some way', Nagios.

The Nagios availability report does indeed provide a useful list of
outages that can be wrapped and processed to one's heart's content

(eg

HOST_NAME                  DOWN                 UP                   OUTAGE

Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s

)

but Nagios has, AFAIK, no means of capturing event related data and
associating it with an outage event to produce something like

HOST_NAME                  DOWN                 UP                   OUTAGE      CAUSE  COMMENT

Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR - down, provider
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR - down, provider
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      router restart by power-on
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      power failure
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR - down, provider
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR - down, provider
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s      5      dismiss
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s      5      dismiss

In this case, cause is a coded value that classifies the fault and the
comment is free form text.

The best I can think of to create something like this is to

1 Append the outages to a file - possibly by having an event handler for
the host or service run the code that extracts the outage from the
availability CGI, or better still (since all the data for an outage is
probably provided by macros) append the macro values to the file
directly.

2 Have an admin edit the file and add the values when they become known.
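
As a rough sketch of step 1, an event handler could call something like
the following on recovery. This is Python for illustration only; the tab
separated layout, the file path argument and the function names are my
assumptions, and how the handler learns the DOWN and UP times (for
example from macros passed on its command line) is left out.

```python
import csv
from datetime import datetime

STAMP = "%d-%m-%Y %H:%M:%S"  # timestamp layout used in the outage listings


def format_outage(seconds):
    """Render a duration in the listing style, eg '4h 31m 30s' or '6m 10s'."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = ["%dh" % hours] if hours else []
    parts += ["%dm" % minutes, "%ds" % secs]
    return " ".join(parts)


def append_outage(path, host, down, up):
    """Append one outage row; CAUSE and COMMENT are left empty for the admin."""
    duration = datetime.strptime(up, STAMP) - datetime.strptime(down, STAMP)
    row = [host, down, up, format_outage(duration.total_seconds()), "", ""]
    with open(path, "a", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerow(row)
    return row
```

For example, append_outage("outages.tsv", "Albany_DEST_router",
"05-12-2005 04:10:59", "05-12-2005 08:42:29") would add one row whose
last two fields are blank, ready for step 2.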

The guts of the problem is that Nagios does the right thing by
automatically changing the state of the monitored entity; there is no
opportunity to 'officially' close the 'fault' by collecting user input
and associating it with an outage. Looked at another way, outages don't
really exist as first class objects (with their own methods and data).

All comments are very welcome,

Yours sincerely.

