Re: [Nagios-users] check for the absence of a service

2009-07-09 Thread Wheeler, JF (Jonathan)
Use the negate plugin to reverse a check

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory

 -Original Message-
 From: Kyle Dippery [mailto:k...@engr.uky.edu]
 Sent: 09 July 2009 16:08
 
 Is there an easy way to use nagios to check for the absence of a
 service?
 
 I want to have nagios monitor SMTP and a few other services on hosts
 that aren't supposed to be running them, and tell me if they
 suddenly get turned on.
 
 Is there a plugin for this, or a way to trick an existing plugin to
 make it work?  I suppose if nothing else I can write a wrapper for
 check_smtp or check_tcp to swap the OK and CRITICAL return values,
 but it'd be much easier if someone else has already done it...
 
 Cheers,
 Kyle
 --
 Kyle Dippery
 Engineering Computing Services  Phone: (859) 257-1346
 280 FPAT  0046  Fax:   (859) 323-3848
 
 UK - One Great Place to Work
 


-- 
Scanned by iCritical.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] FW: NDO Utils Question

2009-06-29 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: Christopher McAtackney [mailto:crist...@gmail.com]
 Sent: 25 June 2009 16:33

  Results passing into the command pipe are stored.  The relevant
  parameters are in nagios.cfg; they are external_command_buffer_slots and
  check_result_buffer_slots - by default these are set to 4096 (see
  documentation within the configuration file).
 
 Great, that's what I was hoping for. Do you have any experience of
 setting this buffer to much higher values? Not that I necessarily
 intend to, but it's always useful to know the effects of pushing the
 system to its limits.

We routinely run with external_command_buffer_slots set to 40960 and
check_result_buffer_slots set to 61440.  We do this because our SQL server
can get very busy serving other databases; that delays its response to
NDOUtils updates, and the buffers then fill up.  You can see the current
usage, the high-water mark and the configured size of these buffers by
running the nagiostats command (the highest we have reached recently was
about 25k buffer slots used).
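
For illustration only, this is how the two settings mentioned here would appear in nagios.cfg, together with a small helper that prints whatever values a given file contains. The function name show_buffer_slots and the config path in the usage note are assumptions.

```shell
#!/bin/sh
# Values quoted in this message, as they would appear in nagios.cfg:
#   external_command_buffer_slots=40960
#   check_result_buffer_slots=61440

# Hypothetical helper: print the two buffer settings found in a config file.
# Commented-out lines are ignored because the pattern is anchored at the
# start of the line.
show_buffer_slots() {
    grep -E '^[[:space:]]*(external_command_buffer_slots|check_result_buffer_slots)[[:space:]]*=' "$1"
}
```

For example, show_buffer_slots /etc/nagios/nagios.cfg (the path depends on how Nagios was installed).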

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory


[Nagios-users] FW: NDO Utils Question

2009-06-25 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: Christopher McAtackney [mailto:crist...@gmail.com]
 Sent: 25 June 2009 14:51
 
 Hi everyone,
 
 I have a quick question about Nagios and NDOUtils I was hoping someone
 could answer.
 
 What happens if the database that NDO Utils is using becomes
 unavailable? (e.g. the server has crashed).

The answer depends a lot on whether you are using Nagios v2 or Nagios
v3; we are using Nagios v2.11.  My understanding is that in Nagios v2,
the code that communicates with the event broker module is
single-threaded.  Therefore a problem with the SQL server can jam up
Nagios to the extent that it effectively stops running commands.  In
Nagios v3, threading has been rewritten and this problem no longer
exists.

 I'm assuming Nagios will continue to monitor and send notifications as
 normal, is this correct?

In Nagios v2 it seems that almost all activity is suspended when you are
using NDOUtils and the MySQL server is unavailable; this continues until
the SQL server is restored.  One solution is to restart Nagios without
the broker_module.
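
A rough sketch of that workaround: comment out the broker_module line(s) in nagios.cfg and restart Nagios. The function name disable_broker_modules and the paths in the usage note are assumptions, not taken from our installation.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a nagios.cfg with every
# broker_module line commented out, so that Nagios starts without the
# event broker module. The original file is not modified.
disable_broker_modules() {
    sed 's/^[[:space:]]*broker_module=/#&/' "$1"
}
```

For example, disable_broker_modules /etc/nagios/nagios.cfg > /tmp/nagios-nobroker.cfg, then check the result, put it in place and restart Nagios in whatever way your installation normally does.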
 
 What about the service check results that would normally be passed to
 NDO Utils and then stored in the database? Are they queued somewhere?
 And if so, how is the capacity of this queue defined? If they are not
 queued, what happens? Will NDO Utils just throw an error for each
 result it tries to store in the database and fails? Will this affect
 the core Nagios process?

Results passing into the command pipe are stored.  The relevant
parameters are in nagios.cfg; they are external_command_buffer_slots and
check_result_buffer_slots - by default these are set to 4096 (see
documentation within the configuration file).

 Hopefully someone can provide some insight.

Hopefully this has.

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory


Re: [Nagios-users] funny disk space message

2009-06-10 Thread Wheeler, JF (Jonathan)
From: Jeremiah Jester [mailto:jeremiahjes...@gmail.com] 
Sent: 09 June 2009 18:33

 Any one know why I'm getting this weird disk space message? 

 * Nagios *
 
 Notification Type: PROBLEM
 
 Service: DISK SPACE
 Host: prod
 Address: (ip)
 State: WARNING
 
 Date/Time: Mon Jun 8 23:52:12 UTC 2009
 
 Additional Info:
 
 DISK WARNING - free space: / 23146 MB (32 0node=99

What happens if you run the command (as userid nagios) on the system, as in:

ssh -l root prod
su - nagios
/usr/lib/nagios/plugins/check_disk -w WLIM -c CLIM -p /

where WLIM and CLIM are the warning and critical limits respectively.

What is the result of the command ssh -l root prod df -h / ?

Is the text given above exactly the output of the command ?

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory





[Nagios-users] Mixing different versions of Nagios

2009-02-20 Thread Wheeler, JF (Jonathan)
With a master/slave(s) Nagios configuration, is it possible to run with
Nagios version 3 on the master and Nagios version 2 on the slaves, given
that the communication is by NSCA (slave returning results to master)
and NRPE (master checking that slaves are running)?  Is the other
arrangement possible (i.e. Nagios 3 on slaves and Nagios 2 on the
master)?

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory


[Nagios-users] Nagios failed to notify and run event handler

2008-11-24 Thread Wheeler, JF (Jonathan)
Our configuration on the master server (running nagios 2.11) includes
the NDOUtils module which writes Nagios data into a set of MySQL tables.
The MySQL server is in a separate rack from the Nagios master server.
Late yesterday evening (Sunday) there was a network switch problem which
meant (among other things that you do not need to know about) that the
Nagios process lost contact with the MySQL server.  From that point on
there were no notifications nor event-handlers run.  My assumption is
that the loss of contact to the MySQL server caused the single-threaded
part of the Nagios process to stall until contact was restored; as a
result notifications and event-handlers did not run as they are also in
the single-threaded part of the code.  Is my assumption correct?  If
not, can anyone suggest an alternative explanation?  As far as I can
tell the Nagios process continued to run as the log continued to record
events - however log switching (at midnight) did not happen (also in the
single-threaded part of the code).

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory


[Nagios-users] FW: Difference between down and unreachable

2008-07-18 Thread Wheeler, JF (Jonathan)
Copy for list

-Original Message-
From: Wheeler, JF (Jonathan) 
Sent: 18 July 2008 15:23
To: 'Stan Brown'
Subject: RE: [Nagios-users] Difference between down and unreachable

 -Original Message-
 From: Stan Brown [mailto:[EMAIL PROTECTED] 
 Sent: 18 July 2008 15:09
 To: Wheeler, JF (Jonathan)
 Cc: Stewart Flood
 Subject: Re: [Nagios-users] Difference between down and unreachable
 
 On Fri, Jul 18, 2008 at 01:04:53PM +0100, Wheeler, JF wrote:
   -Original Message-
   From: [EMAIL PROTECTED] On Behalf Of stan
   Sent: 18 July 2008 12:26
   
   I had a machine that was restored from an old backup tape, and did not
   have its external facing NIC configured for a few days last week.  Nagios
   reported it as down, rather than unreachable.  How is this determined?
  
  Down means the individual system is down, that is, the host check has
  failed.
  
  Unreachable means it is not possible to test the host because its parent
  (maybe a switch) has failed.
 Thanks.
 
 I did not realize that Nagios was sophisticated enough to understand that
 a device could be dependent upon another device.  Neat, I will look into
 how to configure this functionality.

Look at the parents directive under the host definition.  Note that
you can also define service dependencies; see the documentation for more
details.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: nagios actions

2008-07-15 Thread Wheeler, JF (Jonathan)
-Original Message-
From: [EMAIL PROTECTED] On Behalf Of Melanie Pfefer
Sent: 15 July 2008 09:04

 Can nagios trigger an action when an alert is received?

 For example, if /var is at warning, can nagios execute a script that
 cleans the logs?

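Look at event handlers in the documentation; these do exactly what you
require.  Your service definition needs to specify the event handler
command (the event_handler directive) and have event_handler_enabled set
to 1; event handlers must also be enabled globally
(enable_event_handlers=1 in nagios.cfg).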
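
A minimal sketch of what such an event handler script might look like for the /var example. It assumes the command definition passes the service state and state type as the first two arguments; clean_logs and handle_event are hypothetical names, and the echo stands in for the real clean-up.

```shell
#!/bin/sh
# Sketch of an event handler for the "/var is filling up" case.
# Assumed arguments from the command definition:
#   $1 = service state (OK, WARNING, CRITICAL, UNKNOWN)
#   $2 = state type (SOFT or HARD)

# Hypothetical placeholder for the real log clean-up.
clean_logs() {
    echo "cleaning logs"
}

handle_event() {
    state=$1
    state_type=$2
    case "$state" in
        WARNING|CRITICAL)
            # Act only once the problem is confirmed (HARD state).
            if [ "$state_type" = "HARD" ]; then
                clean_logs
            fi
            ;;
        *)
            # OK and UNKNOWN: nothing to do.
            ;;
    esac
    return 0
}

# In a real script the last line would be: handle_event "$@"
```

Acting only on HARD states is a design choice in this sketch; you could equally react on a particular SOFT attempt if you want the clean-up to run before any notification goes out.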

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] Monitoring large (ish) numbers of servers with exceptions to the rules...

2008-06-17 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: nagios-users On Behalf Of Matthew Macdonald-Wallace
 Sent: 17 June 2008 13:14
 
 I currently help maintain and monitor around 50 servers across various
 parts of the UK using Nagios 2.  At the moment, we have a configuration
 file for each host (%hostname%.cfg) and in that file we specify all the
 services for the named host.
 
 We are trying to reduce the number of configuration files as we take on
 more and more servers, because there are a large number of checks that
 need to be rolled out to all servers and we feel that we are duplicating
 our workload.
 
 I'm open to ideas on how to achieve this; however, my thoughts were a
 setup along the lines of the following:
 
  - A master host template is created in which all services are defined
    for a host.
 
  - If a check does not need to be run for a given host (for example it
    is not a web server), a stanza is added to that particular host's
    config file that effectively tells nagios "don't check for this
    service on this host".
 
 I've tried defining all the services in a master templates file and
 this works perfectly; however, when I come to exclude certain services,
 I am at a loss on how to do it.
 
 Initially I tried adding a stanza with the same service name and
 "register 0" as one of the options, however this didn't work.
 
 We have used HostGroups in the past to achieve a similar goal; however,
 we ran into the issue that whilst we need to check the CPU usage on all
 of the servers, a few of the servers that we monitor can take a lot
 more of a beating than the majority.  This led to us defining the CPU
 checks on a per-host basis, because if we defined the check separately
 from the hostgroup for the more powerful servers we were presented with
 a load of errors regarding duplicate service names.
 
 I hope I've made myself clear on what we're after and I look forward to
 receiving your input on this.

One thing that I use in the configuration that I maintain is to have
something like this:

define service{
        use             generic-hung-mounts
        hostgroup_name  experiments
        hosts           !lcg0448
        contact_groups  experiments
        }

where lcg0448 is a host in host group experiments and I want to
apply the generic-hung-mounts check to all hosts in that group except
for lcg0448.

This can lead to configuration like this:

define service{
        use             check-pbs-offline
        hostgroup_name  workers
        hosts           !lcg0614,!lcg0617,!lcg0618,!lcg0626
        contact_groups  tier1a
        }
define service{
        use             check-pbs-offline
        hosts           lcg0614,lcg0617,lcg0618,lcg0626
        contact_groups  tier1a,grid-team
        }

where the only difference is that the hosts in the second definition
have a second contact group.

HTH

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] No output from plugin

2008-06-09 Thread Wheeler, JF (Jonathan)
I have a new command which I have implemented as a Nagios plugin.
Running the command as user nagios on the client gives the correct
output (currently the string  : test ()) and return code 2 which is
what I require.
Running the command /usr/bin/nagios/plugins/check_nrpe -H CLIENT -t 30
-c COMMAND on a Nagios server (in this case a slave server) also gives
the correct output and return code.  Running the command as a plugin
gives the reply (No output from plugin).  I have checked that the
script puts its output to standard output and have added the line use
lib /usr/lib/nagios/plugins; use plugins to the command script (as
suggested by the FAQ) without any change.  Does anyone have any
suggestions to correct this problem ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Nagios alarms going from WARNING to CRITICAL state

2008-05-29 Thread Wheeler, JF (Jonathan)
We are running Nagios 2.11.  When a check fails, Nagios configuration
allows a number of retries of the check before the error becomes HARD;
we find that this works well for checks which start OK and go
CRITICAL.  However does the retry mechanism apply when a check goes
from WARNING to CRITICAL ?  In other words, if a check is in
WARNING state and then goes CRITICAL, does it first become
CRITICAL/SOFT, or does it become CRITICAL/HARD straightaway ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory





[Nagios-users] FW: Problems with nagios

2008-05-02 Thread Wheeler, JF (Jonathan)
-Original Message-
Sent: 14 March 2008 10:52
To: nagios-users@lists.sourceforge.net

 In the past I have reported problems when our master server has failed
 with "Out of memory" problems caused by all server memory and swap space
 being used up.  I have largely (but not completely) solved these by
 increasing the number of Command and Check result buffers.

Regular readers of the list will remember that I reported this problem,
which was affecting our Nagios installation.  I finally solved it about a
month ago.  The key is that I am using the NDOUtils package to write the
Nagios logs and configuration to a MySQL database.  On the MySQL server
there is a cron job which uses a program called mysqlhotcopy to create a
snapshot of all of the MySQL databases.  It does this by locking the
tables whilst they are being copied, which causes the Nagios daemon on the
master server to wait until its latest write request to MySQL is
completed.

Whilst the Nagios daemon is waiting, the NSCA daemon is busy writing
results to the command file, and these cannot be processed until the MySQL
table locks are released.  By then there are too many commands to be
processed before the command reaper starts again.  This uses up command
buffer slots and eventually the system runs out of memory and swap space,
processes are killed by the OOM handler (Linux OS) and possibly the
system crashes because all memory is used up.

The solution to the problem was to exclude the nagios database from
consideration by the mysqlhotcopy backup (there is a configuration option
to do this).  The lesson to learn is that when there is a problem you need
to consider what is happening on all the computer systems involved in
Nagios.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Nagios service and host event handlers during service and host down time

2008-05-02 Thread Wheeler, JF (Jonathan)
We have started using Nagios service and host event handlers to trigger
24 hour callouts for our critical hosts and services.  However today we
had a situation when a host was put into downtime, but callouts were
triggered for a number of services on this host.  Does a host downtime
period have any effect on service checks on that host ?  Do we have to
put the services into downtime as well if the host is still up and known
to Nagios ?  I did check the on-line documentation but I could not see
any explanation of situations like this.  Any help would be much
appreciated.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory





Re: [Nagios-users] nrpe timeout after 10 seconds

2008-03-13 Thread Wheeler, JF (Jonathan)

-Original Message-
From: nagios-users On Behalf Of Meylikhov
Sent: 13 March 2008 12:04

 I have 4 linux servers that are monitored by nagios. Sometimes I get a
 notification on my contact e-mail:
 CHECK_NRPE: Socket timeout after 10 seconds.
 Notifications stating that nrpe timed out come for ALL services and for
 ALL hosts randomly every 1-2-3 hours.
 Then I get another notification stating that everything is fine. These
 flapping events take place every 1-2-3-4-5 hours randomly.
 Nagios and the monitored servers are situated in the same network,
 therefore I have no intermediary between the monitored servers and nagios.
 Can you help me to diagnose what's wrong? Can I increase the socket
 timeout variable on my nagios server? I think it could help.

There is a timeout on the check_nrpe command which you can set using the
-t option (the default is 10 seconds).  Try adding -t 30 to your check_nrpe
command line, which is probably defined in checkcommands.cfg.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] Checking for a stale NFS connection

2008-01-30 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
 Sent: 29 January 2008 21:28
 
 Anyone have an idea on how to have nagios check for a stale 
 NFS network connection?

We use the attached plugin
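
For readers without the attachment, here is an independent sketch of one common approach. It is NOT the attached check_stale_nfs.sh. It assumes timeout(1) from GNU coreutils is installed; check_mount is a hypothetical name.

```shell
#!/bin/sh
# Independent sketch, not the attached plugin.
# A stale NFS mount typically makes stat hang or fail, so the call is
# wrapped in timeout(1). Caveat: on a hard mount the stat may not be
# interruptible, in which case timeout itself can block.
check_mount() {
    dir=$1
    limit=${2:-5}    # seconds to wait before giving up (default 5)
    if timeout "$limit" stat "$dir" >/dev/null 2>&1; then
        echo "OK - $dir is responding"
        return 0
    fi
    echo "CRITICAL - $dir did not respond within ${limit}s (stale or missing?)"
    return 2
}
```

A plugin built on this would call check_mount for each NFS mount point and exit with the worst return code seen.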

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory


check_stale_nfs.sh
Description: check_stale_nfs.sh

[Nagios-users] Timeouts for send_nsca program

2007-10-09 Thread Wheeler, JF (Jonathan)
In /var/log/nagios/nagios.log on (at least) one of my slave servers, I
am seeing messages like:

[1191905959] Warning: OCSP command
'/usr/lib/nagios/plugins/tier1/submit_check_result.sh HOST
SERVICE_CHECK OK MESSAGE for service SERVICE NAME on host HOST
timed out after 5 seconds

There have been 712 occurrences today (so far).  Can anyone offer an
explanation ?  As far as I can tell there is no configuration to
increase the timeout limit (can it be increased by installing from
source ?), but perhaps the message indicates another problem (network ?)

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Problem with NDOUtils 1.4b6 and MySQL

2007-10-03 Thread Wheeler, JF (Jonathan)
-Original Message-
 From: [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
 Sent: 02 October 2007 14:52

 Thank you Mr Wheeler and Hugo for your advice. I have snipped the output
 of the suggested command from Mr. Wheeler.

 checking for mysql_store_result in -lmysqlclient... no 

 *** MySQL library could not be located... ** 

Do you have the mysql-devel RPM installed ?  This RPM contains the
/usr/include/mysql files and would be required by the build.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Problem with NDOUtils 1.4b6 and MySQL

2007-10-02 Thread Wheeler, JF (Jonathan)
-Original Message-
From: [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 02 October 2007 14:17

 I have been trying this for a day now and it is time to ask for some
 help. I have included the full output of the configure, as well as RPM
 output and directory listings. Any help would be greatly appreciated. It
 seems that NDO cannot find what it is looking for in regards to mysql yet
 AFAIK everything is there. Please advise if something is missing, or if I
 should compile mysql from source, or any other fix. Here is all the
 relevant information I can think of this morning:

 ~/ndoutils-1.4b6 # ./configure --with-mysql-lib=/usr/lib/mysql 

Use the following configure command (all on one line):

./configure --with-mysql-lib=/usr/lib/mysql --with-mysql-inc=/usr/include/mysql --disable-pgsql

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

[Nagios-users] FW: Network tuning for Nagios with slave servers

2007-09-07 Thread Wheeler, JF (Jonathan)
-Original Message-
 From: Andreas Ericsson [mailto:[EMAIL PROTECTED] 
 Sent: 07 September 2007 13:03

 Wheeler, JF (Jonathan) wrote:
 Our configuration is quite large (830 hosts, 160700+ services),

 You run more than 193 checks against each host? Good gods, you must
 be *really* curious about the state of those hosts :)

Oops, I meant 16700+ services !

 Nope, but you could try doing

   sysctl -w net.ipv4.tcp_fin_timeout=30

 to halve the default tcp timeout in the kernel, which should reduce
 the number of half-open connections you have.

Thanks for the suggestion.  We are beginning to suspect a switch issue
as there are other applications that are suffering packet loss in
various ways.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Network tuning for Nagios with slave servers

2007-09-07 Thread Wheeler, JF (Jonathan)
Our configuration is quite large (830 hosts, 160700+ services), so we have
implemented a master/slave configuration for Nagios (the Nagios servers
are running Linux).  The master server only runs a check if it becomes
stale, i.e. it should have been checked by a slave but no result has been
received.  However, I find that (for example) in the last day's log
there are 80,000+ warning messages saying the master has run a check
because it has become stale.  On further investigation I find that on
all of our 5 slaves the command netstat shows that there are a large
number of TCP sockets in CLOSE_WAIT state (more .  My question is, has
anyone done any network tuning to improve Nagios network performance ?
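
For anyone wanting to make the same observation, a small sketch of how the socket states can be counted. It assumes Linux net-tools style netstat output, where the state is the sixth column; count_tcp_states is a hypothetical name.

```shell
#!/bin/sh
# Summarise TCP socket states from `netstat -ant` style output on stdin.
# Assumption: lines for TCP sockets start with "tcp" and carry the state
# in column 6; header lines are skipped because they do not start with tcp.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print count[s], s }' | sort -rn
}

# Usage: netstat -ant | count_tcp_states
```

A large CLOSE_WAIT count at the top of the list is the symptom described above.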

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory





[Nagios-users] FW: Network tuning for Nagios with slave servers

2007-09-07 Thread Wheeler, JF (Jonathan)
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of [EMAIL PROTECTED]
Sent: 07 September 2007 14:22

 Could always try:
 
 net.core.rmem_max = 16777216
 net.core.wmem_max = 16777216
 net.ipv4.tcp_rmem = 4096 87380 16777216
 net.ipv4.tcp_wmem = 4096 65536 16777216
 net.ipv4.tcp_timestamps = 0
 net.ipv4.tcp_sack = 0

Thanks for the suggestions.  We are beginning to suspect a switch issue
as there are other applications that are suffering packet loss in
various ways.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Using contributed check

2007-08-22 Thread Wheeler, JF (Jonathan)
We have recently started trying to use the contributed plugin
/usr/lib/nagios/plugins/contrib/check_snmp_process_monitor.pl to run
checks from our Linux Nagios servers on a Solaris system.  Using perl
from userid nagios we get successful output:

[EMAIL PROTECTED] ~]# su - nagios
-sh-3.00$ cd /usr/lib/nagios/plugins/contrib/
-sh-3.00$ perl check_snmp_process_monitor.pl -H 130.246.183.131 -C
public -e arrayd -w 0,3 -c 1,2 -s --memory --cpu
OK - 1 process(es) found resembling
'arrayd'|count=1:memory=1216:cpu=0.08

However Nagios returns this text:

**ePN /usr/lib/nagios/plugins/contrib/check_snmp_process_monitor.pl:
Reference found where even-sized list expected at (eval 1) line 194,. 

Now I understand that the problem is that the code is not compatible
with the Embedded Perl Interpreter (ePN) in Nagios, but can someone help
me further understand and solve this problem?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Monitoring service for every machine in hostgroup EXCEPT ONE

2007-07-24 Thread Wheeler, JF (Jonathan)
-Original Message-
From: [EMAIL PROTECTED] On Behalf Of Kelly
Jones
Sent: 23 July 2007 02:55

I did not see any reply to this message, so here is my effort:

 I've created a hostgroup of 20 machines, and want to monitor 10
 services on each machine (easy). I now want to monitor an 11th service
 on 19 of the 20 machines. What's the best way to do this?

 Two ugly ways I don't like:

 % Create a separate hostgroup for the 19 machines I do want to
monitor.

 % Monitor the service on all 20 machines, but schedule infinite
 downtime for the service on the 20th machine.

 Is there a better way?

Yes.  In the configuration for the service, use an entry like this:

define service{
        use             generic-service ; From a template, or add other options
        hostgroup_name  mygroup         ; Group of 20 machines
        host_name       !nothisone      ; Machine not to be tested (note the !)
        contact_groups  us
        }
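
To make the effect of the ! entry concrete, here is a small sketch of
the selection it describes: start from the members of the hostgroup,
then drop every host named with a leading !.  This is an illustration
only, not how Nagios implements it internally; the function name and the
node01..node20 host names are invented for the example.

```python
def effective_hosts(hostgroup_members, host_name_entries):
    """Expand hostgroup members, then apply !exclusions and explicit additions."""
    selected = set(hostgroup_members)
    for entry in host_name_entries:
        entry = entry.strip()
        if entry.startswith("!"):
            # "!name" removes a host that the hostgroup brought in
            selected.discard(entry[1:])
        elif entry:
            # a plain name adds a host on top of the hostgroup
            selected.add(entry)
    return sorted(selected)


# Hypothetical group of 20 machines, with one of them excluded.
mygroup = ["node%02d" % i for i in range(1, 21)]
monitored = effective_hosts(mygroup, ["!node20"])
print(len(monitored))
```

The service then applies to the 19 remaining hosts without a second
hostgroup having to be maintained.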

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Problems with distributed setup, master overload?

2007-06-13 Thread Wheeler, JF (Jonathan)
-Original Message-
From: nagios-users On Behalf Of Jeffrey Lensen
Sent: 10 June 2007 08:28

 I recently extended our distributed Nagios setup of 1 master and 2
 distributed slaves (in which the master also had a lot of checks
 running), to 1 master and 5 distributed slaves (in which the master
 does no checking at all, except for host checks).

 This setup had 556 hosts and roughly 7000 service checks. Ever since I
 modified this setup, the Nagios master host has been giving me problems.

 The symptoms:
 - When starting both Nagios and NSCA, I see NSCA accepting checks in my
   logfiles, but none get processed by Nagios.
 - After a few minutes NSCA processes start to build up, increasing by
   5-10 processes per second. In a few minutes it reaches a few thousand
   processes and the machine starts hanging.
 - Sometimes the number of Nagios processes starts increasing, instead of
   the NSCA processes. Same result, the machine starts hanging.

I have seen similar problems, though in my case (1 master, 2 slaves, 824
hosts, 16000+ services) the queued NSCA processes are eventually
flushed.  However the Nagios master server also suffers from memory
leaks; after a period of 1 - 5 days it either crashes with a kernel
panic because there is no free memory, or reaches a state where the
kernel has killed all useful processes (e.g. nagios, nsca, sshd, ntpd,
etc) in an attempt to cure OOM (Out Of Memory) problems.
Interestingly, trying to strace the first daughter nsca process seems to
bring everything back to life and the queue of NSCA processes quickly
flushes.

I have tried running nagios using option -s to get configuration
recommendations and nagiostats to get usage information on both master
and slave servers, but they do not reveal anything useful.  My current
plan is to introduce 3 more slave servers as I have heard that this
helps.

Any comments would be helpful to me as well.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/


[Nagios-users] Service check timing out

2007-06-05 Thread Wheeler, JF (Jonathan)
I have a service check that takes more than 60 seconds to run.  Despite
calling check_nrpe with option -t 120, the check times out with the
message NRPE: Command timed out after 60 seconds.  The parameter
service_check_timeout in nagios.cfg is set to 120 seconds as well.
Any ideas ?  Is there a maximum timeout in check_nrpe ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Question about Freshness Checking

2007-05-23 Thread Wheeler, JF (Jonathan)
There are several things that I do in situations like this, usually on
the master server:

a) Acknowledge the service or host problem which will prevent
notifications
b) change the configuration to suppress the service check for this host
or remove the host from the configuration and restart Nagios on both
master and slave (distributed) servers
c) I believe that you can also schedule downtime for either host or
specific service

Of course in each situation above you have to remember to reverse the
change once the service/host is available again.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Shumard - DefenseWeb Technologies
Sent: 22 May 2007 16:17
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Question about Freshness Checking


I didn't hear anything back on my issue or question.  If anyone has
information on this I would appreciate it.

Thank you

-Original Message-
From: [EMAIL PROTECTED]
I am running a distributed Nagios configuration.  On each of my passive
service checks I am also doing freshness checks, just in case the
distributed host goes down and can't run the check.  I am able to log
into the distributed host's web interface and shut off active checks if I
don't want to run checks for a while on a specific host and its
services; one click disables active checks for all services.  This works
without any problems, but once my freshness threshold is hit the
centralized Nagios host starts doing the active checks because it
doesn't receive an update from the distributed host.  I am aware this is
what should be happening and it is working great.  Is there a way to
disable the freshness check for all the services on a host, just like
you can for active checks?  I know that if I shut off receiving passive
checks for one service this disables the freshness checks.  Has someone
written a patch, or does anyone know how to activate a feature, to
disable passive checks for all services on a host through the Nagios
CGI?

Jeff




[Nagios-users] FW: wild cards with exceptions?

2007-05-08 Thread Wheeler, JF (Jonathan)
-Original Message-
From: nagios-users On Behalf Of dave stern - e-mail.pluribus.unum
Sent: 08 May 2007 16:12

 I'm trying to streamline my nagios config using wildcards.
 Unfortunately, not all services I wish to define via wildcard
 follows a clean set of rules. Is it possible to define a service
 with a host list of something like *,!linux1, !linux2

 I suspect the answer is no and what I'd need to do is use
 a combination of hostgroups and hosts eg
 define service {
 hostgroup  unix, ultrix, sco
 service_description 
 }
 define service {
 host_name host1, host2, host3, host4
 ...
 }

 Anyone find a way around this?

I have found that within the same service definition I can use both
hostgroup_name and host_name records, specifically I have definitions like:

define service {
service_description 
hostgroup_name  unix, ultrix, sco
host_name !host1, !host2, !host3, etc
}

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Master and slave servers for Nagios

2007-04-25 Thread Wheeler, JF (Jonathan)
As I have reported in the past I have 2 slave servers and a master
server; all checks should be run from the slave servers and passed back
to the master server.  I have been recently trying to understand why
the master server still has kernel Out of memory problems such that
the kernel starts killing active processes and, in some cases, panics
because there are no more processes to kill (this happens perhaps once
or twice per week usually around 4:50 - 5:10 in the morning).  As part
of my investigations I have noticed that for a typical host 40% of tests
are reported from the slave and 60% are run by the master.  I can tell
this because 40% of messages for this typical host in /var/log/nagios on
the master server begin "EXTERNAL COMMAND:" and 60% of messages begin
"Warning:".  My question is why this should be ?  Here is a copy of
nagios.log from the master server for one test of one host for today (so
far):

[1177369200] CURRENT SERVICE STATE: csflnx119;SPACE_TMP;OK;HARD;1;DISK
OK - free space: /tmp 672 MB (70% inode=99%):
[1177369894] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 41 seconds (threshold=1817 seconds).  I'm
forcing an immediate check of the service.
[1177370925] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 672 MB (70% inode=99%):
[1177373014] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 43 seconds (threshold=2052 seconds).  I'm
forcing an immediate check of the service.
[1177374874] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 43 seconds (threshold=1816 seconds).  I'm
forcing an immediate check of the service.
[1177376734] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 41 seconds (threshold=1817 seconds).  I'm
forcing an immediate check of the service.
[1177377158] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 672 MB (70% inode=99%):
[1177379494] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 33 seconds (threshold=2305 seconds).  I'm
forcing an immediate check of the service.
[1177381354] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 39 seconds (threshold=1818 seconds).  I'm
forcing an immediate check of the service.
[1177383214] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 43 seconds (threshold=1816 seconds).  I'm
forcing an immediate check of the service.
[1177387073] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 660 MB (68% inode=99%):
[1177389102] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 13 seconds (threshold=5089 seconds).  I'm
forcing an immediate check of the service.
[1177390507] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 660 MB (68% inode=99%):
[1177392635] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 11 seconds (threshold=2118 seconds).  I'm
forcing an immediate check of the service.
[1177394495] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 39 seconds (threshold=1818 seconds).  I'm
forcing an immediate check of the service.
[1177396362] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 36 seconds (threshold=1823 seconds).  I'm
forcing an immediate check of the service.
[1177397210] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 660 MB (68% inode=99%):
[1177399813] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 47 seconds (threshold=2562 seconds).  I'm
forcing an immediate check of the service.
[1177401674] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 40 seconds (threshold=1818 seconds).  I'm
forcing an immediate check of the service.
[1177403749] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 28 seconds (threshold=1931 seconds).  I'm
forcing an immediate check of the service.
[1177404093] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 660 MB (68% inode=99%):
[1177406037] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 42 seconds (threshold=1902 seconds).  I'm
forcing an immediate check of the service.
[1177410112] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 184 seconds (threshold=2853 seconds).  I'm
forcing an immediate check of the service.
[1177410863] EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;csflnx119;SPACE_TMP;0;DISK OK - free space:
/tmp 660 MB (68% inode=99%):
[1177413485] Warning: The results of service 'SPACE_TMP' on host
'csflnx119' are stale by 30 seconds (threshold=2579 seconds).  I'm
forcing an immediate check of the service.
[1177415948] Warning: The results of service 
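
The 40%/60% split described above can be counted rather than estimated.
The sketch below assumes each log entry is a single line in nagios.log
(the wrapping in this message comes from the mail client) and classifies
entries for one host and service by their prefix.  The function name is
invented; the sample lines are copied from the log above and joined back
onto one line each.

```python
import re
from collections import Counter

# Matches "[epoch] rest-of-line" as written to nagios.log.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def classify(entries, host, service):
    """Count passive results versus freshness-forced checks for one service."""
    passive_prefix = f"EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;{host};{service};"
    stale_prefix = f"Warning: The results of service '{service}' on host '{host}' are stale"
    counts = Counter()
    for entry in entries:
        match = LINE_RE.match(entry)
        if not match:
            continue
        text = match.group(2)
        if text.startswith(passive_prefix):
            counts["passive"] += 1   # result submitted by a slave
        elif text.startswith(stale_prefix):
            counts["stale"] += 1     # master forced its own check
    return counts


SAMPLE = [
    "[1177369894] Warning: The results of service 'SPACE_TMP' on host "
    "'csflnx119' are stale by 41 seconds (threshold=1817 seconds).  I'm "
    "forcing an immediate check of the service.",
    "[1177370925] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;csflnx119;"
    "SPACE_TMP;0;DISK OK - free space: /tmp 672 MB (70% inode=99%):",
    "[1177373014] Warning: The results of service 'SPACE_TMP' on host "
    "'csflnx119' are stale by 43 seconds (threshold=2052 seconds).  I'm "
    "forcing an immediate check of the service.",
]

counts = classify(SAMPLE, "csflnx119", "SPACE_TMP")
print(counts["passive"], counts["stale"])
```

Run over a full day's log, the two counters give the ratio of slave
results to master-forced checks directly.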

Re: [Nagios-users] Master and slave servers for Nagios

2007-04-25 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: Jason Qualkenbush [mailto:[EMAIL PROTECTED] 
 Sent: 25 April 2007 11:55
 
 Wheeler, JF (Jonathan) wrote:
  As I have reported in the past I have 2 slave servers and a master
  server; all checks should be run from the slave servers and passed
back
  to the master server.  I have been recently trying the understand
why
  the master server still has kernel Out of memory problems such
that
  the kernel starts killing active processes and, in some cases,
panics
  because there are no more processes to kill (this happens perhaps
once
  or twice per week usually around 4:50 - 5:10 in the morning).  As
part
  of my investigations I have noticed that for a typical host 40% of
tests
  are reported from the slave and 60% are run by the master.  I can
tell
  this because 40% of messages for this typical host in
/var/log/nagios on
  the master server begin EXTERNAL_COMMAND and 60% of messages begin
  Warning:.   My question is why this should be ?  Here is a copy of
  nagios.log from the master server for one test of one host for today
(so
  far):
 
 Sounds like this has to do more with the freshness of the passive 
 check.  If the master server thinks the check isn't fresh, it will
then 
 run an active check to see for itself.  I'd tune in the freshness, and

 keep in mind the scheduling of the checks.  If you configure your 
 freshness to expire at five minutes, and the slave server schedules
that 
 check for once every six minutes, you are going to get behaviour like
you 
 mentioned.

Thanks for your reply.  However the tests are scheduled to run every 30
minutes on both master and slave servers (confirmed by checking in
retention.dat file).  If you look in the original message you will see
that the master server is correctly running the command by freshness
checking (Warning messages) every 30 minutes, but the slave results
are at longer intervals (EXTERNAL messages) though roughly at some
number of 30 minute intervals.
What are the possibilities for results from command issued by the slave
getting lost ?  Why are OK results not recorded in the slave server logs
?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] no output returned from plugin

2007-04-25 Thread Wheeler, JF (Jonathan)
-Original Message-
From: nagios-users On Behalf Of Valdinger, Stephen (DOV, MSX)
Sent: 25 April 2007 14:21

 Any ideas as to what could be causing this???

Usually because the plugin has returned nothing on STDOUT.  So has the
plugin worked before ?  Does it work if run by hand on the system being
tested for user name nagios ?  Is the test failing in an unusual way ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] Option append_to_file in nsca.cfg

2007-03-15 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: [EMAIL PROTECTED] On Behalf Of Marc
Powell
 Sent: 13 March 2007 16:17
 
  -Original Message-
  From: nagios-users On Behalf Of Wheeler, JF (Jonathan)
  Sent: Tuesday, March 13, 2007 9:59 AM
  To: nagios-users@lists.sourceforge.net
  Subject: [Nagios-users] Option append_to_file in nsca.cfg
  
  As I have said before my configuration consists of 1 master server
and 2
  slaves with about 700 hosts and 16000 checks.  In the file nsca.cfg
  which configures the nsca daemon, there is an append_to_file option
  which is (by default) set to 0 for writing to the command file
rather
  than 1 for appending to it.  Please would someone explain why
appending
  to the command file is deprecated.  I ask because I can have several
 
 Semi-educated commentary follows -- The 'command file' is more
properly
 named a 'command pipe'. It's not a real file and therefore appending
to
 it makes no sense. A pipe is essentially a FIFO buffer. Data is
written
 to it by one process and read by another in a sequential fashion. If
the
 reading process can't keep up with the writing process, your kernel
will
 buffer the writes up to a point depending on the OS. For Linux kernels
 older than 2.6.11 the buffer was 4096 bytes. From 2.6.11 onwards, the
 buffer is 65536 bytes. Nagios also has its own internal buffers to help process the
pipe
 faster. With nagios-2.7, these are controlled by the
 external_command_buffer_slots option in nagios.cfg. You can also
control
 how often nagios checks for data in the pipe with the
command_check_interval
 setting. You certainly want that to be -1 and not
 every 4 seconds. -1 tells nagios to check as often as possible.
 
 Depending on your check frequency, it sounds like nagios isn't able to
 keep up with your check submissions, almost certainly related to your
 checking the pipe every 4 seconds only. At ~100 bytes per check, you
 could only accept 40 results in 4 seconds before dropping. If you're
 doing 16,000 checks every 5 minutes that's ~213 check results every 4
 seconds. You can do the math based on your actual sizes/intervals...
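 
 The arithmetic in the paragraph above can be redone for other sizes and
 intervals with a few lines of code.  This is a back-of-envelope sketch
 only; the function names are invented, and the ~100 bytes per result is
 the rough estimate quoted above, not a measured value.

```python
def results_buffered(pipe_bytes, bytes_per_result):
    """How many whole check results fit in the kernel pipe buffer."""
    return pipe_bytes // bytes_per_result


def results_per_window(total_checks, check_period_s, window_s):
    """Average number of results arriving within one polling window."""
    return total_checks * window_s / check_period_s


# 4096-byte pipe, ~100 bytes per result
print(results_buffered(4096, 100))
# 16,000 checks every 5 minutes, pipe read every 4 seconds
print(round(results_per_window(16000, 300, 4)))
# the larger pipe buffer of newer kernels
print(results_buffered(65536, 100))
```

 With these inputs the three figures come out as 40, 213 and 655, which
 matches the 40 and ~213 quoted above.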
 
 Verify that you have a good amount of buffer slots (use nagiostats to
 see current utilization) and that you're checking external commands as
 fast as possible.
 
 I'm only doing 1/4 of the passive checks you are so you may be hitting
 limits that I haven't experienced yet but it doesn't appear so at this
 point.

Sorry, I think that I have confused the discussion by not appreciating
the difference between service_reaper_frequency (which is 4 secs) and
command_check_interval (which is -1).
After restarting nagios this morning (Thur 15/03 - the master server had
panicked due to lack of memory), I issued the command "wc -l
/var/log/nagios/rw/nagios.cmd" and got the answer 1003 (this is the command
pipe).  If I understand you correctly, there were 1003 commands in the
pipe waiting to be processed by the server (understandable as the master
server had just restarted and the slaves had plenty of commands waiting
to be processed), but the operation of command wc actually discarded
these commands by reading through the pipe.
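
The point that reading a pipe consumes its contents can be shown with a
short sketch.  It uses an anonymous pipe from os.pipe() as a stand-in
for the named command pipe, and the command text, host and service names
are invented; the read side is made non-blocking only so that the
example terminates on an empty pipe.

```python
import os


def count_lines_and_drain(read_fd):
    """Count newline-terminated records by reading them, much as wc -l would."""
    os.set_blocking(read_fd, False)
    total = 0
    while True:
        try:
            chunk = os.read(read_fd, 4096)
        except BlockingIOError:
            break  # nothing left in the pipe for now
        if not chunk:
            break  # every writer has closed its end
        total += chunk.count(b"\n")
    return total


read_fd, write_fd = os.pipe()
# Hypothetical passive check results queued by a writer.
record = b"[1173950000] PROCESS_SERVICE_CHECK_RESULT;hostA;SVC;0;OK\n"
os.write(write_fd, record * 3)

first_count = count_lines_and_drain(read_fd)   # sees the three queued records
second_count = count_lines_and_drain(read_fd)  # the records are gone
print(first_count, second_count)

os.close(read_fd)
os.close(write_fd)
```

The first count is 3 and the second is 0: whatever counts the lines has
also removed them, so the intended reader never sees them.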

At present I am running nagios 2.6.  I want to upgrade to nagios 2.8,
but as I also use ndoutils I need to compile the latest version of that
and update the SQL tables that it writes.  If the problem does not go
away with the latest version of the server, I will raise the issues
again, but any other comments would be much appreciated.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV


[Nagios-users] Option append_to_file in nsca.cfg

2007-03-13 Thread Wheeler, JF (Jonathan)
As I have said before my configuration consists of 1 master server and 2
slaves with about 700 hosts and 16000 checks.  In the file nsca.cfg
which configures the nsca daemon, there is an append_to_file option
which is (by default) set to 0 for writing to the command file rather
than 1 for appending to it.  Please would someone explain why appending
to the command file is deprecated.  I ask because I can have several
nsca processes running every second; if each of them writes to the
command file rather than appending to it, the output from previous nsca
processes would be lost; this would explain why my master server issues
so many tests itself because test results become stale.  I should add
that the command file is a pipe and it is reaped every 4 seconds.
Perhaps I am misunderstanding something about the nature of a pipe ?  Or
perhaps the documentation in the configuration file is misleading/wrong ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Out of memory failures on Nagios master server

2007-03-09 Thread Wheeler, JF (Jonathan)
My configuration has a master server and 2 slave servers with about 730
hosts and 16000 service checks.  All our systems are running Linux.  For
some time now the master server has been running out of memory between
4:50 and 5:00 such that the server either kernel panics (rarely) or
kills all useful processes.

To try and investigate the problem I have been using at jobs to run
"vmstat 15 160" and "date; ps -ef; sleep 15" (160 times) to record
system activity at 15 second intervals for 40 minutes, i.e. from 4:30
until 5:10.  This has revealed that the problem starts with nsca
processes starting and not completing (today's maximum count was 4447)
until they all suddenly complete at about 4:50.  During this time vmstat
shows that memory usage increases slowly, but it is all released when
the nsca processes run.  About 10 minutes later there are many separate
nagios processes which do not complete (183); as the nagios process is
quite large this fills system memory and swap space, which effectively
kills the system.

You might think, given the time that this is happening, that this is
affected by cron, but for this morning I had retimed cron.daily to run
at 10:02 rather than 4:02.  Has anyone seen anything like this ?  I can
say from the master server logs that no tests seem to be recorded from
about 4:00 onwards; if the system survives they start again after that.
The server is a blade server with a single CPU but it is running with
hyper-threading on (if that makes a difference); the kernel is
2.6.0-42.0.8
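
The per-sample process counts mentioned above (4447 nsca, 183 nagios)
can be pulled out of each ps snapshot with a small helper.  This is a
sketch under assumptions: it expects one command name per line, as
printed by "ps -e -o comm=", rather than the "ps -ef" format used in the
at jobs, and the sample snapshot is made up.

```python
from collections import Counter


def count_processes(ps_comm_output, names=("nsca", "nagios")):
    """Count how many processes of each named command appear in one snapshot."""
    seen = Counter(
        line.strip() for line in ps_comm_output.splitlines() if line.strip()
    )
    return {name: seen.get(name, 0) for name in names}


# Hypothetical snapshot; real input would be one sample from the at job.
SAMPLE = "init\nnsca\nnsca\nnagios\nsshd\nnsca\n"

snapshot = count_processes(SAMPLE)
print(snapshot)
```

Applied to each 15 second sample in turn, this gives a time series of
both counts that can be lined up against the vmstat output.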

Any suggestions would be appreciated.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] FW: Reports using NDOutils?

2007-02-21 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: nagios-users On Behalf Of Marcel Mitsuto Fucatu Sugano
 Sent: 15 February 2007 14:58
 
 On Thu, 2007-02-15 at 10:17 +, Wheeler, JF (Jonathan) wrote:
  -Original Message-
  From: nagios-users On Behalf Of Marcel Mitsuto Fucatu Sugano
  Sent: 14 February 2007 19:27
(big snip)
   2) Is it crazy to think I can keep *all* the NDO data 
 forever?  (~500 hosts / 6000+ srvcs)
  
   Well, considering that only state changes matters, it isn't that
crazy.
  
  The only place where I have had to do anything is with the
logentries
  table which (in our case) has written more records than is allowed
by
  MySQL and sometimes generates MySQL errors.  Deleting old entries
solves
  the problem (I have a script that deletes entries more than 6 weeks
  old).
 
 Didn't deleting old entries whack the historical state change data?

I do not see a need to keep the log data for more than six weeks in the
SQL tables; these are separate from the log files on the Nagios server.
Note that there is no cleanup of the Nagios logs (as far as I am aware),
so these need to be cleared out every so often as well.  I have a
separate script which compresses all log files except the last six and
only keeps 190 files (about 6 months of data) in the log archives
directory.
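
The retention rule described above (leave the newest six files
uncompressed, keep at most 190 files in total) can be sketched as a pure
planning step that decides what to compress and what to delete.  The
function name and the file names are invented for illustration, and the
sketch does not touch the filesystem.

```python
def plan_retention(names_oldest_first, keep_uncompressed=6, keep_total=190):
    """Return (files to delete, files to compress) for an archive directory."""
    names = list(names_oldest_first)
    # Anything beyond the total limit is deleted, oldest first.
    excess = max(0, len(names) - keep_total)
    to_delete = names[:excess]
    kept = names[excess:]
    # Of the files that stay, all but the newest few are compressed.
    cut = max(0, len(kept) - keep_uncompressed)
    to_compress = kept[:cut]
    return to_delete, to_compress


# Hypothetical archive of 200 daily logs, oldest first.
archive = ["nagios-%03d.log" % i for i in range(1, 201)]
to_delete, to_compress = plan_retention(archive)
print(len(to_delete), len(to_compress))
```

With 200 files this plans 10 deletions and 184 compressions, leaving the
six newest files as they are.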

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Reports using NDOutils?

2007-02-15 Thread Wheeler, JF (Jonathan)
-Original Message-
From: nagios-users On Behalf Of Marcel Mitsuto Fucatu Sugano
Sent: 14 February 2007 19:27

 On Wed, 2007-02-14 at 09:58 -0800, Trask wrote:
 Are there any projects, addons, or home-made scripts out there that
 people are using that pull data from the NDO output and create
 reports?  I've done a good bit of searching and haven't found
 anything, but it seems like such a logical thing to have that I
 figure someone has done this already.

 I am waiting for the same sort of thing as well. I am doing some
 research around NDOUtils too, because I'll need Nagios to watch over
 Service Level Agreement thresholds.

I have a PHP script which gets information from the NDOUtils MySQL
tables to display machine status on a web page (we have a home-grown
script which provides a single page display of our farm of 800+
servers).

I also plan to write scripts that will get plain-text output from the
MySQL tables for use when administrators do not have access to the
Nagios web pages.

(snip)

 2) Is it crazy to think I can keep *all* the NDO data forever?  (~500
 hosts / 6000+ srvcs)

 Well, considering that only state changes matter, it isn't that crazy.

The only place where I have had to do anything is with the logentries
table which (in our case) has written more records than is allowed by
MySQL and sometimes generates MySQL errors.  Deleting old entries solves
the problem (I have a script that deletes entries more than 6 weeks
old).
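
The six-week purge described above could look roughly like the sketch below.  This is not the script mentioned in this message; the table name ndo_logentries and the column logentry_time are assumptions about the NDOUtils schema, and an in-memory SQLite database stands in for the real MySQL server so that the example runs on its own.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=6)  # the six-week window mentioned above


def purge_old_logentries(conn, now, retention=RETENTION,
                         table="ndo_logentries", column="logentry_time"):
    """Delete rows older than `retention` and return how many were removed.

    `table` and `column` are assumptions about the NDOUtils schema.
    """
    cutoff = (now - retention).strftime("%Y-%m-%d %H:%M:%S")
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # only the cutoff value is passed as a bound parameter.
    cur = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Stand-in database with made-up rows (SQLite instead of MySQL).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ndo_logentries (logentry_time TEXT, logentry_data TEXT)")
    conn.executemany("INSERT INTO ndo_logentries VALUES (?, ?)", [
        ("2007-01-01 00:00:00", "old entry"),
        ("2007-02-14 12:00:00", "recent entry"),
    ])
    print(purge_old_logentries(conn, now=datetime(2007, 2, 15)))
```

Against MySQL the same DELETE statement would be issued through a MySQL driver, whose placeholder is normally %s rather than ?.  Only the log entries are purged here; the status tables are not touched.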

(snip)

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Memory leaks

2007-01-26 Thread Wheeler, JF (Jonathan)
-Original Message-
From: [EMAIL PROTECTED] On Behalf Of Andreas Ericsson
Sent: 24 January 2007 11:15

 Tobias Klausmann wrote:
(big snip)
 
 For vanilla Nagios, at least it's clear that in whatever way
 memory is wasted, it also slows Nagios down - a possibility would
 be a linked list that is walked and gets appended over and over.
 But I guess those with knowledge of the inner workings of Nagios
 have more clue about this than I do.

 Anyone wanting to look into it should probably take a look at the
 event scheduling queue.

I have also been experiencing memory leaks, such that the kernel has
been taking drastic action by killing processes starting with nagios and
often including httpd, sshd etc.  This all seems to happen at about 4:45
every morning.  A reboot solves the problem and everything starts up
again, but yesterday I decided to reboot using the single processor
kernel (most of our nodes are dual processor, some are dual core as
well) and there is no sign of a memory leak today !

Does that give anyone any clues ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: Memory leaks

2007-01-23 Thread Wheeler, JF (Jonathan)
-Original Message-
From: nagios-users On Behalf Of Tobias Klausmann
Sent: 23 January 2007 15:32

 Nagios 2.6 and 2.5 have memory leaks. They are not that big that
 within hours your machine will be swapping, but they degrade
 performance in other ways.

I have also had problems with memory leaks, such that the kernel
(2.6.0-42.0.3) reaches the stage of killing processes to try to preserve
the system.  In my experience the first processes killed are nagios and
nsca.  Our configuration is relatively large with just under 16,000
services and 750 hosts.  As a consequence we run two slave servers which
run the checks and report to the master; on the master all checks are
passive except local checks.  We have only seen the out of memory
problems on the master.  I had thought that the problems were caused by
NagiosGrapher which we were running, but were not using; certainly the
problem was reduced by removing that process from the mix.  For us the
problem seems to start (according to the message log) at about 4:45 in
the morning, so perhaps there is another factor as well (cron jobs ?).
Any input would be welcome, though I will continue to investigate as I
have time.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory




[Nagios-users] FW: log rotation - how many files are kept?

2007-01-18 Thread Wheeler, JF (Jonathan)
-Original Message-
From: nagios-users On Behalf Of Stijn Gruwier
Sent: 18 January 2007 07:37

 I'm aware that nagios is able to copy the nagios.log file to the
 archives directory on an hourly/daily/weekly/monthly basis. It seems
 that nagios keeps those files forever, since I've got 11 archived
 weekly logs. But the word 'rotation' suggests that at some time the
 old ones are removed and replaced by newer logs. Is this the case? I
 searched the mailing list and the documentation but I couldn't find
 the answer.

I also came across this problem and have written a script to organise
our archived logs.  At present I run it manually, but it could be a cron
script.  What it does is keep up to 180 logs (1 per day for 6 months),
with all but the most recent 5 zipped.  The numbers in the previous
sentence are parameters at the head of the script.  Unfortunately the
form of the name of the archived logs is not suitable for processing
with standard logrotate.  I am happy to let others use this script if
a) no one has anything better, and b) someone can tell me where to
submit it.
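
For anyone wanting to try the same approach, here is a minimal sketch of such a tidy-up.  It is not the script described above.  It assumes the archives are named nagios-MM-DD-YYYY-HH.log (the month-first ordering is what stops a plain name sort from being chronological), and the function and constant names are invented for illustration.

```python
import gzip
import re
import shutil
from datetime import datetime
from pathlib import Path

KEEP_TOTAL = 180        # archived logs to retain (about six months of dailies)
KEEP_UNCOMPRESSED = 5   # most recent logs left as plain text

# Assumed archive name: nagios-MM-DD-YYYY-HH.log, optionally with .gz appended.
NAME_RE = re.compile(r"^nagios-(\d{2})-(\d{2})-(\d{4})-(\d{2})\.log(\.gz)?$")


def archive_time(path):
    """Return the timestamp encoded in an archive file name, or None."""
    m = NAME_RE.match(path.name)
    if not m:
        return None
    month, day, year, hour = (int(g) for g in m.groups()[:4])
    try:
        return datetime(year, month, day, hour)
    except ValueError:  # e.g. month 13: not a real archive name
        return None


def tidy_archives(directory, keep_total=KEEP_TOTAL,
                  keep_uncompressed=KEEP_UNCOMPRESSED):
    """Compress older archives and delete the oldest beyond keep_total."""
    logs = [p for p in Path(directory).iterdir() if archive_time(p)]
    logs.sort(key=archive_time, reverse=True)  # newest first
    for index, path in enumerate(logs):
        if index >= keep_total:
            path.unlink()
        elif index >= keep_uncompressed and path.suffix != ".gz":
            with open(path, "rb") as src, gzip.open(f"{path}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()
```

Run from cron as something like tidy_archives("/path/to/archives"), where the path is whatever log_archive_path points at in your nagios.cfg.  Files that do not match the assumed name pattern are left alone.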

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: NDOUtils and NDO2BD

2006-12-12 Thread Wheeler, JF (Jonathan)
-Original Message-
From: [EMAIL PROTECTED] On Behalf Of Jeff Sullivan
Sent: 08 December 2006 19:22

 Has anyone created an interface that uses data from the NDOUtils
 package?  I have it all set up and logging to MySQL.  I am in need of
 a simple interface for tier 1 support personnel.

If you are talking about an HEP Tier1 site, then we are also one.  I
have adapted a script that we already had to issue some MySQL queries to
the NDOutils tables.  I would be happy for you to see the queries if
that would help.

Jonathan Wheeler
Tier1A Service Team
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] FW: FW: NDOUtils and NDO2BD

2006-12-12 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: Jeff Sullivan [mailto:[EMAIL PROTECTED] 
 Sent: 12 December 2006 14:01
 To: Wheeler, JF (Jonathan)
 Subject: Re: [Nagios-users] FW: NDOUtils and NDO2BD
 
 
 That would be great.. Those darn NDO2DB tables really need a 
 dictionary.map..

These queries are extracts from a PHP script; note that the database
name nagios is included because there are other queries in the script
using a different database.  The first query is checking for the status
of a host (host name in $SHORT[$n]):

$got = mysql_query("select last_state_change from nagios.ndo_hoststatus, " .
    "nagios.ndo_objects where host_object_id=object_id and " .
    "name1='" . $SHORT[$n] . "' and " .
    "nagios.ndo_hoststatus.current_state=1");
if ($got and mysql_num_rows($got)) {
  $st = "down"; $txt = "$node ($st - Nagios)";
}
All it is doing is checking if there is a record in ndo_hoststatus for
the host where current_state is 1; host_object_id is a field in
ndo_hoststatus which matches object_id in ndo_objects; name1 is the name
of the host from ndo_objects.
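
The same lookup can be sketched with the host name passed as a bound parameter instead of being concatenated into the SQL text.  This is an illustration only: the table and field names are the ones described above, but SQLite stands in for MySQL, the rows are made up, and host_down_since and demo_connection are hypothetical names.

```python
import sqlite3

# Join ndo_hoststatus to ndo_objects and keep only hosts in state 1.
HOST_DOWN_SQL = """
    SELECT s.last_state_change
    FROM ndo_hoststatus AS s
    JOIN ndo_objects AS o ON s.host_object_id = o.object_id
    WHERE o.name1 = ? AND s.current_state = 1
"""


def host_down_since(conn, host_name):
    """Return last_state_change if the host is recorded as down, else None."""
    row = conn.execute(HOST_DOWN_SQL, (host_name,)).fetchone()
    return row[0] if row else None


def demo_connection():
    """Build an in-memory stand-in for the two NDOUtils tables (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ndo_objects (object_id INTEGER, objecttype_id INTEGER,
                                  name1 TEXT);
        CREATE TABLE ndo_hoststatus (host_object_id INTEGER,
                                     current_state INTEGER,
                                     last_state_change TEXT);
        INSERT INTO ndo_objects VALUES (1, 1, 'node001');
        INSERT INTO ndo_objects VALUES (2, 1, 'node002');
        INSERT INTO ndo_hoststatus VALUES (1, 1, '2006-12-12 09:00:00');
        INSERT INTO ndo_hoststatus VALUES (2, 0, '2006-12-01 08:00:00');
    """)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(host_down_since(conn, "node001"))
    print(host_down_since(conn, "node002"))
```

Binding the name means a host name containing a quote character cannot alter the query.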

The second query is doing something similar for alarms for hosts which
are not down:

# Check for Nagios alarms if system is not down
  if (strncmp($st, "down", 4) != 0) {
    $got = mysql_query("select output from nagios.ndo_servicestatus, " .
        "nagios.ndo_objects where objecttype_id=2 and " .
        "name1='" . $SHORT[$n] . "'" .
        " and current_state=2 and service_object_id=object_id");
    if ($got and mysql_num_rows($got)) { $st .= "_a"; }
  }

In this query object_id, objecttype_id and name1 are fields in
ndo_objects (you need both objecttype_id and name1 because there is a
multi-field index built on objecttype_id and name1 in that order);
current_state and service_object_id are fields in ndo_servicestatus

This third query is extracting all the alarms for a host (this is a
different script so $SHORT is not an array here):

$got = mysql_query("select current_state, output, unix_timestamp() - " .
    "unix_timestamp(last_hard_state_change) from nagios.ndo_objects, " .
    "nagios.ndo_servicestatus where current_state!=0 and " .
    "service_object_id=object_id and name1='" . $SHORT . "'");
if ($got and mysql_num_rows($got)) {
  print "<div class=\"sub\">Alarms for " . htmlspecialchars($NODE) . "</div>\n";
  $warns = $crits = $unkns = "";
  while ($r = mysql_fetch_row($got)) {
    $txt1 = "<tr valign=\"top\"><td><span class=";
    $txt2 = "</span></td> <td>$r[1]</td> <td nowrap><span class=\"time\">" .
        prettytime($r[2]) . "</span></td></tr>\n";
    switch ($r[0]) {
      case 1: $warns .= $txt1 . "\"warn\">WARNING" . $txt2; break;
      case 2: $crits .= $txt1 . "\"crit\">CRITICAL" . $txt2; break;
      case 3: $unkns .= $txt1 . "\"unkn\">UNKNOWN" . $txt2; break;
      default: $unkns .= $txt1 . "\"unkn\">UNKNOWN (bad type)" . $txt2;
    }
  }
  print "<div><table border=\"0\" cellpadding=\"0\" cellspacing=\"2\" " .
      "width=\"100%\">\n$crits$warns$unkns</table></div>\n";
}
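
As a rough illustration of what the loop in the third query does, the sketch below groups rows by state and emits the critical rows first, then warnings, then unknowns.  It is not a translation of the PHP script: pretty_duration is a hypothetical stand-in for prettytime (which is not shown in this message), and escaping the plugin output is an addition of mine.

```python
from html import escape

# Nagios service states: 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
LABELS = {1: ("warn", "WARNING"), 2: ("crit", "CRITICAL"), 3: ("unkn", "UNKNOWN")}
ORDER = ("crit", "warn", "unkn")  # same output order as $crits$warns$unkns


def pretty_duration(seconds):
    """Hypothetical stand-in for prettytime(): format seconds as 'Xd Yh Zm'."""
    minutes = int(seconds) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m"


def render_alarm_rows(rows):
    """rows: iterable of (current_state, output, seconds_in_state) tuples."""
    buckets = {css: [] for css in ORDER}
    for state, output, age in rows:
        # Any state outside 1..3 is reported as an unknown of a bad type.
        css, label = LABELS.get(state, ("unkn", "UNKNOWN (bad type)"))
        buckets[css].append(
            f'<tr valign="top"><td><span class="{css}">{label}</span></td> '
            f"<td>{escape(output)}</td> "
            f'<td nowrap><span class="time">{pretty_duration(age)}</span></td></tr>'
        )
    return "\n".join(row for css in ORDER for row in buckets[css])


if __name__ == "__main__":
    print(render_alarm_rows([(1, "load high", 3660), (2, "disk <full>", 90000)]))
```

Escaping the output column matters if a plugin ever returns text containing angle brackets, since it would otherwise be interpreted as markup on the status page.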

I hope that this helps.  Please ask for more explanations if required.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Viewing a subset of systems checked by Nagios

2006-10-25 Thread Wheeler, JF (Jonathan)
I have a Nagios installation with ~700 hosts and ~11000 services.  What
we would like to do is to allow some users to view (via the web
interface) only a subset of the systems being monitored; in the current
instance this is just one host, but there could be other instances
requiring a number of hosts.  Is this possible ?  I suspect not, but
any comments would be useful.  I am aware that most users who have
access to the web view should not have the ability to run commands.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Archive logs

2006-09-06 Thread Wheeler, JF (Jonathan)
What do people do about their archive logs ?  I am running a
configuration on Scientific Linux with nearly 600 hosts and 13000+
services which generates quite large log files.  As far as I can tell
the logs are moved to the archive and retained there indefinitely; my
/var partition is now getting quite full.  I have tried using logrotate,
but the log file names do not seem to allow logrotate to work correctly.
A browse through the mailing list archives does not show anyone else
asking about this problem.  Any suggestions ?

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] Question about NRPE configuration file

2006-08-16 Thread Wheeler, JF (Jonathan)
 -Original Message-
 From: Jason Martin [mailto:[EMAIL PROTECTED] 
 Sent: 14 August 2006 17:11
 
 On Mon, Aug 14, 2006 at 11:25:40AM +0100, Wheeler, JF (Jonathan) wrote:
  I am in the process of migrating from a configuration with a single
  Nagios server to one with a master and a slave server.  As part of
  this migration I have updated the NRPE configuration that is installed
  on the clients to include both hosts as allowed_hosts for NRPE calls.
  However I noticed that at NRPE restart, the following messages are
  issued:
  
  Aug 14 09:24:15 NODENAME nrpe[2592]: Unknown option specified in config file '/etc/nagios/nrpe.cfg' - Line 41
  Aug 14 09:24:15 NODENAME nrpe: nrpe startup succeeded
  Aug 14 09:24:15 NODENAME nrpe[2593]: Starting up daemon
  Aug 14 09:24:15 NODENAME nrpe[2593]: Warning: Daemon is configured to accept command arguments from clients!
  
  Line 41 of /etc/nagios/nrpe.cfg is the allowed_hosts line which reads
  allowed_hosts=III.III.III.111, III.III.III.222
 Try removing the space after the comma.
 
 -Jason Martin

Thanks for the suggestion.  I tried removing the space, but the message
remains.  However it is clear from testing the clients that both
configurations work anyway !  You may (all) gather that I am just
starting with distributed monitoring, so I am learning as I go.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



[Nagios-users] Question about NRPE configuration file

2006-08-14 Thread Wheeler, JF (Jonathan)
I am in the process of migrating from a configuration with a single
Nagios server to one with a master and a slave server.  As part of this
migration I have updated the NRPE configuration that is installed on the
clients to include both hosts as allowed_hosts for NRPE calls.  However
I noticed that at NRPE restart, the following messages are issued:

Aug 14 09:24:15 NODENAME nrpe[2592]: Unknown option specified in config file '/etc/nagios/nrpe.cfg' - Line 41
Aug 14 09:24:15 NODENAME nrpe: nrpe startup succeeded
Aug 14 09:24:15 NODENAME nrpe[2593]: Starting up daemon
Aug 14 09:24:15 NODENAME nrpe[2593]: Warning: Daemon is configured to accept command arguments from clients!

Line 41 of /etc/nagios/nrpe.cfg is the allowed_hosts line which reads
(with context):

# ALLOWED HOST ADDRESSES
# This is a comma-delimited list of IP address of hosts that are allowed
# to talk to the NRPE daemon.
#
# NOTE: The daemon only does rudimentary checking of the client's IP
#   address.  I would highly recommend adding entries in your
#   /etc/hosts.allow file to allow only the specified host to connect
#   to the port you are running this daemon on.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

allowed_hosts=III.III.III.111, III.III.III.222

where III.III.III.111 and III.III.III.222 are the IP addresses of the
Nagios servers

Is the error message (from /var/log/messages) misleading ?  Or is there
an error in the configuration ?
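
One way to rule out a malformed entry is to check the line independently of NRPE.  The sketch below does not reproduce NRPE's own parser and does not explain the message above; it only splits an allowed_hosts line on commas, trims whitespace, and reports any entry that is not a well-formed IP address.  The function name is made up for illustration.

```python
import ipaddress


def parse_allowed_hosts(line):
    """Split an allowed_hosts line into (valid_addresses, rejected_entries)."""
    key, _, value = line.partition("=")
    if key.strip() != "allowed_hosts":
        raise ValueError(f"not an allowed_hosts line: {line!r}")
    valid, rejected = [], []
    for entry in value.split(","):
        entry = entry.strip()  # tolerate spaces around the commas
        try:
            valid.append(str(ipaddress.ip_address(entry)))
        except ValueError:
            rejected.append(entry)
    return valid, rejected


if __name__ == "__main__":
    # Documentation-range addresses used in place of the real servers.
    print(parse_allowed_hosts("allowed_hosts=192.0.2.11, 192.0.2.22"))
```

If both addresses come back as valid, the value itself is well formed and the cause of the message lies elsewhere.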

Any help would be appreciated

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

