Re: [Nagios-users] Monitoring large (ish) numbers of servers with exceptions to the rules...

2008-06-17 Thread Anthony Montibello
Hi,

Using regular expressions and object templates is key to keeping
maintenance manageable.

The current Nagios 3 documentation has good detail on what has to be
configured explicitly and what can be inherited or associated
automatically.  I think much of the framework was already in Nagios 2,
but the Nagios 3 documentation is a bit easier to read, so look there
for tips, then check the Nagios 2 docs to see whether the option is
also available there.

A few years ago I converted a Nagios 1.2 installation, where all hosts
and services were defined in a single file, to a scalable configuration
similar to what was initially described here.

I found that if you need to support different clients with daily
changes, it is convenient to have one config directory per client, and
in that directory a single host file plus a separate config file for
each host.

When a host is removed, it is just a matter of removing it from the
host file and renaming its config file.
Adding a new host only means adding it to the host file, then copying
an existing service file and cutting and pasting to get all the
services defined.
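
A minimal sketch of that layout, assuming one cfg_dir entry per client
in nagios.cfg (the paths, client names and host names below are made up
for illustration):

# nagios.cfg
cfg_dir=/usr/local/nagios/etc/clients/client-a
cfg_dir=/usr/local/nagios/etc/clients/client-b

# Files under clients/client-a/ :
#   hosts.cfg           all host definitions for this client
#   web01.cfg           services for host web01
#   db01.cfg.removed    services for a retired host, renamed out of the way

Nagios only reads files ending in .cfg from a cfg_dir directory, so
renaming a host's file is enough to drop its services while keeping
them around for reference.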

Then maintain the entire directory structure in CVS or some other
version control system.
As noted, this does get tedious to maintain, but it allows services to
be customized per host without much thinking.
The disadvantage is the time spent on maintenance, even when only a few
changes are being made.

Other options using templates work well: setting up inheritance, using
regular expressions, and other techniques using hostgroups all help
with organizing the files.  Depending on skill levels, though, they
sometimes lead to less readability (while for other admins they would
lead to easier maintenance).
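
As a rough illustration of the template and regular expression approach
(the template name, host naming scheme, contact group, time period and
check command are assumptions, and the pattern is only treated as a
regular expression if use_regexp_matching=1 is set in nagios.cfg, so
check the documentation for your version):

# Template: holds shared settings and is never registered as a real service
define service{
    name                    generic-service
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    check_period            24x7
    notification_interval   60
    notification_period     24x7
    contact_groups          admins
    register                0
}

# One definition covering every host named web01, web02, ...
define service{
    use                     generic-service
    host_name               ^web[0-9]+$
    service_description     HTTP
    check_command           check_http
}

The second definition inherits everything from the template and only
states what is specific to the HTTP check.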

Hope this helps,


Re: [Nagios-users] Monitoring large (ish) numbers of servers with exceptions to the rules...

2008-06-17 Thread Wheeler, JF (Jonathan)
> -Original Message-
> From: nagios-users On Behalf Of Matthew Macdonald-Wallace
> Sent: 17 June 2008 13:14
>
> I currently help maintain and monitor around 50 servers across various
> parts of the UK using Nagios 2.  At the moment, we have a configuration
> file for each host (%hostname%.cfg) and in that file we specify all the
> services for the named host.
>
> We are trying to reduce the number of configuration files as we take on
> more and more servers because there are a large number of checks that we
> need to be rolled out to all servers and we feel that we are
> duplicating our workload.
>
> I'm open to ideas on how to achieve this however my thoughts were a
> setup along the lines of the following:
>
>  - A "master" host template is created in which all services are defined
>    for a host.
>
>  - If a check does not need to be run for a given host (for example it
>    is not a web server), a stanza is added to that particular host's
>    config file that effectively tells nagios "don't check for this
>    service on this host"
>
> I've tried defining all the services in a master templates file and
> this works perfectly however when I come to exclude certain services, I
> am at a loss on how to do it.
>
> Initially I tried adding a stanza with the same service name and
> "register 0" as one of the options, however this didn't work.
>
> We have used HostGroups in the past to achieve a similar goal, however
> we ran into the issue that whilst we need to check the CPU Usage on all
> of the servers, a few of the servers that we monitor can take a lot
> more of a beating than the majority.  This led to us defining the CPU
> checks on a per-host basis as if we defined it separately from the
> hostgroup for the more powerful servers we were presented with a load of
> errors regarding duplicate service names.
>
> I hope I've made myself clear on what we're after and I look forward to
> receiving your input on this.

One thing that I use in the configuration that I maintain is to have
something like this:

define service{
    use             generic-hung-mounts
    hostgroup_name  experiments
    hosts           !lfc0448
    contact_groups  experiments
}

where "lcg0448" is a host in host group "experiments" and I want to
apply the "generic-hung-mounts" check to all hosts in that group except
for "lcg0448".

This can lead to configuration like this:

define service{
    use             check-pbs-offline
    hostgroup_name  workers
    hosts           !lcg0614,!lcg0617,!lcg0618,!lcg0626
    contact_groups  tier1a
}
define service{
    use             check-pbs-offline
    hosts           lcg0614,lcg0617,lcg0618,lcg0626
    contact_groups  tier1a,grid-team
}

where the only difference is that the hosts in the second definition
have a second contact group.
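
Applied to the CPU check from the original question, the same pattern
might look roughly like this.  The hostgroup, host names, template and
the check command with its thresholds are all made up for illustration:

# Default thresholds for everything in the group except the big servers
define service{
    use                  generic-service
    hostgroup_name       all-servers
    hosts                !big01,!big02
    service_description  CPU Usage
    check_command        check_cpu!80!90
}

# Higher thresholds for the servers that can take more of a beating
define service{
    use                  generic-service
    hosts                big01,big02
    service_description  CPU Usage
    check_command        check_cpu!95!99
}

Because the two definitions never cover the same host, the service
description can stay the same without triggering a duplicate service
error.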

HTH

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

-
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Monitoring large (ish) numbers of servers with exceptions to the rules...

2008-06-17 Thread Matthew Macdonald-Wallace
Hi All,

I currently help maintain and monitor around 50 servers across various
parts of the UK using Nagios 2.  At the moment, we have a configuration
file for each host (%hostname%.cfg) and in that file we specify all the
services for the named host.

We are trying to reduce the number of configuration files as we take on
more and more servers because there are a large number of checks that
we need to be rolled out to all servers and we feel that we are
duplicating our workload.

I'm open to ideas on how to achieve this; however, my thoughts were a
setup along the lines of the following:

 - A "master" host template is created in which all services are defined
   for a host.

 - If a check does not need to be run for a given host (for example it
   is not a web server), a stanza is added to that particular host's
   config file that effectively tells nagios "don't check for this
   service on this host"

I've tried defining all the services in a master templates file and
this works perfectly; however, when I come to exclude certain services,
I am at a loss as to how to do it.

Initially I tried adding a stanza with the same service name and
"register 0" as one of the options, however this didn't work.

We have used HostGroups in the past to achieve a similar goal; however,
we ran into the issue that whilst we need to check the CPU Usage on all
of the servers, a few of the servers that we monitor can take a lot
more of a beating than the majority.  This led to us defining the CPU
checks on a per-host basis, because when we defined the check separately
from the hostgroup for the more powerful servers we were presented with
a load of errors regarding duplicate service names.

I hope I've made myself clear on what we're after and I look forward to
receiving your input on this.

Kind regards,

Matt
-- 
Matt Wallace
[EMAIL PROTECTED]
http://www.truthisfreedom.org.uk/
