[Nagios-users] nagios future?

2009-06-09 Thread Meyer Jerome
Hi,

Has anybody already heard about Icinga (http://www.icinga.org)?

We are planning to install Nagios on a production server. Nagios works
very well for us and we're happy with it.

I would just like to hear your point of view about this new product and
about the future of Nagios.

Best regards,

Jerome

--
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables unlimited
royalty-free distribution of the report engine for externally facing 
server and web deployment.
http://p.sf.net/sfu/businessobjects
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] nagios future?

2009-06-09 Thread Andreas Ericsson
Meyer Jerome wrote:
> Hi

Hi there!

> As somebody already heard about icinga http://www.icinga.org?

Yes. It was discussed quite a lot a few weeks back on the
nagios-devel mailing list. Browse the archives for the full
discussion.

> Now, we planned to install nagios on a productiv server and nagios is
> very fine and we're happy with it.
>
> I just want to have your point of view about this new products and about
> the future of nagios?

The future of Nagios is looking quite bright. In all honesty, that is
in part thanks to the Icinga fork, which has sparked a flurry of activity
within the Nagios developer community.

First of all, we'll be releasing 3.1.1 soon, containing a plethora of
bug- and performance fixes. Ethan's working on automating the release
process so that Ton and I can cut releases without having to update a
bunch of webpages, sourceforge downloads area, documentation, etc, etc.
3.1.1 will be the first live test of that automated process. If it drags
out another week or so though, we'll probably just go ahead and do it
manually anyway, as 3.1.1 really has a lot of important fixes that the
Nagios users really should get their hands on.

Nagios will get a new GUI, dubbed Ninja, sometime during or after the
summer. Ninja is already available for download and is usable, but
according to Ninja maintainer Per Åsberg it has some warts and is still
incomplete. You can find out more about it at
http://www.op5.org/community/projects/ninja. This was announced at the
Nordic Meet on Nagios, which was held in Stockholm just last week. Note
that it's not necessarily easy to install yet, as it's still a work in
progress. Bug reports and enhancement requests are of course very welcome,
and documentation patches for the installation procedure even more so.

Hope that answers your questions :-)

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] Standard Nagios CGI Usage Documentation

2009-06-09 Thread Matthew Jurgens
So I take it documentation is not a big thing for people on this list
...

I have started some documentation, available at
http://www.smartmon.com.au/docs/

The starting page for this specific part of the documentation is:
http://www.smartmon.com.au/docs/tiki-index.php?page=Monitoring%20Operations%20%E2%80%93%20Using%20The%20Nagios%20Web%20Interface&structure=User%20Guide

I will keep adding to it and hope it helps someone. If you have
any suggestions or submissions, let me know.

Matthew Jurgens wrote:
> Has anyone ever come across some documentation that is aimed at new
> Nagios users that describes how to use the standard CGI interface,
> explains concepts of acknowledgements, downtime, etc etc?


--
Smartmon System Monitoring http://www.smartmon.com.au
www.smartmon.com.au



[Nagios-users] Recovery notifications after escalations

2009-06-09 Thread Ulf Karlsson
Hi,

We have a situation here where we would like to notify the on-call
group after 60 minutes and the support group after 240 minutes. If
services go down and then recover, everyone who has received a
notification of a host problem should also receive the recovery
notification. See the configuration below.

Now, our problem is that once the second escalation has been activated
and the support group has received the notification, only the support
group receives the recovery notification; the on-call group never
sees it.

We do not want to send multiple notifications to the on-call group
for the same issue, since they would then be spammed by Nagios
unnecessarily.

define host{
    name                    generic-host
    ...
    contacts                root    ; This will be stored in a local mailbox that no one sees
    notification_interval   60
    notification_options    d,u,r
    register                0
}

# First escalation for on call group (notification after 60 minutes)

define hostescalation{
    host_name               *
    first_notification      2
    last_notification       2
    notification_interval   180
    contact_groups          on-call
}

# Second escalation for support group (notification after 240 minutes)
# Problem: when this escalation has been activated, on-call does not
# receive recovery notifications anymore
# (we do not want to send multiple notifications about the same
# problem to on-call)

define hostescalation{
    host_name               *
    first_notification      3
    last_notification       3
    notification_interval   0
    contact_groups          support
}

Is it possible to achieve what we want using escalations?

Best regards,
Ulf Karlsson



Re: [Nagios-users] Check_NRPE

2009-06-09 Thread Sebastian Gosenheimer - proIO Network eSolutions e.K.
Hi Eduardo,

Is the NRPE daemon installed correctly and running on the host? Have you
set the IP address of your Nagios server in nrpe.cfg (the allowed_hosts
directive)?
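The second question can be checked mechanically. The following is only a rough sketch in Python: it assumes allowed_hosts is a plain comma-separated list of addresses, does exact string matching only (no hostnames resolved, no network ranges), and the sample configuration text and IP addresses are made up for illustration.

```python
def allowed_hosts(cfg_text):
    """Return the entries of the allowed_hosts directive found in nrpe.cfg text."""
    hosts = []
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("allowed_hosts") and "=" in line:
            value = line.split("=", 1)[1]
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts


def server_is_allowed(cfg_text, server_ip):
    """True if server_ip appears literally in allowed_hosts."""
    return server_ip in allowed_hosts(cfg_text)


# Hypothetical nrpe.cfg excerpt, for illustration only
sample = """
# hypothetical nrpe.cfg excerpt
server_port=5666
allowed_hosts=127.0.0.1, 192.168.4.10
"""

print(server_is_allowed(sample, "192.168.4.10"))  # True
print(server_is_allowed(sample, "192.168.4.99"))  # False
```

If the Nagios server's address is missing from the list, that is one possible cause of the error, though not the only one.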

With kind regards,

Sebastian Gosenheimer

Eduardo Barreto wrote:
> Hi,
>
> When I try to check a service on a remote host, this message appears:
> CHECK_NRPE: Error receiving data from daemon. What might it be?
> Does anybody know what I should do?
>
> Thanks in advance
>
> Eduardo


Re: [Nagios-users] nagios future?

2009-06-09 Thread Michael Friedrich

Hi,

Andreas Ericsson wrote the following on 09.06.2009 11:10:

> Yes. It was discussed quite a lot some few weeks back on the
> nagios-devel mailing list. Browse the archives for the full
> discussion.

Try
http://sourceforge.net/mailarchive/message.php?msg_id=E03A84B43BAE443888372020DDBFE0E7%40int.consol.de


If you're interested in reading a bit more, try

http://sourceforge.net/mailarchive/forum.php?forum_name=icinga-users
http://sourceforge.net/mailarchive/forum.php?forum_name=icinga-devel



> The future of Nagios is looking quite bright. In all honesty, that is
> in part thanks to the Icinga fork, which has sparked a flurry of activity
> within the Nagios developer community.

It has also led to many community-based sites popping up beside the
existing ones. That's not bad, but a bit misleading for new users, IMHO.
Let's see how it settles.
Hopefully Nagios will be on Git soon so that knowledge from both
projects can be merged. I don't know what the plans are for NDO and
other similar core parts, but I think there's a lot of potential to share
ideas and knowledge between Nagios and Icinga.

> First of all, we'll be releasing 3.1.1 soon, containing a plethora of
> bug- and performance fixes. Ethan's working on automating the release
> process so that Ton and I can cut releases without having to update a
> bunch of webpages, sourceforge downloads area, documentation, etc, etc.
> 3.1.1 will be the first live test of that automated process. If it drags
> out another week or so though, we'll probably just go ahead and do it
> manually anyway, as 3.1.1 really has a lot of important fixes that the
> Nagios users really should get their hands on.

It would be good to mention that even-numbered releases are stable while
odd-numbered ones are testing releases. On nagios.org, 3.1.0 is listed
only as the latest version, and it is marked as testing only after you
click the download link. That's a bit confusing, though not really a
problem for experienced users.
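As a small illustration of the even/odd convention described here, a sketch in Python follows. It assumes that "even" and "odd" refer to the second component of the version number (so 3.1.x is testing and 3.2.x is stable), which the thread implies but does not state outright. The function name is made up for this example.

```python
def is_stable(version):
    """True if the second (minor) component of a dotted version is even.

    Assumption: the even/odd rule applies to the minor number,
    e.g. 3.1.x = testing, 3.2.x = stable.
    """
    parts = version.split(".")
    return int(parts[1]) % 2 == 0


print(is_stable("3.1.0"))  # False
print(is_stable("3.2.0"))  # True
```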

> Nagios will get a new GUI, dubbed Ninja sometime during or after the
> summer. Ninja is available for download already and is usable but has
> some warts and is still incomplete according to Ninja maintainer Per
> Åsberg. You can find out more about it at
> http://www.op5.org/community/projects/ninja. This was announced at the
> Nordic Meet on Nagios which was held in Stockholm just last week. Note
> that it's not necessarily easy to install yet as it's still a work in
> progress. Bug-reports or enhancement requests are ofcourse very welcome,
> and documentation patches for the installation procedures even more so.

With Ninja announced as the new GUI, there are rumors about Merlin being
used for the database. I've read several posts about that, but my
question is: how realistic is that? As far as I know, Merlin uses libdbi
(just like the modified IDOUtils for Icinga), so it would be possible to
use different database types. Are there any plans to implement that? :-)


Kind regards,
Michael

--
DI (FH) Michael Friedrich
michael.friedr...@univie.ac.at
Tel: +43 1 4277 14359

Vienna University Computer Center
Universitaetsstrasse 7 
A-1010 Vienna, Austria  


Re: [Nagios-users] nagios -- ndo2db -- centreon

2009-06-09 Thread James Pifer
> Thanks for all the help!!! Was able to get everything working.
>
> James

I spoke a little bit too soon. Although a couple of the hosts showed up,
that's all I can get to work. I've added more, and the nagios configs
get updated, yet nagios doesn't show any changes. So I'm missing
something on the nagios/configuration side.

If I take a bracket out of one of the configs, then nagios won't
restart, so I know it's reading these cfg files. In the files
below, srv-xen02.mydomain.com and srv-xen03.mydomain.com are working,
but none of the others are. 

If I remove one of these hosts (using centreon frontend) it removes it
from the configs, nagios is restarted, but nagios is not updated. It
still shows the same two hosts.

Anyone know what I might be missing? Here are some of my configs. The
contacts listed in these configs all exist. 

Thanks,
James

For example, my config files are at /etc/nagios
#
# cat hostgroups.cfg
define hostgroup{
hostgroup_name  Linux_Servers
alias   All linux servers
members srv-xen02.mydomain.com, srv-xen03.mydomain.com, srv-xen04.mydomain.com, srv-xen05.mydomain.com
}

define hostgroup{
hostgroup_name  MY_routers
alias   MY routers
members SLW-E11.mydomain.com
}



# cat hosts.cfg

define host{
name    generic-host
alias   generic-host
check_command   check_host_alive
max_check_attempts  5
active_checks_enabled   1
passive_checks_enabled  0
check_period    24x7
contact_groups  netcool, Supervisors
notification_interval   0
notification_period 24x7
notification_options    d,r
notifications_enabled   0
register    0
}

define host{
name    Servers-Linux
use generic-host
alias   Linux Servers
register    0
}

define host{
host_name   srv-xen02.mydomain.com
use Servers-Linux
alias   srv-xen02
address 192.168.4.152
hostgroups  Linux_Servers
check_command   check_host_alive
max_check_attempts  10
check_interval  1
active_checks_enabled   1
passive_checks_enabled  1
check_period    24x7
obsess_over_host    0
check_freshness 0
flap_detection_enabled  0
process_perf_data   0
retain_status_information   0
retain_nonstatus_information    0
contact_groups  netcool
notification_interval   1
notification_period 24x7
notification_options    d,u
notifications_enabled   1
}

define host{
host_name   srv-xen03.mydomain.com
use Servers-Linux
alias   srv-xen03
address 192.168.4.153
hostgroups  Linux_Servers
check_command   check_host_alive
max_check_attempts  10
check_interval  1
active_checks_enabled   0
passive_checks_enabled  0
check_period    24x7
obsess_over_host    0
check_freshness 0
flap_detection_enabled  0
process_perf_data   0
retain_status_information   0
retain_nonstatus_information    0
contact_groups  netcool
notification_interval   1
notification_period 24x7
notification_options    d,u
notifications_enabled   1
}

define host{
host_name   srv-xen04.mydomain.com
use Servers-Linux
alias   srv-xen04
address 192.168.4.154
hostgroups  Linux_Servers
check_command   check_host_alive
max_check_attempts  10
check_interval   

[Nagios-users] disk IO for windows?

2009-06-09 Thread dave stern - e-mail.pluribus.unum
Anyone know of a plug-in or mechanism to log local disk I/O on windows?

My nagios server is currently using check_nt to connect to windows hosts
via nsclient++. I was hoping perhaps COUNTER has something buried
within it to pull down this info.

TIA



Re: [Nagios-users] nagios future?

2009-06-09 Thread Andreas Ericsson
Michael Friedrich wrote:
> Hi,
>
> Andreas Ericsson wrote the following on 09.06.2009 11:10:
>> Yes. It was discussed quite a lot some few weeks back on the
>> nagios-devel mailing list. Browse the archives for the full
>> discussion.
>
> try
> http://sourceforge.net/mailarchive/message.php?msg_id=E03A84B43BAE443888372020DDBFE0E7%40int.consol.de
>
> If you're interested in reading a bit more, try
>
> http://sourceforge.net/mailarchive/forum.php?forum_name=icinga-users
> http://sourceforge.net/mailarchive/forum.php?forum_name=icinga-devel

Thanks for those links. I'm far too lazy to look them up myself ;-)


>> The future of Nagios is looking quite bright. In all honesty, that is
>> in part thanks to the Icinga fork, which has sparked a flurry of activity
>> within the Nagios developer community.
>
> And also popping up many community based sites, beside the existing
> ones. Not that bad, but a bit misleading for new users imho. But let's
> see how it resolves in a bit.
> Hopefully Nagios will be on GIT soon to merge knowledge from both
> projects together. Dunno what plans are going on concerning the NDO and
> other similar core parts but I think there's much potential to share
> ideas and kniowledge between Nagios and Icinga.

Nagios will move to git when 3.2.0 is out the door. Ethan wants some
time to manage patches and stuff like he's used to without having to
learn another tool. I'm sure he'll curse himself for not switching
sooner when he learns the benefits of git, but at least we're getting
there.

One of the annoying things about the Icinga fork, though, is that they've
mainly done a lot of renaming and not so much actual patching. This will
of course merge cleanly, but in an unsatisfactory way for Nagios. Messy,
but certainly possible to work around.

>> First of all, we'll be releasing 3.1.1 soon, containing a plethora of
>> bug- and performance fixes. Ethan's working on automating the release
>> process so that Ton and I can cut releases without having to update a
>> bunch of webpages, sourceforge downloads area, documentation, etc, etc.
>> 3.1.1 will be the first live test of that automated process. If it drags
>> out another week or so though, we'll probably just go ahead and do it
>> manually anyway, as 3.1.1 really has a lot of important fixes that the
>> Nagios users really should get their hands on.
>
> It would be great to mention that all even releases are stable while odd
> remains testing. On nagios.org 3.1.0 is only mentioned as latest
> version and after clicking the download link it is marked as testing -
> bit confusing, but not really a problem for experienced users.

Oh, right. I'd actually forgotten that.

>> Nagios will get a new GUI, dubbed Ninja sometime during or after the
>> summer. Ninja is available for download already and is usable but has
>> some warts and is still incomplete according to Ninja maintainer Per
>> Åsberg. You can find out more about it at
>> http://www.op5.org/community/projects/ninja. This was announced at the
>> Nordic Meet on Nagios which was held in Stockholm just last week. Note
>> that it's not necessarily easy to install yet as it's still a work in
>> progress. Bug-reports or enhancement requests are ofcourse very welcome,
>> and documentation patches for the installation procedures even more so.
>
> By announcing Ninja as new GUI, the rumors get into Merlin for DB usage.
> I've read several posts about that but my question is, how far would
> that be realistic?


Very realistic. We're already using it for development to that purpose,
and it's working just fine. One problem with NDOUtils is that the database
schema makes it impossible to write anything for it that scales linearly.
That's totally unacceptable for us, so we had to come up with something
new. Fortunately, Lars Hjemli of the NagVis project has been very friendly
and cooperative in helping us add support for the Merlin database schema
in NagVis. Given how simple the Merlin schema is, I have no doubt that
we'll provide patches to other projects to achieve the same thing.

> For what I know Merlin uses the libdbi (just as
> modified IDOUtils for Icinga) so it would be possible to use different
> db types. Are there any plans to realize that? :-)

It's been planned, implemented, tested and available since 2009-03-17.
Additional bugfixes happened later, but libdbi has been in use in
Merlin almost three months now.

I'm working (but very slowly) on some patches to address the multiple
memory allocations required to use libdbi for quoting strings etc,
since it prevents us from using a static arena to do the quoting etc
in, but that will take a while to complete so we're living with that
microscopic deficiency for now.

$ git show 084cdc85
commit 084cdc85d7b0c8a4f721804476979e904e4afe7a
Author: Andreas Ericsson a...@op5.se
Date:   Tue Mar 17 10:44:47 2009 +0100

Use libdbi for database abstraction

In some ways it's worse, since we're now forced to allocate
and deallocate a lot of memory for each request, but in other

[Nagios-users] Problems with a parameter when executing check_procs via check_by_ssh

2009-06-09 Thread Stefan-Michael Guenther
Hi,

I want to execute check_procs via check_by_ssh with the following command:

./check_by_ssh -H 172.24.1.70 -t 120 -C /usr/local/bin/check_procs -C 
zeiterf -c 1:1 -a './zeiterf -z'

The result is the following error:

Remote command execution failed: /usr/local/bin/check_procs: option 
requires an argument -- z

The problem is the parameter -z because after removing it, I get the 
expected result.

./check_by_ssh -H 172.24.1.70 -t 120 -C /usr/local/bin/check_procs -C 
zeiterf -c 1:1 -a './zeiterf'

PROCS CRITICAL: 2 processes with command name 'zeiterf', args './zeiterf'

Does anyone know how to include the parameter correctly?

Thanks for your help,

Stefan




Re: [Nagios-users] disk IO for windows?

2009-06-09 Thread Andreas Ericsson
dave stern - e-mail.pluribus.unum wrote:
> Anyone know of a plug-in or mechanism to log local disk I/O on windows?
>
> My nagios server is currently using check_nt to connect to windows hosts
> via nsclient++. I was hoping perhaps COUNTER has something buried
> within it to pull down this info.

There are indeed counters for that, but due to Microsoft's stupidity the
counter names differ depending on which base language you've
used for your Windows servers.

I don't know what they're named on English platforms (or any other,
for that matter), but you should be able to view them with that thing
you can pop up when pressing Ctrl-Alt-Del (Task Manager or whatever it's
called).

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] Recovery notifications after escalations

2009-06-09 Thread Marcus Rejås
On 06/09 12:06, Ulf Karlsson wrote:
> Hi,
>
> We have a a situation here where we would like notify the on-call
> group after 60 minutes and the support group after 240 minutes. If
> services go down and then recover, everyone who has received a
> notification of a host problem should also receive the recovery
> notification. See the configuration below.
>
> Now, our problem is that when the second escalation has been activated
> and the support group has received the notification, only the support
> group will receive the recovery notification - the on-call group will
> never see the recovery notification.
>
> We do not want to send out multiple notifications to the on-call group
> four the same issue since they then would be spammed by Nagios
> unnecessarily.

I don't (at least not yet) have a good answer, but maybe I can put some ideas
in your head.

My first thought is that if they want the recovery notification, maybe they
would not mind the extra one either. The extra one actually tells them that
the issue was escalated, which might be useful information. If they don't want
the issue to escalate, they should acknowledge it (sticky).

To make it work the way you ask, I have two suggestions. Neither of them
is good.

If you do not have that many contacts, create an additional contact for each
member of on-call that receives only recovery notifications, put them in a
group, e.g. on-call-recovery, and escalate to that group. They will then get
the recovery notification.

Another alternative is to modify your notification command to take notice of
the macros $SERVICENOTIFICATIONNUMBER$ and maybe $HOSTNOTIFICATIONNUMBER$ and
build the logic you want. Make sure to do it right so you don't miss
important notifications.
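The decision logic for the second alternative could look roughly like the Python sketch below. It is not a tested Nagios setup: it assumes the notification command passes $NOTIFICATIONTYPE$ and $HOSTNOTIFICATIONNUMBER$ to a wrapper script, that on-call stays inside the escalation range so Nagios still hands them every notification, and that only problem notification number 2 should actually reach them. The function name and the default policy are made up for illustration.

```python
def should_send(notification_type, notification_number, problem_numbers=(2,)):
    """Decide whether a notification should really be delivered to on-call.

    notification_type   -- value of $NOTIFICATIONTYPE$, e.g. "PROBLEM" or "RECOVERY"
    notification_number -- value of $HOSTNOTIFICATIONNUMBER$ as an integer
    problem_numbers     -- problem notification numbers to let through (assumed policy)
    """
    if notification_type != "PROBLEM":
        # Recoveries, acknowledgements and so on are never suppressed,
        # so nothing important is lost by the filter.
        return True
    return notification_number in problem_numbers


print(should_send("PROBLEM", 2))   # True: the one problem mail on-call wants
print(should_send("PROBLEM", 3))   # False: suppressed repeat
print(should_send("RECOVERY", 3))  # True: recovery always goes out
```

Only repeated problem notifications are filtered, which keeps the risk of missing something important low. How the wrapper is hooked into the actual notification command is left open here.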

But, as I said, I don't like either of the ideas. There are very smart people
on this list, and someone will probably give you some more advice.

Regards,

  /Marcus


-- 
Marcus Rejås  jabber:   mar...@jabber.rejas.se  ,= ,-_-. =. 
Rejås Datakonsult e-mail:   mar...@rejas.se((_/)o o(\_))
Kaserngatan 1 web:  http://www.rejas.se `-'(. .)`-' 
s-761 46 Norrtäljegpg-key:  http://gpg.rejas.se \_/ 



Re: [Nagios-users] nagios future?

2009-06-09 Thread Michael Friedrich
Hi,


Andreas Ericsson wrote the following on 09.06.2009 15:09:
> Thanks for those links. I'm far too lazy to look them up myself ;-)

So am I - Oracle makes me kind of crazy ;-)

> Nagios will move to git when 3.2.0 is out the door. Ethan wants some
> time to manage patches and stuff like he's used to without having to
> learn another tool. I'm sure he'll curse himself for not switching
> sooner when he learns the benefits of git, but at least we're getting
> there.

Well, some git aliases for common cvs commands will help too ;-) I've been
looking into git for about 3 weeks, and I use this cheat sheet a
lot: http://ktown.kde.org/~zrusin/git/git-cheat-sheet-medium.png

> One of the annoying things about the icinga-fork though is that they've
> mainly done a lot of renaming and not so much actual patching. This will
> ofcourse merge cleanly but in an unsatisfactory way for Nagios. Messy,
> but certainly possible to work around.

Yep, that is true, but in order to say "Hey, it's like Nagios but not the
same", all names had to be removed or changed. Merging patches shouldn't
be that big a problem, though. Current Nagios patches have been pulled
over and merged into the current Icinga source, so it should work in the
other direction too.

> Very realistic. We're already using it for development to that purpose,
> and it's working just fine. One problem with NDOUtils is that the database
> schema makes it impossible to write stuff for it that scale linearly.
> That's totally unacceptable for us, so we had to come up with something
> new. Fortunately, Lars Hjemli of the NagVis project has been very friendly
> and cooperative in helping us add support for the Merlin database schema
> in NagVis. Given how simple the Merlin schema is, I have no doubt that
> we'll provide patches to other projects to achieve the same thing.

Yeah, I like that move, because everyone is held back by the NDO database
schema, which is far too normalized and doesn't scale. My biggest
concern right now is that Oracle limits table and column names to a
maximum of 30 characters (varchar2(30)). Maybe you can keep an eye on
that while testing your schema.
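A check for that limit is easy to automate. The sketch below, in Python, flags identifiers longer than the 30 characters quoted above. The table names are invented for illustration and are not taken from the Merlin or NDO schema.

```python
ORACLE_MAX_IDENTIFIER_LEN = 30  # limit as quoted in this thread


def too_long(names, limit=ORACLE_MAX_IDENTIFIER_LEN):
    """Return the identifiers whose length exceeds the given limit."""
    return [name for name in names if len(name) > limit]


# Hypothetical table/column names, for illustration only
candidates = [
    "host",
    "service_status",
    "scheduled_downtime_history_comment_data",
]

print(too_long(candidates))  # ['scheduled_downtime_history_comment_data']
```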

> It's been planned, implemented, tested and available since 2009-03-17.
> Additional bugfixes happened later, but libdbi has been in use in
> Merlin almost three months now.

OK, good to hear. Some query normalization and other database-specific
issues will pop up for sure. I've been working on the libdbi driver for
Oracle, and it seems to work (connecting to a remote Oracle server using
IDOUtils). If everything works out, I hope to push the source for libdbi
Oracle support to Icinga IDOUtils soon. Even though IDO and Merlin are
different, I think and hope libdbi knowledge can be shared in this
case :)

Kind regards,
Michael


> I'm working (but very slowly) on some patches to address the multiple
> memory allocations required to use libdbi for quoting strings etc,
> since it prevents us from using a static arena to do the quoting etc
> in, but that will take a while to complete so we're living with that
> microscopic deficiency for now.
>
> $ git show 084cdc85
> commit 084cdc85d7b0c8a4f721804476979e904e4afe7a
> Author: Andreas Ericsson a...@op5.se
> Date:   Tue Mar 17 10:44:47 2009 +0100
>
>     Use libdbi for database abstraction
>
>     In some ways it's worse, since we're now forced to allocate
>     and deallocate a lot of memory for each request, but in other
>     ways it's pure win as we can now let users use whatever
>     database type they want.
>
>     Signed-off-by: Andreas Ericsson a...@op5.se


-- 
DI (FH) Michael Friedrich
michael.friedr...@univie.ac.at
Tel: +43 1 4277 14359

Vienna University Computer Center
Universitaetsstrasse 7 
A-1010 Vienna, Austria  




Re: [Nagios-users] Recovery notifications after escalations

2009-06-09 Thread Andreas Ericsson
Marcus Rejås wrote:
 On 06/09 12:06, Ulf Karlsson wrote:
 Hi,

 We have a situation here where we would like to notify the on-call
 group after 60 minutes and the support group after 240 minutes. If
 services go down and then recover, everyone who has received a
 notification of a host problem should also receive the recovery
 notification. See the configuration below.

 Now, our problem is that when the second escalation has been activated
 and the support group has received the notification, only the support
 group will receive the recovery notification - the on-call group will
 never see the recovery notification.

 We do not want to send out multiple notifications to the on-call group
 for the same issue since they then would be spammed by Nagios
 unnecessarily.
 
 I don't (at least not yet) have a good answer. But maybe I can put some ideas
 in your head.
 
 My first thought is that if they want the recovery notification maybe they
 would not mind the extra one either. The extra one actually tells them that
 the issue was escalated and might be useful information. If they don't want
 the issue to escalate, they should acknowledge it (sticky).
 
 In order to fix it to work like you ask, I have two suggestions. Neither of
 them is good.
 
 If you do not have that many contacts, create an additional one for each
 member in the on-call with only recovery-alerts and put them in a group, e.g.
 on-call-recovery and escalate to that one. They will now get the recovery
 notification.
 

I don't think they will. There are checks to make sure recovery notifications
are only sent to contacts who have received the previous problem notification.

 Another alternative is to modify your notification command to take notice of
 the macros $SERVICENOTIFICATIONNUMBER$ and maybe $HOSTNOTIFICATIONNUMBER$ and
 build the logic you wish. Make sure to do it right so you don't miss
 important notifications.
 
 But, as I said, I don't like any of the ideas. There are very smart people on
 this list and someone will probably give you some more advice.
 

Sending a patch to make sure each problem object in the Nagios core contains a
concatenated list of normal and escalated contacts would be favourite, since
that would mean everyone who received the problem notification will also get
the recovery notification. This would best be implemented by building a linked
list with only unique elements to operate on. The list should probably contain
a marker to mention which contacts were added from the escalation, so the
original contacts do not get notified if they don't want to get the escalated
notifications.
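
The idea above could be sketched roughly as follows. This is only an
illustration in Python (the Nagios core itself is written in C), and every
name in it (NotifiedContact, build_notified_contacts, from_escalation) is
hypothetical rather than taken from the Nagios source. It merges the normal
and escalated contacts into one list of unique entries and records which
ones came in through the escalation only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotifiedContact:
    name: str
    from_escalation: bool  # True if the contact was added only via an escalation


def build_notified_contacts(normal, escalated):
    """Merge normal and escalated contact names, keeping each name once."""
    seen = set()
    merged = []
    # Normal contacts first, so they are never marked as escalation-only.
    for name in normal:
        if name not in seen:
            seen.add(name)
            merged.append(NotifiedContact(name, from_escalation=False))
    for name in escalated:
        if name not in seen:
            seen.add(name)
            merged.append(NotifiedContact(name, from_escalation=True))
    return merged


def recovery_recipients(contacts):
    """Everyone on the merged list gets the recovery notification."""
    return [c.name for c in contacts]


def escalated_problem_recipients(contacts):
    """Only escalation-added contacts get the escalated problem notification."""
    return [c.name for c in contacts if c.from_escalation]
```

With normal contacts alice and bob and an escalation to bob and carol, the
recovery would go to alice, bob and carol, while only carol would see the
escalated problem notification.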

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] nagios future?

2009-06-09 Thread Andreas Ericsson
Michael Friedrich wrote:
 Andreas Ericsson wrote the following on 09.06.2009 15:09:

 Nagios will move to git when 3.2.0 is out the door. Ethan wants some
 time to manage patches and stuff like he's used to without having to
 learn another tool. I'm sure he'll curse himself for not switching
 sooner when he learns the benefits of git, but at least we're getting
 there.
 Well, some common aliases from cvs for git will help too ;-) I've been 
 looking into git for about 3 weeks and I like to use this cheatsheet a 
 lot: http://ktown.kde.org/~zrusin/git/git-cheat-sheet-medium.png

Ugh. I absolutely hate that, because it just tells you "do this, do that"
but doesn't explain *why*. It never mentions why the index is there, or
how you can use it when you run into stuff that's actually *hard*, such
as an 8-way merge that suddenly went wahoonie-shaped.

But to each his own, I guess.


 One of the annoying things about the icinga-fork though is that they've
 mainly done a lot of renaming and not so much actual patching. This will
 of course merge cleanly but in an unsatisfactory way for Nagios. Messy,
 but certainly possible to work around.
 Yep, that is true, but to say "Hey, it's like Nagios but not the same", all 
 names had to be removed/changed. Merging patches shouldn't be that big a 
 problem, though. Current Nagios patches have been pulled over and merged 
 into the actual Icinga source, so it should work in the other direction 
 too.
 

It has? I'll have to take a look at that, I think. The hard part will
be to separate the cruft from the code, so that only the real changes
appear in a diff. Some simple sed magic will probably do the trick
though.
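
One way to picture that "sed magic", as a rough sketch only: map the renamed
spellings back before diffing, so that pure renames disappear and only real
changes remain. The example below uses Python's difflib instead of sed, and
the function names and sample lines are made up for illustration. A real
rename also touches file names, macros and URLs that a plain substitution
may not handle.

```python
import difflib
import re


def normalize(line):
    """Map 'icinga' spellings back to 'nagios', roughly preserving case."""
    def repl(match):
        word = match.group(0)
        if word.isupper():
            return "NAGIOS"
        if word[0].isupper():
            return "Nagios"
        return "nagios"
    return re.sub(r"icinga", repl, line, flags=re.IGNORECASE)


def real_changes(nagios_lines, icinga_lines):
    """Return only added/removed lines that survive the rename normalization."""
    normalized = [normalize(line) for line in icinga_lines]
    return [
        line
        for line in difflib.ndiff(nagios_lines, normalized)
        if line.startswith(("- ", "+ "))
    ]


nagios_src = ["include nagios.h", "int check_interval = 5;"]
icinga_src = ["include icinga.h", "int check_interval = 10;"]
for change in real_changes(nagios_src, icinga_src):
    print(change)
```

The renamed include line drops out of the result, and only the changed
check_interval line shows up as a removal and an addition.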

 Very realistic. We're already using it for development to that purpose,
 and it's working just fine. One problem with NDOUtils is that the database
 schema makes it impossible to write stuff for it that scales linearly.
 That's totally unacceptable for us, so we had to come up with something
 new. Fortunately, Lars Hjemli of the NagVis project has been very friendly
 and cooperative in helping us add support for the Merlin database schema
 in NagVis. Given how simple the Merlin schema is, I have no doubt that
 we'll provide patches to other projects to achieve the same thing.

 Yeah, I like that move, because everyone is held back by the NDO DB schema,
 which is far too normalized and doesn't scale. My biggest concern right now
 is that Oracle limits table and column names to a maximum of 30 characters
 (varchar2(30)). Maybe you can keep an eye on that while testing your schema.

I haven't actually thought about it. A quick glance reveals that the
serviceescalation_contactgroup junction table is the one with the
longest name, weighing in at 31 characters. That can be fixed quite
easily though, since junction table names are determined by a function
which can easily special-case this particular one.
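
A naming function with a special case might look roughly like this. It is a
sketch in Python, not Merlin's actual code: the function name, the override
table and the sample names are all hypothetical, and the 30-character limit
is simply the figure quoted above.

```python
ORACLE_MAX_IDENTIFIER_LENGTH = 30  # limit quoted in this thread

# Hypothetical overrides for pairs whose default name would be too long.
SHORT_NAMES = {
    ("hypothetical_long_object_type", "contactgroup"): "hyp_long_contactgroup",
}


def junction_table_name(left, right, max_length=ORACLE_MAX_IDENTIFIER_LENGTH):
    """Build '<left>_<right>', using an override where one is defined."""
    name = SHORT_NAMES.get((left, right), f"{left}_{right}")
    if len(name) > max_length:
        raise ValueError(
            f"table name {name!r} is {len(name)} characters, limit is {max_length}"
        )
    return name
```

Raising an error for any name that still exceeds the limit means a new
over-long table is caught when the schema is generated rather than when the
database rejects it.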


 It's been planned, implemented, tested and available since 2009-03-17.
 Additional bugfixes happened later, but libdbi has been in use in
 Merlin almost three months now.

 OK, good to hear that. Some query normalizations and other database-specific
 stuff will pop up for sure. I've been working on the libdbi driver for Oracle
 and it seems to work (connecting to a remote Oracle server using IDOUtils).
 When everything works out, I hope to push the source for libdbi Oracle to
 Icinga IDOUtils soon. Even though IDO and Merlin are different, I think and
 hope libdbi knowledge can be shared in this case :)
 

Since libdbi provides a database-agnostic api (it would be quite useless
if it didn't), a simple thing such as loading the correct driver should
suffice to make it work with Merlin as well. Which driver to use can be
specified in the Merlin configuration file. However, there's currently
no Oracle driver for Kohana that I'm aware of, and that means Ninja
won't be able to benefit from an Oracle database even if Merlin can
write to it.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] nagios monitoring for hack

2009-06-09 Thread Andrew Davis
I'd look into the various hardening and monitoring tools available 
(Bastille, Tripwire, chroot, etc.). There are different tools for different 
purposes, obviously. We chroot all our BIND and Apache stuff. Bastille 
is great for hardening the environment. Tripwire monitors for changes to 
key files. Each program has its own logging mechanisms. So once you have 
your tool in place, you can use Nagios to watch the log file(s) and 
generate alerts based on keywords (ALERT, WARN, CRIT, etc.). You can also 
dump your logs to an alternate server and have Nagios watch them from 
there, but in the case of a DDoS attack, your bandwidth may be affected 
for remote syslog and/or Nagios network checks.
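
As a rough illustration of the keyword approach, a check could scan log
lines and map matches to the usual Nagios plugin states (0 = OK,
1 = WARNING, 2 = CRITICAL). This is a minimal sketch in Python; the function
name and the default patterns are assumptions, and it does not remember
where it left off between runs, which a production log check would need to
do.

```python
import re

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin return codes


def scan_log(lines, warn_pattern=r"\bWARN", crit_pattern=r"\b(?:ALERT|CRIT)"):
    """Count keyword hits and return (status code, status line)."""
    warn_re = re.compile(warn_pattern)
    crit_re = re.compile(crit_pattern)
    warn_hits = 0
    crit_hits = 0
    for line in lines:
        # A line counts once; critical keywords take precedence.
        if crit_re.search(line):
            crit_hits += 1
        elif warn_re.search(line):
            warn_hits += 1
    summary = f"{crit_hits} critical, {warn_hits} warning line(s)"
    if crit_hits:
        return CRITICAL, f"CRITICAL - {summary}"
    if warn_hits:
        return WARNING, f"WARNING - {summary}"
    return OK, f"OK - {summary}"
```

A wrapper script would read the log file, print the returned status line
and exit with the returned code so that Nagios picks up the state.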


 A. Davis
 Email: ncc...@gmail.com

 There is no limit to what a man can accomplish
  if he doesn't care who gets the credit. - Ronald Reagan



shadih rahman wrote:
Our web sites got hacked and we were subjected to a DDoS for the last few 
days.  I wanted to know what I can do for monitoring to find out whether I 
am hacked or not.  By the way, we were hacked through PHP exploits.  Please 
advise on this.  Thanks


--
Cordially,
Shadhin Rahman



[Nagios-users] ndo utils question

2009-06-09 Thread shadih rahman
All,
   I have been running ndoutils with nagios for a while.  When I initially
set up my nagios, I played around with a lot of different service checks and
changed a lot of config parameters.  Now I have a solid setup and I
have not changed the configuration for a while.  When I go into the database and
look at the nagios_objects table, I see all sorts of old objects which do not
exist in my current setup.  Does ndoutils clean up and throw away old config
when we start nagios?  Please advise on this.  Thanks

-- 
Cordially,
Shadhin Rahman

[Nagios-users] check_hpasm (3.5) problem on RHEL3

2009-06-09 Thread CSingh
I have check_hpasm installed successfully on RHEL3 but I am unable to
get it to work (I have it working correctly for me on RHEL4 and 5). Here
is the problem that I face with RHEL3:

 

[nag...@mysys nagios]$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_hpasm
UNKNOWN - insufficient rights to call /usr/sbin/hpacucli

[nag...@mysys nagios]$ sudo /usr/local/nagios/libexec/check_hpasm -v
UNKNOWN - insufficient rights to call /usr/sbin/hpacucli

My /etc/sudoers has

nagios ALL=NOPASSWD:/sbin/hpasmcli,/usr/sbin/hpacucli,/usr/local/nagios/libexec/check_hpasm

Calling /usr/sbin/hpacucli works correctly using sudo:

[nag...@mysys nagios]$ sudo /usr/sbin/hpacucli
HP Array Configuration Utility CLI 7.40.7.0
Detecting Controllers...Done.
Type help for a list of supported commands.
Type exit to close the console.

=

 

[nag...@mysys nagios]$ sudo /usr/sbin/hpacucli -s help

To enter the ACU CLI console type:
   hpacucli

Commands can also be executed from outside the
ACU CLI console using the syntax:
   hpacucli target command [param[=value]]

All targets, commands, parameters, and values must be entered in lowercase.
The only exceptions to this are user-specified names, such as chassisname.

target command [param[=value]]
target is of format:
  [controller all|slot=#|wwn=#|chassisname=AAA] [array all|id]
  [physicaldrive all|#:#:#|allunassigned] [logicaldrive all|#]
  Note: The first # in physicaldrive is only needed for systems that
  specify port:box:bay. Other physical drive targeting schemes
  are box:bay and port:id.

Example targets:
   controller all
   controller slot=5
   controller chassisname=Lab C
   controller serialnumber=P21DA2322S
   controller wwn=500308B300701011
   controller slot=1 array all
   controller slot=7 array A
   ctrl slot=1 pd allunassigned
   controller slot=2 logicaldrive all
   controller slot=5 ld 5
   controller slot=5 physicaldrive 1:5
   controller slot=5 physicaldrive 1E:2:3

command can be create,delete,modify,show,rescan

For detailed command information type any of the following:
   help add
   help create
   help delete
   help modify
   help remove
   help shorthand
   help show
   help target
   help rescan

 

 

What else could I try?

 

Thanks!

Charanbeer



[Nagios-users] DNS down and false alerts...

2009-06-09 Thread Andrew Davis
I've observed an interesting issue with Nagios. Our environment is a mix 
of UNIX, Linux, Apple, and Windows. The core of the network is Active 
Directory including two AD servers that are both our primary, internal 
DNS servers. All non-Windows systems have a resolv.conf that looks like:


   nameserver 10.1.1.13
   nameserver 10.1.1.14
   domain int.our.domain
   search int.our.domain

About half of the servers have the nameserver entries inverted (ie: .14 
first, .13 second).


The issue is that anytime one of the nameservers is rebooted (at least 
once a month if staying current on patches thanks to Black Tuesdays), 
whichever hosts have that nameserver listed first in its resolv.conf 
start throwing the following errors:


   CRITICAL - Plugin timed out while executing system call.

This occurs for multiple tests for each host. Obviously, there's a name 
resolution correlation here. If the nameserver with .13 is rebooted, all 
hosts (about half of them) that list this IP first in their resolv.conf 
then time out for multiple tests. If the .14 server is rebooted, all the 
other hosts time out. Interestingly, none of the Windows clients issue 
errors... only UNIX, Linux, and Macs... only those with an 
/etc/resolv.conf. The end result is a host of false positives, but 
more importantly it looks bad on availability reports and causes 
phones/pagers to go ballistic with unneeded emails.


I'm trying to find a solution and I can't find one that I like:

Solution 1) is to cluster the DNS servers. We have lots of clusters 
here. This isn't good, though, as you don't normally cluster DNS 
servers... they're meant to be redundant for a reason... one fails and 
it uses the next one.


Solution 2) is to set up a service/host dependency. My thought would be 
either a host dependency that says if either .13 or .14 is down, then 
don't alert for any other host that uses them. Or a service to host 
dependency... if the DNS service is down, then don't alert on any of 
these dependent hosts. Honestly, I'm not sure if you can mix host and 
service dependencies like this... plus... if the DNS server is actually 
down, then the DNS service is down, so better to use a host dependency. 
The problem is that now we're not alerting on any dependent hosts which 
themselves could have a legitimate issue we want to know about. Plus, 
what happens if the DNS server actually dies and takes a few hours/days 
to rebuild/restore? At this point, the dependent hosts aren't watched 
for a very long time.


Solution 3) is to set up a UNIX/Linux DNS server that slaves all zones 
from the AD servers and have all UNIX/Linux/Apple clients query 
this server. This would work except that A) I need two of them to keep 
redundancy and B) I've now added an extra layer of complication to 
solve a problem for one application (Nagios)... not exactly good practice.


Solution 4) is to set the timeout value of a host querying a DNS server. 
Perhaps adjust the client to time out on the first listed nameserver 
after only 10 seconds, then try the next one? Since most Nagios tests 
have a minimum timeout value of 30 seconds, if the first DNS query timed 
out after 10 seconds, it would go to the next one with, hopefully, 
enough time to respond. The downside is having to adjust every single 
server.
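
The arithmetic behind Solution 4 can be written down as a small sketch. It
is a deliberately simplified model in Python: the function name and
parameters are made up, and it ignores resolver retries and search-domain
expansion. On glibc-based systems the per-server wait is what resolv.conf(5)
calls "options timeout:n".

```python
def time_left_after_failover(plugin_timeout, resolver_timeout,
                             dead_servers_first=1, lookups=1):
    """Seconds left for the real check once dead nameservers have timed out.

    Simplified model: each lookup waits resolver_timeout seconds on every
    dead nameserver listed ahead of a working one.
    """
    wasted = resolver_timeout * dead_servers_first * lookups
    return plugin_timeout - wasted


# One lookup, 10 s resolver timeout, 30 s plugin timeout: 20 s remain.
print(time_left_after_failover(30, 10))
# Three lookups in the same check would use up the whole 30 s budget.
print(time_left_after_failover(30, 10, lookups=3))
```

The second call shows why a check that resolves several names can still
time out even when a single lookup would fit comfortably.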


Has anyone else seen this? Anyone else using Windows AD servers to 
provide DNS for *nix servers?


--


 A. Davis
 Email: ncc...@gmail.com

 There is no limit to what a man can accomplish
  if he doesn't care who gets the credit. - Ronald Reagan


[Nagios-users] Log2ndo Not Placing Historical Logs into DB

2009-06-09 Thread Derek J. Morris
My old logs aren't going into the DB.

I am running it as follows:

./log2ndo -s /usr/local/nagios/var/archives/*.log -d /usr/local/nagios/var/ndo.sock -i default -t unix -p 5668

But nothing is going into the DB. I see successful DB connections and
disconnects in the log, but nothing is being entered. I'm running a single
instance of Nagios with the default DB setup, NDOUtils 1.47b and Nagios 3.1.0.

-Derek





Re: [Nagios-users] DNS down and false alerts...

2009-06-09 Thread Randal, Phil
Option 5:  Install a local caching DNS server on your nagios box, and
put 127.0.0.1 at the top of resolv.conf.
 
Cheers,
 
Phil
-- 
Phil Randal | Networks Engineer 
Herefordshire Council | Deputy Chief Executive's Office | I.C.T.
Services Division 
Thorn Office Centre, Rotherwas, Hereford, HR2 6JT 
Tel: 01432 260160 
email: pran...@herefordshire.gov.uk 


 



From: Andrew Davis [mailto:ncc...@gmail.com] 
Sent: 09 June 2009 16:19
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] DNS down and false alerts...


I've observed an interesting issue with Nagios. Our environment is a mix
of UNIX, Linux, Apple, and Windows. The core of the network is Active
Directory including two AD servers that are both our primary, internal
DNS servers. All non-Windows systems have a resolv.conf that looks like:


nameserver 10.1.1.13
nameserver 10.1.1.14
domain int.our.domain
search int.our.domain


About half of the servers have the nameserver entries inverted (ie: .14
first, .13 second).

The issue is that anytime one of the nameservers is rebooted (at least
once a month if staying current on patches thanks to Black Tuesdays),
whichever hosts have that nameserver listed first in its resolv.conf
start throwing the following errors:


CRITICAL - Plugin timed out while executing system call.


This occurs for multiple tests for each host. Obviously, there's a name
resolution correlation here. If the nameserver with .13 is rebooted, all
hosts (about half of them) that list this IP first in their resolv.conf
then timeout for multiple tests. If the .14 server is rebooted, all the
other hosts timeout. Interestingly, none of the Windows clients issue
errors... only UNIX, Linux, and Mac's... only those with an
/etc/resolv.conf. The end result is a host of false positives, but
more importantly it looks bad on availability reports and causes
phones/pagers to go ballistic with unneeded emails.

I'm trying to find a solution and I can't find one that I like:

Solution 1) is to cluster the DNS servers. We have lots of clusters
here. This isn't good, though, as you don't normally cluster DNS
servers... they're meant to be redundant for a reason... one fails and
it uses the next one.

Solution 2) is to setup a service/host dependency. My thought would be
either a host dependency that says if either .13 or .14 are down, then
don't alert for any other host that uses them. Or a service to host
dependency... if the DNS service is down, then don't alert on any of
these dependent hosts. Honestly, I'm not sure if you can mix host and
service dependencies like this... plus... if the DNS server is actually
down, then the DNS service is down, so better to use a host dependency.
The problem is that now we're not alerting on any dependent hosts which
themselves could have a legitimate issue we want to know about. Plus,
what happens if the DNS server actually dies and take a few hours/days
to rebuild/restore? At this point, the dependent hosts aren't watched
for a very long time.

Solution 3) is to setup a UNIX/Linux DNS server that slaves all zones
from the AD servers and have all UNIX/Linux/Apple clients query from
this server. This would work except that A) I need two of them to keep
redundancy and B) I've now added an extra layer of complication to
resolve an application (Nagios)... not exactly good practice.

Solution 4) is to set the timeout value of a host querying a DNS server.
Perhaps adjust the client to timeout on the first listed nameserver
after only 10 seconds, then try the next one? Since most Nagios tests
have a minimum timeout value of 30 seconds, if the first DNS query timed
out after 10 seconds, it would go to the next one with, hopefully,
enough time to respond. The downside is having to adjust every single
server.

Has anyone else seen this? Anyone else using Windows AD servers to
provide DNS for *nix servers? 

-- 


  A. Davis
  Email: ncc...@gmail.com

  There is no limit to what a man can accomplish
   if he doesn't care who gets the credit. - Ronald Reagan

[Nagios-users] Synchronizing turning off notifications.

2009-06-09 Thread Mike lucker
We have two Nagios servers with one acting as a fallback.  We run a sync
program every time there is an update.  This sync copies the config files
from mon1 to mon2, stops and restarts the backup.  Works well, but the
problem we're having is that it is not syncing when we turn notification off
on mon1.  If it falls back to mon2 it will page for that device.

Does anyone know where Nagios stores the notification=off option when it
is changed via the web interface?  I suspect we're missing the copy of that
file.

Thanks,

Michael Lucker

Re: [Nagios-users] nagios future?

2009-06-09 Thread Michael Friedrich


Andreas Ericsson wrote the following on 09.06.2009 16:33:

 It has? I'll have to take a look at that, I think. The hard part will
 be to separate the cruft from the code, so that only the real changes
 appear in a diff. Some simple sed magic will probably do the trick
 though.
It would be bad if it hadn't. As long as both projects are mainly 
compatible, they can profit from each other. Git is really nice for that, 
but you should ask Hendrik how to deal with that :-)

https://git.icinga.org/index?p=icinga-core.git;a=summary
 I haven't actually thought about it. A quick glance reveals that the
 serviceescalation_contactgroup junction table is the one with the
 longest name, weighing in at 31 characters. That can be fixed quite
 easily though, since junction table names are determined by a function
 which can easily special-case this particular one.
Hm, thanks for the tip. I need to think about that in more depth.

 Since libdbi provides a database-agnostic api (it would be quite useless
 if it didn't), a simple thing such as loading the correct driver should
 suffice to make it work with Merlin as well. Which driver to use can be
 specified in the Merlin configuration file. However, there's currently
 no oracle driver for Kohana that I'm aware of, and that means Ninja
 won't be able to benefit from an Oracle database even if Merlin can
 write to it.
What I'm missing in the libdbi implementation is parameter 
binding, which really improves performance when running lots of queries with 
different values. Another headache, but maybe I'll hack that and send a 
patch to the developers. For now it is mostly important for Oracle.
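
For readers unfamiliar with parameter binding, here is a small illustration.
It uses Python's built-in sqlite3 module purely as a stand-in (it is neither
libdbi nor Oracle), and the table and column names are invented for the
example. The statement text stays the same for every row and only the bound
values change, so no per-value string quoting is needed.

```python
import sqlite3


def insert_status_rows(conn, rows):
    """Insert many rows with one statement and bound parameters."""
    conn.executemany(
        "INSERT INTO service_status (host, service, state) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_status (host TEXT, service TEXT, state INTEGER)")
insert_status_rows(conn, [
    ("web1", "http", 0),
    ("web1", "ssh", 2),
    ("db1", "o'racle listener", 1),  # the embedded quote needs no escaping
])
count = conn.execute("SELECT COUNT(*) FROM service_status").fetchone()[0]
print(count)
```

How much this helps in practice depends on the database and driver, so treat
it as a picture of the technique rather than a benchmark.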

About Kohana: I had a short look into the database drivers. Oracle 
support won't be that big a problem to implement, but that won't be me. 
Hopefully it will be done, because then Ninja and Merlin combined with 
Nagios would be an alternative to Icinga with optimized IDO for 
Oracle (which is my main task right now).

Kind regards,
Michael

-- 
DI (FH) Michael Friedrich
michael.friedr...@univie.ac.at
Tel: +43 1 4277 14359

Vienna University Computer Center
Universitaetsstrasse 7 
A-1010 Vienna, Austria  




Re: [Nagios-users] DNS down and false alerts...

2009-06-09 Thread Russell Adams
Really the best choice is to use caching DNS on the Nagios
server. I'd recommend dnsmasq; it just does caching locally without
needing to do big zone transfers. It has low overhead and simple
configuration as a result.

Enjoy.

On Tue, Jun 09, 2009 at 11:19:20AM -0400, Andrew Davis wrote:
 I've observed an interesting issue with Nagios. Our environment is a mix  
 of UNIX, Linux, Apple, and Windows. The core of the network is Active  
 Directory including two AD servers that are both our primary, internal  
 DNS servers. All non-Windows systems have a resolv.conf that looks like:

nameserver 10.1.1.13
nameserver 10.1.1.14
domain int.our.domain
search int.our.domain

 About half of the servers have the nameserver entries inverted (ie: .14  
 first, .13 second).

 The issue is that anytime one of the nameservers is rebooted (at least  
 once a month if staying current on patches thanks to Black Tuesdays),  
 whichever hosts have that nameserver listed first in its resolv.conf  
 start throwing the following errors:

CRITICAL - Plugin timed out while executing system call.

 This occurs for multiple tests for each host. Obviously, there's a name  
 resolution correlation here. If the nameserver with .13 is rebooted, all  
 hosts (about half of them) that list this IP first in their resolv.conf  
 then timeout for multiple tests. If the .14 server is rebooted, all the  
 other hosts timeout. Interestingly, none of the Windows clients issue  
 errors... only UNIX, Linux, and Mac's... only those with an  
 /etc/resolv.conf. The end result is a host of false positives, but  
 more importantly it looks bad on availability reports and causes  
 phones/pagers to go ballistic with unneeded emails.

 I'm trying to find a solution and I can't find one that I like:

 Solution 1) is to cluster the DNS servers. We have lots of clusters  
 here. This isn't good, though, as you don't normally cluster DNS  
 servers... they're meant to be redundant for a reason... one fails and  
 it uses the next one.

 Solution 2) is to set up a service/host dependency. My thought would be
 either a host dependency that says if either .13 or .14 is down, then
 don't alert for any other host that uses them. Or a service-to-host
 dependency... if the DNS service is down, then don't alert on any of
 these dependent hosts. Honestly, I'm not sure if you can mix host and
 service dependencies like this... plus... if the DNS server is actually
 down, then the DNS service is down, so better to use a host dependency.
 The problem is that now we're not alerting on any dependent hosts which
 themselves could have a legitimate issue we want to know about. Plus,
 what happens if the DNS server actually dies and takes a few hours/days
 to rebuild/restore? At this point, the dependent hosts aren't watched
 for a very long time.

 Solution 3) is to set up a UNIX/Linux DNS server that slaves all zones
 from the AD servers and have all UNIX/Linux/Apple clients query
 this server. This would work except that A) I need two of them to keep
 redundancy and B) I've now added an extra layer of complication to
 fix a problem with one application (Nagios)... not exactly good practice.

 Solution 4) is to set the timeout value of a host querying a DNS server.
 Perhaps adjust the client to time out on the first listed nameserver
 after only 10 seconds, then try the next one? Since most Nagios tests
 have a minimum timeout value of 30 seconds, if the first DNS query timed
 out after 10 seconds, it would go to the next one with, hopefully,
 enough time to respond. The downside is having to adjust every single
 server.

 Has anyone else seen this? Anyone else using Windows AD servers to  
 provide DNS for *nix servers?

 -- 


  A. Davis
  Email: ncc...@gmail.com

  There is no limit to what a man can accomplish
   if he doesn't care who gets the credit. - Ronald Reagan




--
Russell Adams    rlad...@adamsinfoserv.com

PGP Key ID: 0x1160DCB3   http://www.adamsinfoserv.com/

Fingerprint: 1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3


Re: [Nagios-users] Synchronizing turning off notifications.

2009-06-09 Thread Marc Powell

On Jun 9, 2009, at 10:55 AM, Mike lucker wrote:

 We have two Nagios servers with one acting as a fallback.  We run a  
 sync program every time there is an update.  This sync copies the  
 config files from mon1 to mon2, stops and restarts the backup.   
 Works well, but the problem we're having is that it is not syncing  
 when we turn notification off on mon1.  If it falls back to mon2 it  
 will page for that device.

 Does anyone know where Nagios stores the notification=off option  
 when it is changed via the web interface?  I suspect we're missing  
 the copy of that file.

It's stored in memory and periodically written out to the retention
file and status file based on your schedule or on shutdown. I haven't
tried it but I'd suggest shutting down nagios on the master to ensure
that the retention file is up-to-date, shut down the backup, rsync and
restart both. The retention file is the one you want as the status
file is recreated on startup based on config+retention.dat.
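
A minimal sketch for checking that such a sync carried the setting over, assuming retention.dat consists of the usual "host { ... }" and "service { ... }" blocks of key=value lines with a notifications_enabled field. The function name and the sample text are illustrative and not part of Nagios; running it against the retention file on mon1 and on mon2 after the rsync should give the same list.

```python
def notifications_disabled(retention_text):
    """Return (type, host_name, service_description) for every host or
    service block that has notifications_enabled=0."""
    disabled = []
    block_type = None   # None while outside any "name {" ... "}" block
    fields = {}
    for raw_line in retention_text.splitlines():
        line = raw_line.strip()
        if block_type is None:
            # A block opens with a line such as "host {" or "service {".
            if line.endswith("{"):
                block_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            if (block_type in ("host", "service")
                    and fields.get("notifications_enabled") == "0"):
                disabled.append((block_type,
                                 fields.get("host_name"),
                                 fields.get("service_description")))
            block_type = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return disabled


# Hypothetical excerpt in the assumed retention.dat layout.
SAMPLE = """\
host {
host_name=web01
notifications_enabled=0
}
service {
host_name=web01
service_description=HTTP
notifications_enabled=1
}
"""

print(notifications_disabled(SAMPLE))
```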

--
Marc




Re: [Nagios-users] Synchronizing turning off notifications.

2009-06-09 Thread Frost, Mark {PBG}

Michael,

This information is stored in the status.dat file as part of the running
configuration from mon1.  If mon2 is actually running, I'd recommend sending
an external command to that running instance to disable notifications.

See

http://www.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=7


You could then run the enable notifications command at some point when you
need it to act as the real server:

http://www.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=8
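
A minimal sketch of sending such commands, assuming the two linked pages describe the program-wide DISABLE_NOTIFICATIONS and ENABLE_NOTIFICATIONS commands and that external commands are enabled on mon2. The helper names and the command file path are illustrative; the real path is whatever command_file is set to in nagios.cfg.

```python
import time


def format_external_command(command, timestamp=None):
    """Build one line in the external command file format:
    [unix_time] COMMAND_NAME"""
    if timestamp is None:
        timestamp = time.time()
    return "[%d] %s\n" % (int(timestamp), command)


def send_external_command(command, command_file):
    """Write one command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(format_external_command(command))


# On mon2 while it is the standby (the path is an assumption):
# send_external_command("DISABLE_NOTIFICATIONS",
#                       "/usr/local/nagios/var/rw/nagios.cmd")
print(format_external_command("DISABLE_NOTIFICATIONS", 1244563200), end="")
```

When mon2 has to take over, the same call with ENABLE_NOTIFICATIONS would switch paging back on.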

Mark



Re: [Nagios-users] DNS down and false alerts...

2009-06-09 Thread Marc Powell

On Jun 9, 2009, at 10:42 AM, Randal, Phil wrote:

 Option 5:  Install a local caching DNS server on your nagios box,  
 and put 127.0.0.1 at the top of resolv.conf.

My reading of the issue, and I believe that I've seen it in the past  
as well, is that the problem isn't with DNS resolution on the nagios  
box but DNS resolution happening on the target boxes. Installing a  
caching nameserver on the nagios box isn't going to help any. The  
target system is trying to do a DNS lookup on the connecting host  
(nagios). The OP isn't specific on how he's checking these boxes so it  
could be xinetd, nrpe, whatever... The default timeout for DNS server  
failure detection in the resolver libraries is too long so the plugin  
times out. I'd personally look at changing that timeout and rotation  
between servers in resolv.conf (options timeout:x rotate).
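
A minimal sketch of what that resolv.conf change could look like on a monitored host, written as a small generator so the same file can be produced for every client. The timeout and attempts values are illustrative assumptions, not recommendations from the thread, and the function name is hypothetical. Only a search line is emitted, since domain and search are mutually exclusive in resolv.conf.

```python
def render_resolv_conf(nameservers, search_domain, timeout=1, attempts=2,
                       rotate=True):
    """Return resolv.conf text with a short per-server timeout."""
    lines = ["nameserver %s" % ip for ip in nameservers]
    lines.append("search %s" % search_domain)
    # timeout:n  = seconds to wait on one nameserver before trying the next
    # attempts:n = how many times the whole server list is tried
    # rotate     = spread queries across the listed nameservers
    options = ["timeout:%d" % timeout, "attempts:%d" % attempts]
    if rotate:
        options.append("rotate")
    lines.append("options " + " ".join(options))
    return "\n".join(lines) + "\n"


print(render_resolv_conf(["10.1.1.13", "10.1.1.14"], "int.our.domain"),
      end="")
```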

--
Marc




Re: [Nagios-users] DNS down and false alerts...

2009-06-09 Thread Martin Melin
I don't know if I'm misreading the OP, but if the plugins start timing out
on only the boxes whose primary DNS is being rebooted, would adding a
caching DNS server to the Nagios box really make a difference?

I think the root cause of these timeouts is that the Nagios plugin timeout
is happening before the connection to the primary DNS on the target machine
has a chance to time out and then connect to the secondary DNS.

The correct course of action to resolve this would be to either make sure
that the DNS connection on the target machines fails more quickly, or that
Nagios/the plugin waits longer for a result from the check. The DNS failover
is working as designed here but you're not giving it enough time to kick in.
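
The timing argument can be sketched as simple arithmetic. This is a simplified model, assuming every DNS lookup made during a check first waits the full resolver timeout on the dead nameserver before falling over to the second one; the function name and the numbers in the usage lines are illustrative only.

```python
def check_survives_dns_failover(plugin_timeout, resolver_timeout, lookups,
                                check_runtime=0.0):
    """True if the check still finishes before the plugin timeout when
    every lookup first waits resolver_timeout seconds on a dead server."""
    failover_delay = resolver_timeout * lookups
    return failover_delay + check_runtime < plugin_timeout


# Two lookups at 5 s each use up a 10 s plugin timeout entirely.
print(check_survives_dns_failover(10, 5, 2))
# Either a shorter resolver timeout or a longer plugin timeout helps.
print(check_survives_dns_failover(10, 1, 2))
print(check_survives_dns_failover(30, 5, 2))
```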


Re: [Nagios-users] nagios future?

2009-06-09 Thread Andreas Ericsson
Michael Friedrich wrote:
 
 
 Andreas Ericsson wrote the following on 09.06.2009 16:33:

 It has? I'll have to take a look at that, I think. The hard part will
 be to separate the cruft from the code, so that only the real changes
 appear in a diff. Some simple sed magic will probably do the trick
 though.
 Would be bad if it hasn't - as long as both projects are mainly 
 compatible they can profit from each other.

Right. I'll have to revisit it and see what's new.

 GIT is really nice for that 

I know. I helped write it after all :p

 but you should ask Hendrik instead how to deal with that :-)
 

I think I'll find a way. Thanks for the tip though.

 I haven't actually thought about it. A quick glance reveals that the
 serviceescalation_contactgroup junction table is the one with the
 longest name, weighing in at 31 characters. That can be fixed quite
 easily though, since junction table names are determined by a function
 which can easily special-case this particular one.
 Mh, thanks for the tip. I need to think about that in more depth.

Well, it'll be ambiguous even if one char is stripped from it, so just
cutting the name at 30 chars might be worthwhile if we're on oracle.


 Since libdbi provides a database-agnostic api (it would be quite useless
 if it didn't), a simple thing such as loading the correct driver should
 suffice to make it work with Merlin as well. Which driver to use can be
 specified in the Merlin configuration file. However, there's currently
 no oracle driver for Kohana that I'm aware of, and that means Ninja
 won't be able to benefit from an Oracle database even if Merlin can
 write to it.
 The thing I am missing in the libdbi implementation is parameter
 binding, which really is a performance tweak when running lots of queries
 with different values. Another headache, but maybe I'll hack that and send a
 patch to the developers. For now it mostly matters for Oracle.
 
 About Kohana: I had a short look at the database drivers. Oracle
 support won't be that big a problem to implement, but that won't be me.
 Hopefully it will be done, because then Ninja and Merlin combined with
 Nagios would be an alternative to Icinga with optimized IDO for
 Oracle (which is my main task right now).
 

Well, unless I'm mistaken IDO will have the same database layout as NDO,
so it will still suck for performance, and writing good queries for it
will still be a major headache. When the storage model of the algorithm
is broken, tweaking it doesn't really help. Only a rewrite
can save you then.

It would be neat to see Merlin adapted to Oracle though, so if you're
interested in working on that we'd sure help as much as we can.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



Re: [Nagios-users] ndo utils question

2009-06-09 Thread Andreas Ericsson
shadih rahman wrote:
 All,
I have been running ndoutils with nagios for a while.  When I initially
 setup my nagios, I played around with a lot of different service checks and
 changed around a lot of config parameters.  Now, I have a solid setup and I
 have not changed configuration for a while.  When I go into the database and
 look at nagios_objects tables, I see all sorts of old objects which do not
 exist in my current setup.  Does ndoutils clean up and throw away old config
 when we start nagios?

No, but it marks them as inactive (or some such).

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



[Nagios-users] passive service check where display_name larger then 128 characters

2009-06-09 Thread Paul Vaes
Hello,

The issue is that if I define a Nagios service whose service description
is longer than 128 characters, everything seems to work properly, except that
when a script sends a passive service check via send_nsca, the service in
the Nagios GUI is not updated, although send_nsca says it was
successfully sent. Looking in the Nagios log, I see that the passive check
has arrived but that the service description is chopped at 128 chars.

Has anyone fixed this problem already? It looks to me as if the following
line in include/common.h causes the issue:
#define MAX_DESCRIPTION_LENGTH 128

I assume that I need to recompile nsca (for the server) and send_nsca
(for the clients that need a service description longer than 128).
The problem is that as soon as I use the new nsca binary on the server,
I expect problems with all the servers that are still using the original
send_nsca.

Does anybody have any ideas, suggestions or solutions?

I am using the latest nsca and send_nsca, 2.7.2.
nsca is running on SUSE 10.2;
send_nsca runs on different Unix and Linux flavours.


Thanks in advance,

-- 
Groetjes,

Paul

Re: [Nagios-users] DNS down and false alerts...

2009-06-09 Thread Andrew Davis
Hey... I'm the OP. We're using a mix of client tools. For Windows 
systems (which aren't affected by this) we use nsclient++. For our Linux 
servers, NRPE... for UNIX (Solaris) and OS X we're using check_by_ssh. 
Both the NRPE and check_by_ssh clients are affected by this.


I'm willing to give the caching nameserver on the server a try, but as
others have noted, I don't think it will make a difference, as it's the
local test on the client that's failing to resolve. I surely cannot set up a
caching nameserver on all clients...


 A. Davis
 Email: ncc...@gmail.com

 There is no limit to what a man can accomplish
  if he doesn't care who gets the credit. - Ronald Reagan



Re: [Nagios-users] Recovery notifications after escalations

2009-06-09 Thread Marcus Rejås
On 06/09 15:47, Andreas Ericsson wrote:
 Marcus Rejås wrote:
  
  If you do not have that many contacts, create an additional one for each
  member in the on-call with only recovery-alerts and put them in a group, 
  e.g.
  on-call-recovery and escalate to that one. They will now get the recovery
  notification.
  
 
 I don't think they will. There are checks to make sure recovery notifications
 are only sent to contacts who have received the previous problem notification.

You are absolutely right (as always, however this time I took the time to
test and prove myself wrong...). 

I am, and was, aware of the checks you are referring to and they really do
make sense in most places e.g. leaving and entering timeperiods. But in this
context they are confusing.

If I set up a contact with:

    host_notifications_enabled      1
    service_notifications_enabled   1
    service_notification_period     24x7
    host_notification_period        24x7
    host_notification_options       r
    service_notification_options    r

It will never ever get any notifications. To be honest, I cannot see any
practical use for this contact, but until I tested it just now I would have
said that this contact should get only recovery notifications. This is not
something I need to see fixed, but it might be good to point it out under
host_notification_options and service_notification_options in the manual.


 Sending a patch to make sure each problem object in the Nagios core contains a
 concatenated list of normal and escalated contacts would be favourite, since
 that would mean everyone who received the problem notification will also get
 the recovery notification. This would best be implemented by building a linked
 list with only unique elements to operate on. The list should probably contain
 a marker to mention which contacts were added from the escalation, so the
 original contacts do not get notified if they don't want to get the escalated
 notifications.

I agree :-)

-- 
Marcus Rejås  jabber:   mar...@jabber.rejas.se  ,= ,-_-. =. 
Rejås Datakonsult e-mail:   mar...@rejas.se((_/)o o(\_))
Kaserngatan 1 web:  http://www.rejas.se `-'(. .)`-' 
s-761 46 Norrtäljegpg-key:  http://gpg.rejas.se \_/ 



Re: [Nagios-users] passive service check where display_name larger then 128 characters

2009-06-09 Thread Ton Voon
Hi Paul,

Yes, we've spotted this too.

There is a limitation in NSCA where the hostname is 63 characters, the  
service description is limited to 127 characters and the output is  
limited to 511 bytes. The overall NSCA packet size is 716 bytes.
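
The numbers above add up if the packet is a fixed-size record. The sketch below assumes a layout of a 16-bit version, a 32-bit CRC, a 32-bit timestamp, a 16-bit return code and three NUL-terminated text buffers of 64, 128 and 512 bytes; that field order is an assumption made for illustration, and a C compiler may add padding, so the size on the wire can differ.

```python
import struct

# Assumed field order: version, CRC32, timestamp, return code, then the
# three fixed-size text buffers. "!" means network order with no padding.
NSCA_PACKET_FORMAT = "!hIIh64s128s512s"


def nsca_packet_size():
    """Size in bytes of the assumed unpadded packet layout."""
    return struct.calcsize(NSCA_PACKET_FORMAT)


def truncate_field(text, buffer_size):
    """Keep at most buffer_size - 1 bytes, leaving room for the NUL."""
    return text.encode("ascii")[:buffer_size - 1]


print(nsca_packet_size())
print(len(truncate_field("x" * 140, 128)))
```

A 140-character service description therefore no longer matches the service name defined in Nagios once it has passed through the 128-byte buffer, which fits the behaviour described in the original report.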

We've been looking into making this packet size variable while still
maintaining compatibility with existing send_nsca clients (we've done
something similar with NRPE:
http://opsview-blog.opsera.com/dotorg/2008/08/enhancing-nrpe.html).
Contact me off list if you are interested in sponsoring Opsera to
develop this functionality.

Ton




Re: [Nagios-users] disk IO for windows?

2009-06-09 Thread Anthony Montibello
That is partially right.

Ctrl-Alt-Delete gets you to Task Manager, but only newer versions of
Windows give access to some counter names there.

The best place to go is Performance Monitor, since that's on all versions of
Windows since 2000:
Control Panel -> Administrative Tools -> Computer Management -> then
Performance Counter.
On newer systems, from Computer Management -> Reliability and Performance
-> Monitoring Tools -> Performance Monitor.
Once you find Performance Monitor, click the green + to get into the
Add Counters dialog.
Tick the checkbox to show the counter description, then click around until
you find what you need.
Look for the disk entries for drive stuff.

Tony (Author of NC_Net)

On Tue, Jun 9, 2009 at 9:24 AM, Andreas Ericsson a...@op5.se wrote:

 dave stern - e-mail.pluribus.unum wrote:
  Anyone know of a plug-in or mechanism to log local disk I/O on windows?
 
  My nagios server is currently using check_nt to connect to windows hosts
  via nsclient++. I was hoping perhaps COUNTER has something buried
  within it to pull down this info.
 

 There are indeed counters for that, but due to Microsoft's stupidity the
 counter-names are different depending on which base-language you've
 used for your windows servers.

 I don't know what they're named for english platforms (or any other
 for that matter), but you should be able to view them with that thing
 you can pop up when pressing ctrl-alt-del (task manager or whatever it's
 called).

 --
 Andreas Ericsson   andreas.erics...@op5.se
 OP5 AB www.op5.se
 Tel: +46 8-230225  Fax: +46 8-230231

 Considering the successes of the wars on alcohol, poverty, drugs and
 terror, I think we should give some serious thought to declaring war
 on peace.




Re: [Nagios-users] disk IO for windows?

2009-06-09 Thread Curtis LaMasters
I use Disk Idle time as an indicator.  Not an original idea :(  I was
told to mimic the monitoring built into Windows SBS.
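
With an idle-time counter, low values are the bad ones, so thresholds run in the opposite direction from a usage check. Below is a minimal sketch of that mapping onto the standard Nagios plugin exit codes. The thresholds (20 and 10 percent) and the function name are made up for illustration, and the value is assumed to come from a counter such as \PhysicalDisk(_Total)\% Idle Time on an English system.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def idle_time_status(idle_percent, warn_below=20.0, crit_below=10.0):
    """Map a disk idle percentage to a Nagios status; lower idle time is worse."""
    if idle_percent < crit_below:
        return CRITICAL
    if idle_percent < warn_below:
        return WARNING
    return OK


if __name__ == "__main__":
    for sample in (95.0, 15.0, 5.0):
        print(sample, idle_time_status(sample))
```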

Curtis LaMasters
http://www.curtis-lamasters.com
http://www.builtnetworks.com






Re: [Nagios-users] Error while configuring NRPE on solaris

2009-06-09 Thread N Patil
This problem is still unresolved. I have tried everything I could think of,
but nothing helped. I still see an error in dmesg when I run
/usr/local/nagios/libexec/check_nrpe -H localhost
CHECK_NRPE: Error - Could not complete SSL handshake.

The error I see when I run dmesg:
 svc:/network/nrpe/tcp:default (chdir: No such file or directory)
Jun 10 10:15:53 unknown inetd[7268]: [ID 702911 daemon.error] Failed to 
set credentials for the inetd_start method of instance 
svc:/network/nrpe/tcp:default (chdir: No such file or directory)
Jun 10 10:15:59 unknown inetd[7276]: [ID 702911 daemon.error] Failed to 
set credentials for the inetd_start method of instance 
svc:/network/nrpe/tcp:default (chdir: No such file or directory)

I am using SunOS 5.10 Generic_120012-14 i86pc i386 i86pc

Thanks,
Nilesh




From: Luc I. Suryo l...@suryo.com
Sent: 05/29/2009 10:07 PM
Reply-To: Luc I. Suryo l...@suryo.com
To: Eric Pearce epea...@amberpoint.com
Cc: N Patil n.pa...@lntinfotech.com, Nagios Users Mailinglist nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Error while configuring NRPE on solaris

fyi

I have been using nagios and nrpe for 9-10 years now, on sparc and x86. I
started back with Solaris 7 and am now on Solaris 10, with zero errors across
a mix of Solaris, AIX, HP-UX and Linux.
The server has always been Solaris (sparc or x86), using inetd/xinetd/daemon
mode, again with zero errors.

The one 'problem' I have seen people complain about is ssl and nrpe. Read
the manual and it should be pretty clear what to do; 99.9% of the time it is
the user not having done some RTFM thingy :)
The other one is tcp-wrapper and nrpe. nrpe has access control built in,
so I never understood why one would need to use tcp-wrapper :)

-ls


 
 From: N Patil
 To: Eric Pearce
 Cc: Nagios Users Mailinglist
 Sent: Thursday, May 28, 2009 9:13 PM
 Subject: Re: [Nagios-users] Error while configuring NRPE on solaris
 
 Thanks Eric,
 I have followed the same article but it didn't help. This problem
 occurred at the very end, I mean while testing connectivity.
 
 Thanks,
 Nilesh
 
 May 28 19:15:27 solaris10.remotehost.com inetd[24241]: [ID 702911
 daemon.error] Failed to set credentials for the inetd_start method of
 instance svc:/network/nrpe/tcp:default (chdir: No such file or directory)
 
 I'm just guessing, but do you have a home directory for the nagios user 
 (with owner and group set to nagios)?
 The chdir error might come from this.
 -e
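
The home-directory guess above can be checked quickly on the affected host. This is a minimal sketch only: it assumes the account is literally named nagios and that Python is available on the Solaris box, and the helper name home_dir_problem is hypothetical. A clean result does not rule out other causes of the chdir error.

```python
import os
import pwd


def home_dir_problem(username="nagios"):
    """Return a short description of a home-directory problem, or None if none is found."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return f"user {username!r} does not exist"
    # inetd has to chdir somewhere when starting the service as this user;
    # a missing home directory is one plausible source of the chdir failure.
    if not os.path.isdir(entry.pw_dir):
        return f"home directory {entry.pw_dir!r} does not exist"
    if os.stat(entry.pw_dir).st_uid != entry.pw_uid:
        return f"home directory {entry.pw_dir!r} is not owned by {username!r}"
    return None


if __name__ == "__main__":
    problem = home_dir_problem("nagios")
    print(problem or "no home directory problem found")
```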

