Re: [Nagios-users] single email alert to multiple contacts?

2010-08-20 Thread Charlie Reddington

On Aug 20, 2010, at 10:23 AM, Scott Nottingham wrote:

 Does anyone know how (or if it is even possible) to configure nagios  
 to send a single email to all contacts associated with the host/ 
 service/etc as opposed to a separate email to each contact?

 The problem I'm facing is with emailing distribution lists.  If both  
 distribution_list_A and B contain user_A, said user ends up getting  
 2 email for the same event.  If nagios could be configured to send a  
 single email to both distribution lists, our exchange server would  
 recognize that user_A is a member of both lists and send only 1  
 email to him.

 Thanks in advance for any insight you can provide!

Think of your Exchange server's mailing lists as buckets. Bucket A is
list A with user A in it. Bucket B is list B with user A in it.

Each bucket is going to get an email, and that email is going to get
copied to its users.

I don't think this is going to be possible, unless you make
another group and put your groups in there. But I will bet that user
A still gets 2 emails. I can't say for certain, since it's been
about 5 years since I used an Exchange server.

I would probably pull user A out and let him get contacted separately
by Nagios, instead of depending on a group list, if it's a big deal.
The downside is that this doesn't scale very well.

Charlie

--
This SF.net email is sponsored by 

Make an app they can't live without
Enter the BlackBerry Developer Challenge
http://p.sf.net/sfu/RIM-dev2dev 
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Passive freshness checks - active checks

2010-08-13 Thread Charlie Reddington
Hey All,

 I'll check out the stats and turn on debugging next to see if there is
 anything there. In the mean time, what version of nagios are you
 running?

 Nagios Core 3.2.1

This seems to be the problem right here. I upgraded to nagios 3.2.1  
from 3.2.0 and nagios now honors my thresholds properly. I looked at  
the change log and didn't see this listed as a fix, but maybe I'm just  
blind.

Either way, this is the fix (upgrading) for those that follow in my
footsteps.

Thanks,

Charlie



Re: [Nagios-users] Passive freshness checks - active checks

2010-08-09 Thread Charlie Reddington

I can't see any problem with the config below.  If you have dozens of
checks set up this way and they are all set up in crontab to run at
*/15 then you will get a storm of checks at each 15-minute interval.
I normally make sure I stagger the checks in cron so that they are
reasonably evenly spaced.  If you have thousands it might also be
worth introducing a small random sleep to spread them out even more.
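
One way to get the staggering described above is to give each host its own fixed delay before it sends a result. The following is only a sketch: the function name `stagger_offset`, the idea of hashing the hostname, and the 900-second window (chosen to match a */15 cron schedule) are assumptions for illustration, not something from this thread.

```python
import hashlib


def stagger_offset(hostname: str, window_seconds: int = 900) -> int:
    """Return a stable delay in [0, window_seconds) for this host.

    Hypothetical helper: hashing the hostname gives each host its own
    offset, and the same host always gets the same offset.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).hexdigest()
    return int(digest, 16) % window_seconds


if __name__ == "__main__":
    # Example hostnames are made up for illustration.
    for host in ("prodws01", "prodws02", "master"):
        print(host, stagger_offset(host))
```

A wrapper script could sleep for that many seconds before running the check. A fixed per-host offset keeps the spacing between results from the same host constant, which a truly random sleep would not.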

I've not had any problems with it myself, but if you have a very busy
system, you might need to check that the command buffers aren't
filling (run /usr/local/nagios/bin/nagiostats to list the current
Nagios statistics).

Check the logs from nsca too.  If I recall correctly you may need to
set debug=1 in nsca.cfg for a while to get enough information.  One
problem I sometimes see occurs when the clock on the sending server is
way out of sync with the clock on the Nagios server, nsca will
complain and not process the check.  See this section in the nsca.cfg
file:

 # MAX PACKET AGE OPTION
 # This option is used by the nsca daemon to determine when client
 # data is too old to be valid.  Keeping this value as small as
 # possible is recommended, as it helps prevent the possibility of
 # replay attacks.  This value needs to be at least as long as
 # the time it takes your clients to send their data to the server.
 # Values are in seconds.  The max packet age cannot exceed 15
 # minutes (900 seconds).  If this variable is set to zero (0), no
 # packets will be rejected based on their age.

 max_packet_age=30

If I recall, I increased this from some smaller value to make it more
forgiving of systems which are a bit out of sync.
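
To make the clock-skew effect concrete, here is a simplified model of the age rule as the nsca.cfg comment describes it. It is a sketch, not the daemon's code: the function name `packet_accepted` and the example timestamps are made up, and how nsca handles timestamps from the future is not modelled.

```python
def packet_accepted(packet_time: int, server_time: int, max_packet_age: int) -> bool:
    """Simplified model of the max_packet_age rule from the nsca.cfg comment.

    A max_packet_age of 0 disables the age check. Timestamps that lie in
    the future are not treated specially here (assumption).
    """
    if max_packet_age == 0:
        return True
    age = server_time - packet_time
    return age <= max_packet_age


if __name__ == "__main__":
    now = 1_281_369_600  # arbitrary example timestamp
    # Sender clock 45 seconds behind: the packet looks 45 seconds old.
    print(packet_accepted(now - 45, now, 30))   # False
    print(packet_accepted(now - 45, now, 120))  # True
    print(packet_accepted(now - 45, now, 0))    # True
```

Under this model, a sender whose clock runs 45 seconds slow has every result dropped at max_packet_age=30, even though the check itself ran on time.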


I hope that's pointed you in the right direction.

Cheers,

Jim


Hey Jim,

Thanks for the info,

I have increased the time offset to be a minute or two. But all our  
systems should be close as we use NTP to keep them in sync, and nagios  
currently does active checks on this one to make sure things are happy.


I'll check out the stats and turn on debugging next to see if there is  
anything there. In the mean time, what version of nagios are you  
running?


Thanks,

Charlie


[Nagios-users] Passive freshness checks - active checks

2010-08-06 Thread Charlie Reddington
Hi All,

I'm having a bit of a problem with my nagios setup. I'm trying to move
toward passive checks, with failover being an active check. For now, my
failover check command is just a one-liner that returns critical with
a message.

It's looking like the active check is being run even when I see
the corresponding passive check coming in. I suspect it may be in my
configs somewhere, but I'm not sure what is wrong yet.

The big kicker is that it's not all of my checks, only some of
them. They all have different freshness thresholds, but that doesn't
seem to be the common factor. Their configs are the same, but in a
different order, and that doesn't seem like the problem either, as it's
affecting some of one and not the other.

Any thoughts of what I may be doing wrong?

Charlie

---


Nagios Version: 3.2.0

I have a service template definition that looks like this.
define service{
 name                            passive-service
 check_freshness                 1
 active_checks_enabled           0
 passive_checks_enabled          1
 parallelize_check               1
 obsess_over_service             0
 notifications_enabled           0
 event_handler_enabled           0
 flap_detection_enabled          0
 failure_prediction_enabled      0
 process_perf_data               1
 retain_status_information       1
 retain_nonstatus_information    1
 is_volatile                     0
 check_period                    24x7
 max_check_attempts              1
 contact_groups                  admins
 notification_options            w,c,r
 notification_interval           60
 notification_period             24x7
 register                        0
 }

And then I have a services defined like so.
# Free Memory Check
define service{
 use passive-service
 service_description Passive Memory Check
 check_command   check_stale
 hostgroups  passive
 freshness_threshold 3600
 }
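
For reference, the intent behind check_freshness and freshness_threshold can be sketched as follows. This is a simplified model, not what Nagios does internally (the real calculation takes more into account); the function name `is_stale` and the sample ages are made up for illustration.

```python
def is_stale(last_result_time: int, now: int, freshness_threshold: int) -> bool:
    """True when the newest result is older than the threshold (in seconds).

    Simplified model: when this is true, the expectation is that Nagios
    runs the service's check_command (check_stale here) as an active check.
    """
    return (now - last_result_time) > freshness_threshold


if __name__ == "__main__":
    now = 10_000
    # Passive result arrived 900 seconds ago (one cron interval).
    print(is_stale(now - 900, now, 3600))   # False
    # No passive result for 4000 seconds.
    print(is_stale(now - 4000, now, 3600))  # True
```

With a */15 cron job and a 3600-second threshold, the stale check should only fire after several passive results in a row have gone missing.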

My active checks are defined with.
# alert on stale
define command{
 command_name    check_stale
 command_line    $USER1$/check_dummy 2 Check is stale, please run manually
 }

On my host, I use cron jobs to run things like this. I use  
nsca_wrapper to send my check results to the central nagios server.
# Check Free Memory
*/15 * * * * root /usr/local/nagios/libexec/nsca_wrapper.sh -H server.name -S 'Passive Memory Check' -C '/usr/local/nagios/libexec/check_memory -w 10 -c 5' > /dev/null





Re: [Nagios-users] setting up nagios to monitor other systems load, mem, disk

2010-08-06 Thread Charlie Reddington


On Aug 6, 2010, at 10:59 AM, Dombrowski, Neil wrote:

I have nagios 3.2.1 installed on a RH5.5 box, and it is monitoring ssh on
client systems (and host check/ping). I now want to be able to check disk
capacity, cpu load, etc., on other systems (clients). It's not clear to me
how to do this in the documentation. Do I need to use check_by_ssh or nrpe?


You can do either. I have been using check_by_ssh because I didn't  
want to open a new port on my client machines, and they are all  
running sshd on them. But this does not scale very well. We have a 2  
core server, and about 300 hosts, and 1500 checks, and it's loaded the  
host down pretty bad.


If I was to stay doing active checks, I would do NRPE as I have done
in the past. It scales much better.

Is there a way to package up part of the nagios install and distribute it to
all systems I want to monitor?

You'll want to get the nrpe plugin along with the plugin checks. We  
usually just push the files to the systems, compile, install, and then  
put our configs in place. A smart bash script can automate most of the  
install for you.

I would much appreciate it if someone could send me a link to the right
document for this.

The basics you'll need are to define another host. Define services.  
Define commands for those services. Add the host to those services. On  
the client, install and configure nrpe. Install the plugins you want  
to use. And then open up firewalls for these new services.


Sorry I can't find the link I used to use when I first started out  
with nagios.





Thanks,

Neil





Re: [Nagios-users] Send Only one notification...

2009-07-22 Thread Charlie Reddington

Hey Luis, whats up man!

Couldn't you just create a separate contact, and set it up with  
escalations, but only have it alert once? Maybe something like this...  
(note, untested).


define hostescalation {
host_name   *
first_notification  1
last_notification   1
notification_interval   60
contact_groups  new_contact
}
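
The idea behind first_notification 1 / last_notification 1 is that the escalation only covers the first notification, so that contact group is alerted once. A minimal sketch of the range test follows; the function name `escalation_applies` is made up, and it relies on a last_notification of 0 meaning "no upper limit" in Nagios.

```python
def escalation_applies(notification_number: int, first: int, last: int) -> bool:
    """True when a notification number falls inside the escalation range.

    A last value of 0 is treated as "no upper limit".
    """
    if notification_number < first:
        return False
    return last == 0 or notification_number <= last


if __name__ == "__main__":
    # With first=1 and last=1, only the first notification matches.
    print([n for n in range(1, 5) if escalation_applies(n, 1, 1)])  # [1]
    # With last=0, every notification from the third onward matches.
    print([n for n in range(1, 6) if escalation_applies(n, 3, 0)])  # [3, 4, 5]
```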

Charlie

On Jul 22, 2009, at 8:47 AM, Luis Fernando Lacayo wrote:


Good Day everyone,

I was wondering if someone is kind enough to help me with a little  
thing I am stuck with.


I need to create a helpdesk ticket from NAGIOS from certain  
devices.  I am not sure of how to send only one email to the  
helpdesk but multiple emails to admins of these devices.


any guidance and /or example would be greatly appreciated.

thanks.

Luis

Re: [Nagios-users] NRPE vs. check_by_ssh

2009-03-26 Thread Charlie Reddington

On Mar 26, 2009, at 11:05 AM, Kevin Keane wrote:

 Andreas Ericsson wrote:
 Kevin Keane wrote:
 Christopher McAtackney wrote:
 2009/3/25 Kevin Keane subscript...@kkeane.com:

 I think you are comparing apples and oranges here, because in most
 situations that I can think of, the decision is dictated by the network
 topology. If you are exclusively on a trusted private network,
 check_by_ssh really doesn't offer any benefits. Conversely, if your
 topology involves the Internet or some other untrusted network (WiFi),
 then you wouldn't want NRPE in the first place.

 The only exception to the above that I can think of is when it comes to
 deciding between using check_by_ssh over an untrusted network, vs. NRPE
 through some other kind of tunnel or VPN. But in that case, you'd incur
 encryption overhead either way, and the comparison is very different
 from the question you asked.

 All that said: I don't have any first-hand experience, but I suspect
 that the impact of establishing 2200 ssh connections in a five-minute
 span (assuming that you are using a five-minute check interval) is
 pretty substantial. The main impact actually lies in establishing and
 tearing down the connections, key negotiations etc.; the encryption
 during the data phase probably has only limited impact because most
 checks only transmit a few bytes back and forth.

 SSH does much better with longer-duration connections when the keys are
 already exchanged. This is even more true if you have a router-based
 VPN, because in that case the overhead is offloaded to a different
 machine.

 So if you have the option of sending the checks as NRPE through one or a
 few long-term VPNs: you are probably going to be better off. Of course,
 in the big picture, your mileage may vary.

 Firstly, thanks for the detailed explanation of the issues involved in
 this choice Kevin, it's been very helpful.

 I'm curious though, could you elaborate on why NRPE is unsuitable if
 communication with my remote hosts is going to go via the Internet? Is
 it not sufficient that NRPE uses SSL? This may be more of a network
 security question than a Nagios one, but I've no real experience in
 either area unfortunately, so I appreciate any info you can give here.

 No, you are right. I wasn't aware that NRPE could use SSL. In that
 case, NRPE would be pretty much the same in terms of performance as SSL.

 That said, I am generally concerned from a security standpoint about
 any kind of active checks going over the Internet. This is because if
 you are monitoring, in your example, 200 hosts, you have to poke
 holes into 200 firewalls (or into one firewall, and then set up SSL
 or SSH keys on 200 hosts). That's 200 potential security holes all
 over the place with little or no control, and on machines that may
 not necessarily be hardened for access from the outside world. Worse
 - active checks, by nature, cause a program to be launched and
 executed on the monitored client, and usually with very high
 permissions. You said that you check 2000 services, so that's 2000
 plugins (give or take a few). What if a hacker found a way to
 compromise one of your 2000 plugins? You'd have a privilege
 escalation issue along with remote-launch capability. On 200 clients.


 Very high permissions are normally not needed.

 Depends on the plugin, but I'm not sure that this is generally true. For
 instance, something as simple as log file analysis either requires root
 permission on Linux; log files aren't readable by anybody else, or it
 requires that you relax file permissions or security somewhere else. On
 Windows, I'm running my monitoring agent (by default) as the Local
 System account (most Windows services do that anyway). That has
 basically full access to everything, but nothing on the network.

My nagios user only checks basic system stuff, and I haven't run into  
a permission error situation yet, and I check the following by default  
- load, users, disk, swap, memory, processes, databases, raid.



 Of course check_ping, check_tcp etc. don't usually need such high
 permissions.

 I prefer using NRPE because of two reasons:
 1. It provides a rather simple way of specifying exactly which commands
    can be run, and with which arguments (don't enable argument parsing
    in nrpe if the receiving end isn't duly protected by firewalls etc)
 2. If someone breaks into the Nagios server, he or she does not get the
    public keys required for running commands on the remote servers.

 Can you explain that second statement? I'm not sure I follow what you
 are trying to say here. Why would getting public keys be a bad thing?
 They are, by definition, freely available anyway.

What you CAN do, though it's kind of a p.i.t.a., is have a key per
command. So if you have something like check_disk, you can put a
single key for just that command. On all the servers you roll this out  
to, you can secure it up 

Re: [Nagios-users] NRPE vs. check_by_ssh

2009-03-25 Thread Charlie Reddington

On Mar 25, 2009, at 2:30 PM, RijilV wrote:

 2009/3/24 Christopher McAtackney crist...@gmail.com:
 Hi all,

 I was wondering if someone could give a brief overview of the pros /
 cons of using NRPE to monitor my remote hosts versus using the
 check_by_ssh command?

 I'm aware that check_by_ssh increases the CPU overhead, but I'm not
 clear on the level of impact here - does this increase the load on the
 monitoring machine in direct relation to the number of hosts being
 monitored? For example, if I was using check_by_ssh to monitor, say,
 2000 services spread across 200 hosts, would I experience significant
 slowdown on my monitoring machine?

 Cheers for any info,

 Chris



 SSH is going to slow it down on both sides of the communication.  SSH
 does quite a bit more in terms of setting up the connection which
 involves using asymmetric encryption to setup a shared secret for
 symmetric encryption and verifying keys for the asymmetric part,
 verifying access, allocating a session.  Whereas NRPE even with
 encryption just does a simple pre-shared secret for the symmetric
 encryption, much faster even if using the same encryption algorithm


 One thing you could do with SSH to speed it up (and I would argue make
 it faster than NRPE depending on the stability of your network) would
 be to use ControlMaster.  ControlMaster is a SSH v2 feature, where you
 create a connection and can open up multiple sessions with that
 ControlMaster for other SSH processes.  This saves you not only the
 key-exchange heavy lifting but also you're not opening up a new socket
 on the remote host.  In order to really make it worth it you'd have to
 spawn a process that was continuously connected.  I wrote an ugly
 check_by_ssh that would spawn a ControlMaster if one didn't exist and
 use it if it did.  Reduced the load/latency quite a bit for SSH
 checks.  Though if I had to do it again I'd use 'ControlMaster auto'
 (man 5 ssh_config) and create a separate check that was responsible
 for maintaining the ControlMaster, then you could use the stock
 check_by_ssh without any modifications.


 That all being said, you might want to think about a distributed setup
 anyhow, if nothing more for redundancy.  200 servers and 2,000 checks
 is a lot of responsibility for a singleton, you could break it 50/50
 between two servers that could take over for the other one if it
 fails.


 .r'


+1 on the control master. We have about 1000 checks over 300 hosts, and
using control master made the box much more stable and, quite frankly,
usable. It saved a lot of plugin timeouts as well.

Think about 1000 checks every 5 or 10 minutes. That's 1000 encrypted
tunnels that are going up and down. That's a lot of overhead for a
quick check, let alone if your server is checking say 5 or 10 things
back to back.

http://www.torchbox.com/blog/ssh_tips_2.html
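
As a rough sketch of what a ControlMaster-based check invocation might look like, the helper below builds a plain ssh command line that reuses a shared connection. The function name `ssh_check_argv`, the socket directory, and the `cm-%r@%h:%p` socket name are assumptions for illustration; ControlMaster, ControlPath and BatchMode are standard ssh_config options.

```python
import os
from typing import List


def ssh_check_argv(host: str, remote_command: str, control_dir: str = "~/.ssh") -> List[str]:
    """Build an ssh command line that shares one connection per host.

    Hypothetical helper: with ControlMaster=auto, ssh reuses an existing
    master connection at ControlPath if there is one, and otherwise
    becomes the master itself.
    """
    base = os.path.expanduser(control_dir).rstrip("/")
    control_path = base + "/cm-%r@%h:%p"  # %r user, %h host, %p port
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", "ControlPath=" + control_path,
        "-o", "BatchMode=yes",  # never prompt for a password in a check
        host,
        remote_command,
    ]


if __name__ == "__main__":
    # Host name and plugin arguments are examples only.
    print(ssh_check_argv("prodws01", "/usr/local/nagios/libexec/check_load -w 5,4,3 -c 10,6,4"))
```

Something still has to keep a master connection open for the savings to show up, which is the separate "maintain the ControlMaster" check suggested above.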

Charlie



Re: [Nagios-users] Nagios and email

2009-03-23 Thread Charlie Reddington
You have checked your /var/spool/mqueue and /var/spool/clientmqueue  
right?

I have had a few million emails queued up there before.

Charlie
On Mar 23, 2009, at 9:22 PM, Peter Doherty wrote:

 Hello,

 I have a kind of custom nagios setup, so maybe this is a byproduct of
 that...
 I had to reboot my nagios server today, and it didn't come right back
 up.  By the time it did, it realized that the service checks weren't
 fresh, and started sending out lots of notifications.  I stopped
 sendmail to keep from flooding my inbox...so here's the question:

 I just want to clear out the outgoing email queue.  mailq and sendmail
 -bp both show nothing queued up.  When I restart sendmail, it starts
 sending again.
 Has nagios passed all the emails over to sendmail, and I just need to
 clear out sendmail's queue, or is nagios holding onto them while
 sendmail isn't running, and then once it sees sendmail running, it
 starts dumping email into the queue?

 Which is it, and how do I clear them from the queue?

 Thank you.
 --Peter





[Nagios-users] Cannot disable individual servers / hosts

2008-11-22 Thread Charlie Reddington
Hi,

I'm running Nagios 3.0.5.

I currently have about 5 different clients on my system for different  
companies. So users are only added to see their own servers, except  
for my user, which can see and do it all.

I CAN disable all notifications for the entire system. Process Info -  
Disable Notifications

I CANNOT disable notifications for a particular service. Service  
Detail - service check - Disable Notifications

I have a user that can access everything - 'nagiosadmin' that is added  
to the cgi.cfg file and even this user cannot do individual  
notification disabling.

When I go through the disable process, I see this in my event log

External Command[11-22-2008 08:14:41] EXTERNAL COMMAND:  
DISABLE_SVC_NOTIFICATIONS;Tenant602E;PING

But I don't get the icon, and I still end up getting alerts.

Here's the last two lines of my nagios.log file

[1227362376] Auto-save of retention data completed successfully.
[1227363281] EXTERNAL COMMAND:  
DISABLE_SVC_NOTIFICATIONS;Tenant602E;PING

I'm just not getting where things are going wrong here. If you guys  
need any of my configs to help better troubleshoot this with me let me  
know and I'll paste them up.

Thanks,

Charlie






Re: [Nagios-users] Cannot disable individual servers / hosts

2008-11-22 Thread Charlie Reddington
Okay, just found out something.

I can disable services or hosts, but not both at the same time.

For example, if I go to disable all service notifications and click  
the box, 'and hosts too', nothing gets disabled.

If I disable JUST the service it will disable

If I disable JUST the host, it will disable.

Weird.

Charlie

On Nov 22, 2008, at 8:19 AM, Charlie Reddington wrote:

 Hi,

 I'm running Nagios 3.0.5.

 I currently have about 5 different clients on my system for  
 different companies. So users are only added to see their own  
 servers, except for my user, which can see and do it all.

 I CAN disable all notifications for the entire system. Process Info - 
  Disable Notifications

 I CANNOT disable notifications for a particular service. Service  
 Detail - service check - Disable Notifications

 I have a user that can access everything - 'nagiosadmin' that is  
 added to the cgi.cfg file and even this user cannot do individual  
 notification disabling.

 When I go through the disable process, I see this in my event log

   External Command[11-22-2008 08:14:41] EXTERNAL COMMAND:  
 DISABLE_SVC_NOTIFICATIONS;Tenant602E;PING

 But I don't get the icon, and I still end up getting alerts.

 Here's the last two lines of my nagios.log file

   [1227362376] Auto-save of retention data completed successfully.
   [1227363281] EXTERNAL COMMAND:  
 DISABLE_SVC_NOTIFICATIONS;Tenant602E;PING

 I'm just not getting where things are going wrong here. If you guys  
 need any of my configs to help better troubleshoot this with me let  
 me know and I'll paste them up.

 Thanks,

 Charlie







[Nagios-users] Confused on how this is working with out nrpe...

2008-11-11 Thread Charlie Reddington
Hi,

I'm running the latest version of nagios - 3.0.5.

On my remote hosts, I'm running NRPE.

I haven't used nrpe with nagios since 2.9, so I'm wondering did I miss  
how things work now that we are 3.0.

I have a bunch of checks - disk, load, users, etc, but I'm not putting  
it through nrpe, yet it's returning info.

Do I not need nrpe any more? Or namely do I not need to do things like.

check_nrpe!check_disk

Thanks for clueing me in.

Charlie



Re: [Nagios-users] Confused on how this is working with out nrpe...

2008-11-11 Thread Charlie Reddington
That's the thing, I don't have nrpe installed around. Which is where  
my confusion is.


On Nov 11, 2008, at 12:20 PM, Sean McAfee wrote:

 Charlie Reddington wrote:
 Sure thing. I think I am missing something as it's not working how I
 remembered it working.


 ---

 define service{
 use                     generic-service ; Name of service template to use
 host_name               master,prodws01,prodws02
 service_description     Root Partition
 check_command           check_local_disk!20%!10%!/
 }

 Now looking at this, I'm able to get successful checks, with out using
 nrpe on the host server. So my question comes back to, how is this
 working when I thought you had to define commands like this

 define service{
 use                     generic-service
 host_name               master,prodws01,prodws02
 service_description     Current Load
 check_command           check_nrpe!check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
 }

 You're missing See the check_local_* names in the service
 definitions?  Somewhere you've defined something like:

 define command{
 command_name            check_local_disk
 command_line            $USER1$/check_nrpe2 -H $HOSTNAME$ -c check_disk -a $ARG1$ $ARG2$
 }

 That is how NRPE is getting invoked.  Look closer at your templates.
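
To see how a check_command such as check_local_disk!20%!10%!/ ends up as an NRPE call, here is a simplified sketch of the macro substitution. It is not Nagios's own code: the function name `expand_check_command` and the sample macro values are assumptions, and details such as escaped "!" characters are ignored.

```python
import re
from typing import Dict


def expand_check_command(check_command: str,
                         command_lines: Dict[str, str],
                         macros: Dict[str, str]) -> str:
    """Expand 'name!arg1!arg2' into a full command line.

    Simplified model: the part before the first '!' names the command,
    the remaining parts fill $ARG1$, $ARG2$, ... in its command_line.
    """
    name, *args = check_command.split("!")
    line = command_lines[name]

    def substitute(match):
        key = match.group(1)
        if key.startswith("ARG") and key[3:].isdigit():
            index = int(key[3:]) - 1
            return args[index] if 0 <= index < len(args) else ""
        # Leave unknown macros untouched.
        return macros.get(key, match.group(0))

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, line)


if __name__ == "__main__":
    commands = {
        "check_local_disk": "$USER1$/check_nrpe2 -H $HOSTNAME$ -c check_disk -a $ARG1$ $ARG2$",
    }
    # Macro values are examples only.
    macros = {"USER1": "/usr/local/nagios/libexec", "HOSTNAME": "master"}
    print(expand_check_command("check_local_disk!20%!10%!/", commands, macros))
    # /usr/local/nagios/libexec/check_nrpe2 -H master -c check_disk -a 20% 10%
```

So a service that never mentions check_nrpe by name can still go through NRPE, because the command definition it points to does.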

 -- 
 Sean McAfee
 System Engineer

 Collaborative Fusion, Inc.
 [EMAIL PROTECTED]
 412-422-3463 x 4025

 5849 Forbes Avenue
 Pittsburgh, PA 15217

 
 IMPORTANT: This message contains confidential information
 and is intended only for the individual named. If the reader of
 this message is not an intended recipient (or the individual
 responsible for the delivery of this message to an intended
 recipient), please be advised that any re-use, dissemination,
 distribution or copying of this message is prohibited. Please
 notify the sender immediately by e-mail if you have received
 this e-mail by mistake and delete this e-mail from your system.
 E-mail transmission cannot be guaranteed to be secure or
 error-free as information could be intercepted, corrupted, lost,
 destroyed, arrive late or incomplete, or contain viruses. The
 sender therefore does not accept liability for any errors or
 omissions in the contents of this message, which arise as a
 result of e-mail transmission.
 




 IMPORTANT: This message contains confidential information and is  
 intended only for the individual named. If the reader of this  
 message is not an intended recipient (or the individual responsible  
 for the delivery of this message to an intended recipient), please  
 be advised that any re-use, dissemination, distribution or copying  
 of this message is prohibited. Please notify the sender immediately  
 by e-mail if you have received this e-mail by mistake and delete  
 this e-mail from your system.



 -
 This SF.Net email is sponsored by the Moblin Your Move Developer's  
 challenge
 Build the coolest Linux based applications with Moblin SDK  win  
 great prizes
 Grand prize is a trip for two to an Open Source event anywhere in  
 the world
 http://moblin-contest.org/redirect.php?banner_id=100url=/
 ___
 Nagios-users mailing list
 Nagios-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/nagios-users
 ::: Please include Nagios version, plugin version (-v) and OS when  
 reporting any issue.
 ::: Messages without supporting info will risk being sent to /dev/null


-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Confused on how this is working with out nrpe...

2008-11-11 Thread Charlie Reddington
Sure thing. I think I'm missing something, as it's not working the way
I remembered it working.

On my remote host, this is my NRPE config file:

allowed_hosts=my.server.com

# The following examples use hardcoded command arguments...
command[check_smtp]=/usr/local/nagios/libexec/check_smtp -t20 -w 10 -c 20
command[check_ftp]=/usr/local/nagios/libexec/check_ftp -w 10 -c 20
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 200 -c 250
command[check_mysql]=/usr/local/nagios/libexec/check_mysql -u user -p password
command[check_local_disk]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -u MB
command[check_load]=/usr/local/nagios/libexec/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0
# The following examples allow user-supplied arguments and can
# only be used if the NRPE daemon was compiled with support for
# command arguments *AND* the dont_blame_nrpe directive in this
# config file is set to '1'.  This poses a potential security risk, so
# make sure you read the SECURITY file before doing this.

#command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
#command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

And on my host system, I have some command and service definitions such as these:

# 'check_ftp' command definition
define command{
        command_name    check_ftp
        command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_hpjd' command definition
define command{
        command_name    check_hpjd
        command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
        }


# 'check_snmp' command definition
define command{
        command_name    check_snmp
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
        }


# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }




---

define service{
        use                     generic-service         ; Name of service template to use
        host_name               master
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

define service{
        use                     generic-service         ; Name of service template to use
        host_name               master,prodws01,prodws02
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        }

define service{
        use                     generic-service         ; Name of service template to use
        host_name               master,prodws01,prodws02
        service_description     Current Users
        check_command           check_local_users!20!50
        }

define service{
        use                     generic-service         ; Name of service template to use
        host_name               master,prodws01,prodws02
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
        }

define service{
        use                     generic-service         ; Name of service template to use
        host_name               master,prodws01,prodws02
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }


Looking at this, I'm able to get successful checks without using
nrpe on the host server. So my question comes back to: how is this
working, when I thought you had to define services like this?

define service{
        use                     generic-service
        host_name               master,prodws01,prodws02
        service_description     Current Load
        check_command           check_nrpe!check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }
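
For comparison, here is a minimal sketch of the two kinds of check. It assumes check_local_load is the stock definition from the sample commands.cfg and that the check_nrpe plugin is installed under $USER1$ on the Nagios server; neither is confirmed by the configs in this message. A check_local_* command of that kind never references $HOSTADDRESS$, so the plugin runs on the Nagios server itself whichever host_name the service is attached to. That would explain the checks succeeding without nrpe, although the values would then describe the Nagios server rather than prodws01 or prodws02. A check through nrpe instead names one of the command[...] entries from the remote config:

```
# Local check: no -H option, so the plugin runs on the Nagios server itself
# (assumed to match the sample commands.cfg definition)
define command{
        command_name    check_local_load
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }

# Remote check: asks the NRPE daemon on the target host to run the
# entry it has configured as command[<name>]
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

# Service that runs command[check_load] on each remote host
define service{
        use                     generic-service
        host_name               prodws01,prodws02
        service_description     Current Load
        check_command           check_nrpe!check_load
        }
```

Because the thresholds are hardcoded in the remote command[check_load] entry, the service definition passes only the command name.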


On Nov 11, 2008, at 11:58 AM, Aaron Segura wrote:

 This is not nearly enough information to offer any sort of help other
 than You're obviously misunderstanding or misstating something.
 Please include (at the very least) some relevant configs.

 -Original Message-
 From: Charlie Reddington [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, November 11, 2008 10:49 AM
 To: Nagios User list
 Subject: [Nagios-users] Confused on how this is working with out  
 nrpe

[Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
Hi,

I have a couple machines that spit out a warning similar to this:

WARNING - check_by_ssh: Remote command '/home/nagios/nagios-plugs/check_disk' returned status 1

I believe this is caused by the check itself timing out, since when
I try to log in it sometimes takes up to a minute or two just to
get a prompt.

The server will respond to ping, so I'm generally not too
concerned about it. And the checks usually clear up in 5 minutes or
as soon as the server gets whatever IO hog out of the way.

Is anyone else experiencing this, and if so how do you cope / deal  
with this?

Thanks,

Charlie



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
I should also mention that I have these timeouts in place...

service_check_timeout=90
host_check_timeout=30
event_handler_timeout=30
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5

Charlie





Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
Sorry, forgot the mailing list.

I'm not using LDAP, but I am using DSA keys.

On Oct 6, 2008, at 10:58 AM, Matt Rivet wrote:

 Are you using an LDAP server and RSA keys?





Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington

On Oct 6, 2008, at 11:03 AM, James wrote:


 The timeouts in nagios.cfg are how long the nagios process waits before
 aborting a check.
 There are usually check-specific timeouts that you can add to the
 command definition.
 Run the check_* command manually and see what the syntax is
 (sometimes '-t xx').


I thought I had already done that by putting the --timeout option
on check_by_ssh, but I guess not. I've now set the timeout, going
from 30 to 60. We'll see how it goes.

Charlie
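
For anyone following along, this is a sketch of where the plugin-level timeout goes. The command name check_disk_by_ssh and the argument layout are made up for illustration, and the plugin path is the one from the warning earlier in the thread. The -t value is check_by_ssh's own timeout in seconds, kept below service_check_timeout=90 so the plugin gives up on its own before the nagios process aborts the check:

```
define command{
        command_name    check_disk_by_ssh       ; hypothetical name
        command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 60 -C "/home/nagios/nagios-plugs/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$"
        }
```

A service would then call it as, say, check_disk_by_ssh!20%!10%!/ (thresholds chosen only as an example).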



Re: [Nagios-users] nagios on call schedule w/ escalations?

2008-10-04 Thread Charlie Reddington
Jon, thanks. I got things figured out.

I set up 2 sets of contacts with the same users. One was just for the
regular contact. I set up this group of 'admins' so they are only
contacted on their on-call schedule.

I then did nearly exactly as you wrote and made a totally
separate set of contacts that can be contacted 24x7.

I have 2 groups: Admins and Escalations.

Escalations uses the second set of 24x7 contacts, and Admins
uses the on-call schedule.

Inheritance wasn't really necessary, just the separate groups.

Oh, and I made a separate contact template that uses the proper contact
time period.

Thanks again, works perfectly.

charlie
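
A rough sketch of that layout, for the archives. The names escalation-contact, user1_24x7 and the escalations group are made up for illustration, and 24x7 is assumed to be the time period from the sample config; only user1 is shown, with user2 and user3 defined the same way:

```
# Template for the always-reachable copy of each admin
define contact{
        name                            escalation-contact      ; hypothetical template name
        use                             generic-contact
        host_notification_period        24x7                    ; assumed sample time period
        service_notification_period     24x7
        register                        0                       ; template only
        }

# On-call copy: only notified during user1's rotation
define contact{
        contact_name                    user1
        use                             generic-contact
        alias                           user1
        email                           user1
        host_notification_period        user1_oncall
        service_notification_period     user1_oncall
        }

# 24x7 copy of the same person, referenced only by escalations
define contact{
        contact_name                    user1_24x7              ; hypothetical name
        use                             escalation-contact
        alias                           user1 (escalation)
        email                           user1
        }

define contactgroup{
        contactgroup_name       admins
        members                 user1,user2,user3
        }

define contactgroup{
        contactgroup_name       escalations
        members                 user1_24x7,user2_24x7,user3_24x7
        }
```

The serviceescalation definitions then list escalations (plus managers for the later step) in contact_groups, so the time period filter on the on-call contacts no longer blocks the escalated notifications.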

On Oct 2, 2008, at 3:13 AM, Jon Angliss wrote:


 On Tue, 30 Sep 2008 16:22:16 -0500, Charlie Reddington
 [EMAIL PROTECTED] wrote:

 Hi guys / gals,

 I am working on the final stages of my nagios setup, but I'm entering
 territory which I haven't been before and can use some guidance.

 I'm sure you've probably taken a peek at the On Call Rotations
 details in the documentation:

  http://nagios.sourceforge.net/docs/3_0/oncallrotation.html

 There are plenty of examples to get a good idea.


 You might want to read up on notifications, and serviceescalations,
 too... Looking at the time stuff you've got, what'll happen is at any
 one point, only 1 of the admins will be reachable by notifications at
 any time.  This is because the timeperiods stop nagios from sending
 notifications to a user that is outside their timeperiod.  For
 example, a host

[Nagios-users] nagios on call schedule w/ escalations?

2008-09-30 Thread Charlie Reddington
Hi guys / gals,

I am working on the final stages of my nagios setup, but I'm entering
territory I haven't been in before and could use some guidance.

Here's what I'm trying to achieve. We have a team of 3 admins who
rotate on call by the week. Of course, it doesn't work out to exactly
every third week, because of people having vacation time, etc. So
sometimes people are on call for 2 weeks in a row, or every 2 weeks, etc.

What we'd like is to have a schedule set up where the primary guy gets
woken up first. But if he doesn't answer his call after an hour, it
drops down to the rest of us admins. No matter if you're just at home
sleeping, or if you're on vacation, you get pinged. After that it goes
up to our manager.

I have figured out how to set people's initial schedules, and have
it looking something like this:

# contacts

define contact{
        contact_name                    user1
        use                             generic-contact
        alias                           user1
        email                           user1
        host_notification_period        user1_oncall
        service_notification_period     user1_oncall
        }

define contact{
        contact_name                    user2
        use                             generic-contact
        alias                           user2
        email                           user2
        host_notification_period        user2_oncall
        service_notification_period     user2_oncall
        }

define contact{
        contact_name                    user3
        use                             generic-contact
        alias                           user3
        email                           user3
        host_notification_period        user3_oncall
        service_notification_period     user3_oncall
        }

define contact{
        contact_name                    manager1
        use                             generic-contact
        email                           manager1
        }

# groups

define contactgroup{
        contactgroup_name       admins
        members                 user1,user2,user3
        }

define contactgroup{
        contactgroup_name       managers
        members                 manager1
        }

# Time periods

define timeperiod{
 timeperiod_name user1_oncall
 Sept 29 - Oct 5 00:00-24:00
 Oct 20 - Oct 26 00:00-24:00
 Nov 17 - Nov 23 00:00-24:00
 Dec 1 - Dec 7 00:00-24:00
 Dec 15 - Dec 21 00:00-24:00
}

define timeperiod{
 timeperiod_name user2_oncall
 Oct 6 - Oct 12 00:00-24:00
 Nov 3 - Nov 9  00:00-24:00
 Nov 24 - Nov 30 00:00-24:00
 Dec 22 - Dec 23 00:00-24:00
}

define timeperiod{
 timeperiod_name user3_oncall
 Oct 13 - Oct 19 00:00-24:00
 Oct 27 - Nov 2  00:00-24:00
 Nov 10 - Nov 16 00:00-24:00
 Dec 8 - Dec 14  00:00-24:00
}

Do escalations trump the initial contacts?

# First escalations
define serviceescalation{
 hostgroup_name  Servers
 service_description *
 first_notification  2
 last_notification   3
 notification_interval   30
 contact_groups  admins
}

# Second escalations
define serviceescalation{
 hostgroup_name  Servers
 service_description *
 first_notification  3
 last_notification   8
 notification_interval   60
 contact_groups  admins,managers
}

So I know this isn't quite right, as our admins are part of the admins
group, but I'm also trying to restrict when they get contacted. So I'm
not really sure how to proceed with this.

Thanks for any advice.

Charlie





[Nagios-users] check_by_ssh - getting errors in logs

2008-09-12 Thread Charlie Reddington
Hi all,

I have nagios working pretty well; it's checking everything I want over
ssh. But ever since I set up nagios over ssh, I keep getting the
following in my logs.

authpriv crit sshd[8939]: fatal: Read from socket failed: Connection reset by peer

I've checked the load / iowait of the servers in question and they all
seem fine, so I don't think they are loaded down when this happens.

I also checked the versions of ssh to see whether a particular
version of openssh was complaining, but it seems pretty widespread across
our versions, which are openssh 3.9p1 - 4.5.

I am also using forced commands per host, and I added ' exit',
thinking that maybe the connection wasn't exiting cleanly.

Does anyone have any ideas what else I can do to eliminate these errors?
They seem somewhat intermittent, but about every 20 minutes I get a
notice that a server is seeing this.

Thanks,

Charlie



[Nagios-users] service dependency on host?

2008-09-11 Thread Charlie Reddington
Hi,

I have nagios up and working, but I want to tweak it some so I'm not
getting buried under SMS messages.

My setup works like so.

I have a file called Loc-Servers.cfg

This file has the host definitions, which look like this, just
a lot more of them:

# serv01
define host{
 use linux-server
 host_name   serv01.example.com
 alias   serv01
 address 192.168.1.101
 }

# serv02
define host{
 use linux-server
 host_name   serv02.example.com
 alias   serv02
 address 192.168.1.102
 }

And then after the hosts I have the services setup generally like this:

define service{
        use                     generic-service
        host_name               serv01.example.com,serv02.example.com
        service_description     Ping
        check_command           check_ping!100.0,20%!500.0,60%
        }

My real question comes down to dependencies. As much as I love getting
400 messages when something 'upstream' like a switch goes down, I
generally want alerts only for hosts that are down, and only for
the first point of failure.

So assuming one of my networks looks like this:

  Router -- Switch - serv01, serv02, serv03

Let's say the switch goes down. That makes all the servers
unreachable, which in turn fails every check on them. I don't really
want any notifications for anything below the switch.

I've seen the docs about having services dependent on services, and
hosts dependent on hosts. But how about services dependent on hosts?
Do I just use hosts instead of services in the config?

Thanks for your time and for your help,

Charlie



Re: [Nagios-users] service dependency on host?

2008-09-11 Thread Charlie Reddington
I thought I was done

There is no mention of the services on the affected hosts. By default,
will they also skip alerts and only report unreachable?

So if the hosts aren't sending notifications because the head switch
is down, what about their subsequent services?

Thanks,

Charlie

On Sep 11, 2008, at 2:15 PM, Goldschrafe, Jeffrey wrote:

 You don't want host or service dependencies, you want parent/child  
 relationships on the hosts.

 FAQ (old): http://www.nagios.org/faqs/viewfaq.php?faq_id=145
 Docs (current): 
 http://nagios.sourceforge.net/docs/3_0/networkreachability.html
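
A minimal sketch of the parent/child setup for the Router -- Switch -- servers layout from the original question. The switch and router host names, the switch address, and the generic-switch template (assumed to come from the sample templates.cfg) are made up for illustration; the parents directive is the part that matters:

```
define host{
        use             generic-switch          ; assumed sample template
        host_name       switch01.example.com    ; hypothetical switch
        alias           switch01
        address         192.168.1.1             ; hypothetical address
        parents         router01.example.com    ; hypothetical router, defined elsewhere
        }

define host{
        use             linux-server
        host_name       serv01.example.com
        alias           serv01
        address         192.168.1.101
        parents         switch01.example.com    ; serv01 sits behind the switch
        }
```

With this in place, a failed switch should leave serv01 UNREACHABLE rather than DOWN. As far as I know, dropping u from a host's notification_options (d,r instead of d,u,r) keeps UNREACHABLE hosts quiet, and Nagios does not send service notifications while the service's host is down or unreachable, so only the switch itself should alert.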


