[Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
Hi,

I have a couple machines that spit out a warning similar to this:

WARNING - check_by_ssh: Remote command '/home/nagios/nagios-plugs/check_disk' returned status 1

I believe this is caused by the check itself timing out, since when I
try to log in it sometimes takes a minute or two just to get a prompt.

The server still responds to ping, so I'm not too concerned about it,
and the checks usually clear up within 5 minutes, or as soon as the
server gets whatever I/O hog out of the way.

Is anyone else experiencing this, and if so, how do you deal with it?

Thanks,

Charlie

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
I should also mention that I have these timeouts in place:

service_check_timeout=90
host_check_timeout=30
event_handler_timeout=30
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5

Charlie



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Matt Rivet
Are you using an LDAP server and RSA keys?



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread James
On Mon, October 6, 2008 11:37 am, Charlie Reddington wrote:
> I should also mention that I also have these timeouts in place...
>
> service_check_timeout=90 host_check_timeout=30 event_handler_timeout=30
> notification_timeout=60 ocsp_timeout=5 perfdata_timeout=5

The timeouts in nagios.cfg are how long the nagios process waits before
aborting a check.
There are usually check-specific timeouts that you can add to the command
definition.
Run the check_* command manually and see what the syntax is (sometimes
'-t xx').
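
To illustrate the two layers of timeout described above, here is a minimal
Python sketch, not something from the thread. It assembles a manual
check_by_ssh invocation with its own -t value and refuses values at or above
the service_check_timeout=90 quoted earlier, on the assumption that the
plugin should give up before Nagios aborts it. The plugin path, the host
name, and the check_disk thresholds are hypothetical placeholders.

```python
import shlex

# Value quoted earlier in this thread from nagios.cfg.
SERVICE_CHECK_TIMEOUT = 90


def build_check_by_ssh(host, remote_command, plugin_timeout,
                       plugin_path="/usr/local/nagios/libexec/check_by_ssh"):
    """Return the argv list for a manual check_by_ssh run with its own -t value.

    plugin_path is a hypothetical install location; adjust it to your system.
    """
    if plugin_timeout >= SERVICE_CHECK_TIMEOUT:
        # Nagios would abort the check before the plugin's own timeout fires.
        raise ValueError("plugin timeout should be below service_check_timeout")
    return [plugin_path, "-H", host, "-t", str(plugin_timeout),
            "-C", remote_command]


# Hypothetical host and thresholds, for illustration only.
argv = build_check_by_ssh(
    "db1.example.com",
    "/home/nagios/nagios-plugs/check_disk -w 20% -c 10%",
    60,
)
# shlex.join quotes the remote command so the line can be pasted into a shell.
print(shlex.join(argv))
```

Running the printed command by hand as the nagios user shows whether the
plugin itself, rather than the Nagios process, is the one giving up.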




Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
Sorry, forgot the mailing list.

I'm not using LDAP, but I am using DSA keys.

On Oct 6, 2008, at 10:58 AM, Matt Rivet wrote:

> Are you using an LDAP server and RSA keys?



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington

On Oct 6, 2008, at 11:03 AM, James wrote:

> The timeouts in nagios.cfg are how long the nagios process waits before
> aborting a check.
> There are usually check-specific timeouts that you can add to the
> command definition.
> Run the check_* command manually and see what the syntax is
> (sometimes '-t xx').


I thought I had already done that by putting the --timeout option on
check_by_ssh, but I guess not. I have now set the timeout, raising it
from 30 to 60. We'll see how it goes.

Charlie



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Matthew Pounsett
> I believe this to be caused by the check itself is timing out. As when
> I try to login it will sometimes take up to a minute or two just to
> get a prompt.

As for setting the timeouts for that sort of thing, this is what I do.

In my resource.cfg:
--
# check_by_ssh timeout
$USER4$=10
--

.. and in my commands.cfg definitions..
---
# 'check_disk_remote' command definition
# The remote command is quoted so that its options (-w, -c, -p) are
# passed to check_disk instead of being parsed by check_by_ssh.
define command {
    command_name    check_disk_remote
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -t $USER4$ -C '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$'
}
---

And I use the same $USER4$ definition for all of the check_by_ssh  
calls, so that it's easy to tune.


Have you looked into the reason for the long login delay though?  I
think I'd start there.  A 60-second wait for ssh to get you a shell
indicates some sort of problem.  Either the target machine is so
resource-starved that it can't negotiate the authentication and
encryption, or you've got some other delay in there.  The most likely
culprit to my mind is DNS: ssh itself, login, and your shell on the
target machine might all be trying to do a reverse DNS lookup on the
source of the connection.  If that's timing out, it could cause very
long delays.  There are lots of other potential problems, but I'd
start looking there.
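
One rough way to test the reverse-DNS theory is to time a lookup directly.
The sketch below is an illustration in Python, assuming the system resolver
behaves like the lookups that ssh and login would trigger; the function name
is made up for this example, and 127.0.0.1 stands in for the Nagios server's
address.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (hostname_or_None, seconds) for a reverse DNS lookup."""
    start = time.monotonic()
    try:
        hostname = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # the address has no reverse entry or the resolver failed.
        hostname = None
    return hostname, time.monotonic() - start


# 127.0.0.1 is a stand-in; substitute the address the Nagios server
# connects from.
name, elapsed = time_reverse_lookup("127.0.0.1")
print(f"{name!r} resolved in {elapsed:.2f}s")
```

Run it on the target machine; a lookup that takes many seconds, or fails
only after a long wait, would point at DNS rather than at the plugin.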




