Re: [Nagios-users] Return code of 127 is out of bounds - only on high cpu load though

2010-12-15 Thread Andreas Ericsson
On 12/14/2010 06:39 PM, Daniel Wittenberg wrote: I ran a full strace of nagios daemon and children and it looks like it was the enable_environment_macros that was causing: [pid 20478] execve(/bin/sh, [sh, -c, . . . . . ] = -1 E2BIG (Argument list too long) 0.000337 [pid 20478]

[Nagios-users] Return code of 127 is out of bounds - only on high cpu load though

2010-12-14 Thread Daniel Wittenberg
I noticed something odd the other day while stressing my servers. I noticed that when I overload it with too many hosts/checks, that I start getting active check failures with the standard 127 code. But, if I slowly reduce the number of hosts/checks, I’ll get to a point where it starts

Re: [Nagios-users] Return code of 127 is out of bounds - only on high cpu load though

2010-12-14 Thread Andreas Ericsson
On 12/14/2010 05:08 PM, Daniel Wittenberg wrote: I noticed something odd the other day while stressing my servers. I noticed that when I overload it with too many hosts/checks, that I start getting active check failures with the standard 127 code. But, if I slowly reduce the number of

Re: [Nagios-users] Return code of 127 is out of bounds - only on high cpu load though

2010-12-14 Thread Daniel Wittenberg
Yeah, the only two I'm testing with are check_nrpe and check_tcp, and it's all of them on every server that start failing. Any idea what kind of shared resources it might be starving? Dan -----Original Message----- From: Andreas Ericsson [mailto:a...@op5.se] Sent: Tuesday, December 14,

Re: [Nagios-users] Return code of 127 is out of bounds - only on high cpu load though

2010-12-14 Thread Andreas Ericsson
On 12/14/2010 05:14 PM, Daniel Wittenberg wrote: Yeah, the only two I'm testing with are check_nrpe and check_tcp, and it's all of them on every server that start failing. Any idea what kind of shared resources it might be starving? Not those two, no. They should be fairly well behaved,

Re: [Nagios-users] Return code of 127 is out of bounds - only on high cpu load though

2010-12-14 Thread Daniel Wittenberg
I ran a full strace of nagios daemon and children and it looks like it was the enable_environment_macros that was causing: [pid 20478] execve(/bin/sh, [sh, -c, . . . . . ] = -1 E2BIG (Argument list too long) 0.000337 [pid 20478] exit_group(127) = ? I turned them off and that fixes
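
The E2BIG failure in the trace above can be reproduced outside Nagios. What follows is only a rough sketch, not what Nagios itself does: it assumes a Linux-like system with /bin/sh, and the variable name NAGIOS_EXAMPLE_PADDING is hypothetical, standing in for the many macros exported when enable_environment_macros is on. The kernel counts the environment together with the argument list, so an oversized environment makes execve fail with E2BIG even when the command line is short. The sketch maps that failure to 127 to mirror the exit_group(127) in the strace output.

```python
import errno
import subprocess

# Hypothetical variable name; it stands in for the many NAGIOS_* macros
# exported to each check when enable_environment_macros is on.
PADDING_VAR = "NAGIOS_EXAMPLE_PADDING"


def run_check(padding_bytes: int) -> int:
    """Run `sh -c 'exit 0'` with a padded environment and return an exit code.

    Returns 127 when the exec itself fails with E2BIG, mirroring the
    exit_group(127) seen in the strace output.
    """
    env = {"PATH": "/usr/bin:/bin", PADDING_VAR: "x" * padding_bytes}
    try:
        return subprocess.run(["/bin/sh", "-c", "exit 0"], env=env).returncode
    except OSError as exc:
        # The kernel rejected the exec: argv plus envp was too large.
        if exc.errno == errno.E2BIG:
            return 127
        raise


if __name__ == "__main__":
    print(run_check(0))                # small environment
    print(run_check(8 * 1024 * 1024))  # 8 MiB of padding in one variable
```

On a typical Linux system the second call should exceed the exec size limit, but the exact threshold depends on the kernel and the stack rlimit, so the 8 MiB figure is an assumption chosen to be comfortably over it.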