Re: Nagios + 6.3-RELEASE == Hung Process

2008-02-04 Thread Mike Tancsa

At 06:17 PM 2/4/2008, Jarrod Sayers wrote:
> On 03/01/2008, at 11:56 AM, Marc G. Fournier wrote:
> > As noted in my original report, this isn't a nagios issue per se ...
> > my first experience with this issue was with Azureus/java ... so it's
> > a 'threading issue in general' ...
>
> A patch to force the package to link against libthr(3) has been
> committed [1] and should be available once mirrors update as
> net-mgmt/nagios 2.10_1.  This has been tested since this conversation
> started in the net-mgmt/nagios-devel port [2] without any negative
> feedback being received.

We have been using nagios linked against libthr via libmap.conf since
the end of November and it's been working great since then.  Prior to
that, we would see 100% CPU usage a couple of times a week on various
nagios procs.  It hasn't happened since.


---Mike 


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Nagios + 6.3-RELEASE == Hung Process

2008-02-04 Thread Jarrod Sayers

On 03/01/2008, at 11:56 AM, Marc G. Fournier wrote:
> As noted in my original report, this isn't a nagios issue per se ...
> my first experience with this issue was with Azureus/java ... so it's
> a 'threading issue in general' ...

A patch to force the package to link against libthr(3) has been
committed [1] and should be available once mirrors update as
net-mgmt/nagios 2.10_1.  This has been tested since this conversation
started in the net-mgmt/nagios-devel port [2] without any negative
feedback being received.  Bundled in the update is the inclusion of
libltdl as requested by Tom Judge.


I'd be interested to know how people go with the updated port.  Thanks  
for your patience.


[1] http://www.freebsd.org/cgi/query-pr.cgi?pr=120150
[2] http://www.freebsd.org/cgi/query-pr.cgi?pr=119246

Jarrod.


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Marc G. Fournier

--On Thursday, January 03, 2008 11:05:16 +1030 Jarrod Sayers <[EMAIL PROTECTED]> wrote:

> That's actually good to know, as you're now (unless I am mistaken) the first
> user to contact me about this problem on non-i386 systems.  One user, plus
> myself, have also seen the issue under Nagios 3.x, both on i386 systems
> though.
>
> I also have a net-mgmt/ndoutils port in the works (less the database support
> for now) which also has the same issue so using broker modules doesn't seem
> to affect the outcome.
>
> My gut feeling is that it's not an architecture issue but more an
> interoperability issue between the Nagios threading code and the libpthread()
> threading library.

As noted in my original report, this isn't a nagios issue per se ... my first
experience with this issue was with Azureus/java ... so it's a 'threading issue
in general' ...

-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Michael Butler

Marc G. Fournier wrote:
> I never tried on i386, but in my case it was an amd64 system as well ... not 
> sure if that is relevant or not ... has anyone seen this problem *with* i386?

When I read about it, I was in the middle of upgrading the problem
machine to 7-stable - which now reports as follows:

FreeBSD 7.0-PRERELEASE #0: Tue Jan  1 22:12:02 EST 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/AARON
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (701.59-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x681  Stepping = 1

Features=0x387f9ff
real memory  = 1073479680 (1023 MB)
avail memory = 1041297408 (993 MB)
kbd1 at kbdmux0
acpi0:  on motherboard


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Jarrod Sayers

On Wed, 2 Jan 2008, Tom Judge wrote:
> Jarrod Sayers wrote:
> > I hope I can confirm your frustrations.  There is a threading issue
> > with Nagios when its binaries are linked against the libpthread(3)
> > threading library, the default on recent FreeBSD 5.x releases and all
> > 6.x releases.  The issue is random and extremely difficult to track
> > down, with the symptoms being a second Nagios process sitting on the
> > system hanging a CPU.  Rest assured that I have been working on it,
> > and have seen it on one system of mine.
>
> Not sure if this is related at all but out of the 3 nagios deployments
> we have here I have only ever seen it on one (it currently has 2 nagios
> threads spinning CPU time atm).
>
> The differences on that server are:
>
> * It is amd64 compared to i386
> * It also runs ndo2db from ndoutils 1.4b7
>
> All the systems run 6.2-RELEASE-p5 and nagios-2.9_1; they are also all
> patched with the GNU libltdl patch below.
>
> Don't know if that info is of any use to you.

That's actually good to know, as you're now (unless I am mistaken) the
first user to contact me about this problem on non-i386 systems.  One
user, plus myself, have also seen the issue under Nagios 3.x, both on
i386 systems though.

I also have a net-mgmt/ndoutils port in the works (less the database
support for now) which also has the same issue, so using broker modules
doesn't seem to affect the outcome.

My gut feeling is that it's not an architecture issue but more an
interoperability issue between the Nagios threading code and the
libpthread(3) threading library.

[yoink]

> > I did receive that email and the changes went in with the last commit
> > of net-mgmt/nagios-devel to test.  No issues have arisen so I'll be
> > back-porting it to net-mgmt/nagios soon for you.  There also has been
> > a rather large ports freeze which delayed the upgrade to Nagios 2.10;
> > that PR was submitted on the 1st of November and committed on the
> > 13th of December.  Unfortunately your email fell somewhere in the
> > middle, apologies for not letting you know.
>
> Thanks for this, I currently maintain the patch on our build servers.

No worries, I will look at bundling in the change with the libthr(3) fix
over the next few days.  Thanks for pointing that out too, as it was a
bug rather than a feature request: on systems where the library was
available, the build process would link to it.  Hmm...


Jarrod.


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Greg Byshenk
On Wed, Jan 02, 2008 at 07:24:28PM -0400, Marc G. Fournier wrote:
> --On Wednesday, January 02, 2008 22:54:33 + Tom Judge <[EMAIL PROTECTED]> wrote:

> > Not sure if this is related at all but out of the 3 nagios deployments we
> > have here I have only ever seen it on one (It currently has 2 nagios threads
> > spinning CPU time atm).

> > The differences on that server are:
> >
> > * It is amd64 compared to i386

> I never tried on i386, but in my case it was an amd64 system as well ... not 
> sure if that is relevant or not ... has anyone seen this problem *with* i386?

Yes.

We run Nagios on an i386 machine (dual Athlon MP 1800+), and I first saw this
problem with a build of 6-STABLE as of 2007-10-04, and it continues (if I don't
use the libmap.conf settings) with the running system of 6.3-PRERELEASE as of
2007-12-18 and nagios-2.10 (from ports of same date).

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Marc G. Fournier

--On Wednesday, January 02, 2008 22:54:33 + Tom Judge <[EMAIL PROTECTED]> wrote:

> Not sure if this is related at all but out of the 3 nagios deployments we
> have here I have only ever seen it on one (It currently has 2 nagios threads
> spinning CPU time atm).
>
> The differences on that server are:
>
>   * It is amd64 compared to i386

I never tried on i386, but in my case it was an amd64 system as well ... not 
sure if that is relevant or not ... has anyone seen this problem *with* i386?

-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Tom Judge

Jarrod Sayers wrote:
> On 03/01/2008, at 1:56 AM, Tom Judge wrote:
> > I have also seen this issue, but have always put it down to the way
> > that we manage our nagios deployments with cfengine.  I will try to
> > deploy this change and monitor for the problem to see if it persists.
>
> I hope I can confirm your frustrations.  There is a threading issue
> with Nagios when its binaries are linked against the libpthread(3)
> threading library, the default on recent FreeBSD 5.x releases and all
> 6.x releases.  The issue is random and extremely difficult to track
> down, with the symptoms being a second Nagios process sitting on the
> system hanging a CPU.  Rest assured that I have been working on it,
> and have seen it on one system of mine.

Not sure if this is related at all but out of the 3 nagios deployments
we have here I have only ever seen it on one (it currently has 2 nagios
threads spinning CPU time atm).

The differences on that server are:

* It is amd64 compared to i386
* It also runs ndo2db from ndoutils 1.4b7

All the systems run 6.2-RELEASE-p5 and nagios-2.9_1; they are also all
patched with the GNU libltdl patch below.

Don't know if that info is of any use to you.

> Changes have been submitted for net-mgmt/nagios-devel (aka Nagios
> 3.0.r1) to force the build process to link against libthr(3) where
> available, removing the need to map libpthread(3) out with
> /etc/libmap.conf.  If this goes well, as stated in the PR, I'll
> back-port it to net-mgmt/nagios (aka Nagios 2.10) in the next few days.
>
> If anyone out there is running net-mgmt/nagios-devel and feels like
> trying it for me, see ports/119246 and drop me an email with a before
> and after "ldd /usr/local/bin/nagios".
>
> > On a side note if you want to use broker modules with nagios from port
> > you need to change the following in the port Makefile in order to make
> > them load properly:
> >
> > From:
> > USE_AUTOTOOLS=  autoconf:259
> > To:
> > USE_AUTOTOOLS=  autoconf:259 libltdl:15
> >
> > I sent an email to the maintainer but got no response and my email did
> > not seem to have affected the last commit to upgrade to 2.10
>
> I did receive that email and the changes went in with the last commit of
> net-mgmt/nagios-devel to test.  No issues have arisen so I'll be
> back-porting it to net-mgmt/nagios soon for you.  There also has been a
> rather large ports freeze which delayed the upgrade to Nagios 2.10; that
> PR was submitted on the 1st of November and committed on the 13th of
> December.  Unfortunately your email fell somewhere in the middle,
> apologies for not letting you know.

Thanks for this, I currently maintain the patch on our build servers.



Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Jarrod Sayers

On 03/01/2008, at 1:56 AM, Tom Judge wrote:
> I have also seen this issue, but have always put it down to the way
> that we manage our nagios deployments with cfengine.  I will try to
> deploy this change and monitor for the problem to see if it persists.

I hope I can confirm your frustrations.  There is a threading issue
with Nagios when its binaries are linked against the libpthread(3)
threading library, the default on recent FreeBSD 5.x releases and all
6.x releases.  The issue is random and extremely difficult to track
down, with the symptoms being a second Nagios process sitting on the
system hanging a CPU.  Rest assured that I have been working on it,
and have seen it on one system of mine.

Changes have been submitted for net-mgmt/nagios-devel (aka Nagios
3.0.r1) to force the build process to link against libthr(3) where
available, removing the need to map libpthread(3) out with
/etc/libmap.conf.  If this goes well, as stated in the PR, I'll
back-port it to net-mgmt/nagios (aka Nagios 2.10) in the next few days.

If anyone out there is running net-mgmt/nagios-devel and feels like
trying it for me, see ports/119246 and drop me an email with a before
and after "ldd /usr/local/bin/nagios".
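
For the before/after comparison requested above, a minimal sketch of a
helper that reports which thread library shows up in ldd output.  The
function name thread_lib_in_ldd is made up for illustration, and the
matching assumes only that the library names libthr.so and libpthread.so
appear somewhere in the ldd listing; if both appear, libthr wins.

```shell
#!/bin/sh
# Report which thread library appears in `ldd` output read from stdin.
# Prints "libthr", "libpthread", or "none".  If both names appear,
# libthr takes precedence.  The function name is hypothetical.
thread_lib_in_ldd() {
    awk '
        /libthr\.so/     { found = "libthr" }
        /libpthread\.so/ { if (found == "") found = "libpthread" }
        END { print (found == "" ? "none" : found) }
    '
}

# Example use:
#   ldd /usr/local/bin/nagios | thread_lib_in_ldd
```

Running it once before and once after rebuilding the port should be
enough to tell whether the binary picked up the other thread library.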



> On a side note if you want to use broker modules with nagios from port
> you need to change the following in the port Makefile in order to make
> them load properly:
>
> From:
> USE_AUTOTOOLS=  autoconf:259
> To:
> USE_AUTOTOOLS=  autoconf:259 libltdl:15
>
> I sent an email to the maintainer but got no response and my email did
> not seem to have affected the last commit to upgrade to 2.10

I did receive that email and the changes went in with the last commit
of net-mgmt/nagios-devel to test.  No issues have arisen so I'll be
back-porting it to net-mgmt/nagios soon for you.  There also has been
a rather large ports freeze which delayed the upgrade to Nagios 2.10;
that PR was submitted on the 1st of November and committed on the 13th
of December.  Unfortunately your email fell somewhere in the middle,
apologies for not letting you know.


Jarrod.


Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-02 Thread Tom Judge

Michael Butler wrote:
> Marc G. Fournier wrote:
> > G'day ...
> >
> >   Yesterday, I set up nagios to do some system monitoring ... installed the
> > latest version from ports into a jail, so that I could easily move it around
> > between machines as I upgrade, without losing data ... after about 30 minutes
> > running, I get a second nagios process running (fork?) that takes up as much
> > CPU time as is available, and just hangs there until I kill -9 it ...
>
> [ .. ]
>
> > After searching the 'Net a bit, came across this thread:
> >
> >
> >
> > That recommends modifying libmap.conf with:
> >
> > [/usr/local/bin/nagios]
> > libpthread.so.2 libthr.so.2
> > libpthread.so libthr.so
>
> Thanks for pointing this out. I've had similar problems with nagios but
> hadn't found a solution until I saw your pointer. Sadly, my expertise
> with both thread libraries is sufficiently lacking that I have no clue
> where to start looking for the cause :-(

I have also seen this issue, but have always put it down to the way that
we manage our nagios deployments with cfengine.  I will try to deploy
this change and monitor for the problem to see if it persists.

On a side note if you want to use broker modules with nagios from port
you need to change the following in the port Makefile in order to make
them load properly:

From:
USE_AUTOTOOLS=  autoconf:259
To:
USE_AUTOTOOLS=  autoconf:259 libltdl:15


I sent an email to the maintainer but got no response and my email did
not seem to have affected the last commit to upgrade to 2.10.


Tom



Re: Nagios + 6.3-RELEASE == Hung Process

2008-01-01 Thread Michael Butler

Marc G. Fournier wrote:
> 
> G'day ...
> 
>   Yesterday, I set up nagios to do some system monitoring ... installed the
> latest version from ports into a jail, so that I could easily move it around
> between machines as I upgrade, without losing data ... after about 30 minutes
> running, I get a second nagios process running (fork?) that takes up as much
> CPU time as is available, and just hangs there until I kill -9 it ...

[ .. ]

> After searching the 'Net a bit, came across this thread:
> 
> 
> 
> That recommends modifying libmap.conf with:
> 
> [/usr/local/bin/nagios]
> libpthread.so.2 libthr.so.2
> libpthread.so libthr.so

Thanks for pointing this out. I've had similar problems with nagios but
hadn't found a solution until I saw your pointer. Sadly, my expertise
with both thread libraries is sufficiently lacking that I have no clue
where to start looking for the cause :-(

Michael


Nagios + 6.3-RELEASE == Hung Process

2008-01-01 Thread Marc G. Fournier


G'day ...

  Yesterday, I set up nagios to do some system monitoring ... installed the
latest version from ports into a jail, so that I could easily move it around
between machines as I upgrade, without losing data ... after about 30 minutes
running, I get a second nagios process running (fork?) that takes up as much
CPU time as is available, and just hangs there until I kill -9 it ...

Figuring that it might be a problem with the jail (trying to access something
that isn't available to the process in a jail), I moved it to the physical
server level ... but, again, after ~30 minutes, it's doing the same thing:

# ps aux | grep nagios
nagios  32065 73.2  0.1 10948  3516  ??  R    11:15AM   7:40.77 /usr/local/bin/nagios -d /usr/local/etc/nagios/nagios.cfg
nagios  82120  0.0  0.1 10948  3580  ??  Ss   10:47AM   0:01.18 /usr/local/bin/nagios -d /usr/local/etc/nagios/nagios.cfg

So, definitely not jail related ...
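
A minimal sketch of how the runaway process in a listing like the one
above could be picked out automatically.  It assumes "ps aux" prints
the PID in column 2 and %CPU in column 3; the function name
spinning_nagios and the 50% default threshold are illustrative choices
only.

```shell
#!/bin/sh
# Print the PIDs of nagios processes whose %CPU exceeds a threshold.
# Reads `ps aux`-style lines on stdin; assumes PID is field 2 and
# %CPU is field 3.  The function name and default are hypothetical.
spinning_nagios() {
    threshold="${1:-50}"
    awk -v thr="$threshold" '
        # The header line has a non-numeric %CPU field, which compares as 0.
        # Matching "/nagios " skips a plain "grep nagios" entry.
        $3 + 0 > thr + 0 && /\/nagios / { print $2 }
    '
}

# Example use on a live system:
#   ps aux | spinning_nagios 50
```

This only identifies candidates; whether to kill -9 them is still a
decision for the operator.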

I've tried to do a 'truss -p 32065', it just hangs.

And: ktrace -f /tmp/output -p 32065 ... produces nothing:

# kdump -f /tmp/output
 32065 nagios   PSIG  SIGKILL SIG_DFL

Once I kill -9 the process, a bunch of 'check_ping' processes start up and then 
things go back to normal ...

My last kernel / world build on that box is: Mon Nov 12 06:43:30 AST 2007

After searching the 'Net a bit, came across this thread:



That recommends modifying libmap.conf with:

[/usr/local/bin/nagios]
libpthread.so.2 libthr.so.2
libpthread.so libthr.so

This seems to fix the problem on the physical server, and am currently testing 
it in the jail itself to make sure it fixes it there too ...
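
A rough sketch of adding that stanza without duplicating it on repeated
runs.  The function name add_nagios_libmap is made up for illustration;
the stanza itself is the one quoted above, and the default path assumes
the usual /etc/libmap.conf location.

```shell
#!/bin/sh
# Append the nagios libpthread -> libthr mapping to a libmap.conf-style
# file, unless a [/usr/local/bin/nagios] section is already present.
# The function name is hypothetical; the stanza is the one from this thread.
add_nagios_libmap() {
    conf="${1:-/etc/libmap.conf}"
    if [ -f "$conf" ] && grep -qF '[/usr/local/bin/nagios]' "$conf"; then
        return 0    # section already present, leave the file untouched
    fi
    cat >> "$conf" <<'EOF'

[/usr/local/bin/nagios]
libpthread.so.2 libthr.so.2
libpthread.so libthr.so
EOF
}

# Example use (as root), then restart nagios so the new mapping is used:
#   add_nagios_libmap /etc/libmap.conf
```

For a jailed nagios the same edit would presumably go into the jail's
own /etc/libmap.conf rather than the host's.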

Should this be something that is more prominently documented somewhere?  Maybe
in the port itself?  azureus has similar problems that are fixed with entries
in libmap.conf, so it's not "just a nagios issue" ...



-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664