Mike Erdely wrote on Fri, Dec 01, 2006 at 12:07:19PM -0500:

> I'm running jabberd2 from ports on a 4.0-release+patches, P4 2GHz, 1 GB
> RAM box for ~50 users.  This box is running nothing but jabber & mysql. 
> Jabber is configured to use the local mysql (its only purpose is jabber)
> for storage and LDAPS for authentication.
> 
> During the work week jabber seems to be working fine and then at some
> point people cannot log in.  People who are already logged in do not seem
> to have a problem.  A few seconds later, the c2s process is taking all of
> the CPU.

Could you check whether jabberd2 is also using a large number of file
descriptors?  There is a file descriptor leak in jabberd2 that is
triggered by SSL handshake errors.  Such errors have been seen in
the wild, in particular in s2s connections with the host jabjab.de.
Another file descriptor leak, or more probably the same one, has
also been reported in the jabberd2 bug tracking system:
  http://j2.openaether.org/bugzilla/show_bug.cgi?id=23
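
One way to check is to count the descriptors a jabberd2 process holds
with fstat(1).  The following is only a minimal sketch, assuming fstat
prints a header line and puts the descriptor in the fourth column
(USER CMD PID FD ...), as on OpenBSD and FreeBSD.  The function names
count_fds and check_fds and the 90% warning threshold are made up for
illustration; they are not part of jabberd2 or the base system.

```shell
#!/bin/sh
# Sketch only: count numbered file descriptors in fstat(1) output and
# warn when a process gets close to its descriptor limit.
# Assumption: the FD column is the fourth field; entries such as "wd",
# "root" or "text" (and the "FD" header itself) are not counted.

# count_fds: read fstat output on stdin, print the number of numbered fds.
count_fds() {
    awk '$4 ~ /^[0-9]+/ { n++ } END { print n + 0 }'
}

# check_fds COUNT LIMIT: print a warning and return 1 once COUNT reaches
# 90% of LIMIT (e.g. the <max_fds>128</max_fds> from c2s.xml).
check_fds() {
    if [ "$1" -ge $(( $2 * 9 / 10 )) ]; then
        echo "warning: $1 of $2 descriptors in use" >&2
        return 1
    fi
    return 0
}

# Usage on the jabber host, with PID set to the c2s process ID
# (taken from ps(1), for example):
#   n=$(fstat -p "$PID" | count_fds)
#   check_fds "$n" 128
```

Running this every few minutes from cron would show whether the count
keeps growing until logins fail.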

Because jabberd2 retries these broken connections indefinitely,
I could well imagine that this eventually overloads your CPU.
I didn't check, though; I never saw a point in increasing
kern.maxfiles because my colleague Klara quickly realised that
the file descriptor leak was the cause of the problem we saw.

Klara prepared a patch fixing the file descriptor leak.  She submitted
it upstream but never got any response.  So I converted it to a patch
for the ports tree and sent it to the MAINTAINER.  It bounced, so I
contacted the maintainer via another mail address he has been using.
I never heard back from him.  So I submitted the patch here, but
nobody replied, and I am not aware of any related commits.  It seems
jabberd2 is rarely used.

All the same, I will now update my -current build machine, check
whether the patch is still OK and resubmit it.  This will take some
time: I'm at home now, my build machine is at the office, and it
is not fast, so please be patient.

> Any ideas?  Anyone else seeing c2s spike at 100% CPU on an almost
> daily basis?

All this was several months ago, but I dimly recall that we saw
jabberd2 processes eating most of the CPU, too, though it never
got bad enough to lock up the system (PIII 635MHz, 256 MB RAM).
I do clearly remember the following: when killing the processes
holding the leaked file descriptors, the CPU went to 100% for
several seconds, and during that time the system was not
responsive.  But in all cases, the processes finally died off
and released all resources they held.

We use the following settings:

# grep max_fds /etc/jabberd/*
c2s.xml:    <max_fds>128</max_fds>
router.xml:    <max_fds>128</max_fds>

Besides, we are running mysqld with --open-files-limit=256,
and it is used by SpamAssassin, Joomla, and MediaWiki, too.
The rest of the related settings are at their defaults; _jabberd
and _mysql are in login class daemon with :openfiles-cur=128:.
In any case, I cannot believe you actually need
  kern.maxfiles=10240 or :openfiles-cur=4096:
or anything like that to serve 50 users.
