I have close to 300 machines running collectd, configured to use unixsocks to 
rrdcached on a central server. More and more often we run into threads dying 
(collectd then starts complaining and fills up /var/messages), and when we try 
to restart collectd, sometimes it works and sometimes we end up with:

Oct 30 22:27:19 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
Oct 30 22:27:34 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
Oct 30 22:28:10 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.

At this point we usually have to restart the rrdcached daemon, which in turn 
means restarting collectd on close to 300 machines.

How can this be debugged to find the issue (potentially inside pthreads)? The 
central server is running RedHat EL5 Update 4; rrdtool/rrdcached is 1.4.4 
from rpmforge.
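
As a starting point, pthread_create generally fails when the process hits a 
thread limit or cannot get memory for a new thread stack, so one thing to 
watch is how many threads rrdcached holds when the errors begin. Below is a 
rough sketch of such a check, assuming a Linux /proc filesystem. The function 
names are made up for illustration and are not part of rrdtool, and the stack 
size it uses is the limit of the calling shell, which may differ from what the 
daemon actually runs with.

```python
import resource
import sys


def thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def estimated_stack_reservation(threads, stack_bytes):
    """Address space reserved for thread stacks.

    Assumes every thread uses the same stack size, which is only an
    approximation: a daemon may set its own stack size per thread.
    """
    return threads * stack_bytes


def main(pid):
    with open("/proc/%d/status" % pid) as fh:
        n = thread_count(fh.read())
    print("threads in pid %d: %d" % (pid, n))

    # Stack limit of the shell running this script (assumption: the
    # daemon was started with the same limit).
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != resource.RLIM_INFINITY:
        total = estimated_stack_reservation(n, soft)
        print("estimated stack reservation: %d MiB" % (total // (1024 * 1024)))

    # System-wide ceiling on the number of threads.
    with open("/proc/sys/kernel/threads-max") as fh:
        print("kernel threads-max: %s" % fh.read().strip())


if __name__ == "__main__" and len(sys.argv) > 1:
    main(int(sys.argv[1]))
```

Run with the rrdcached PID as the only argument (16864 in the log above) 
and compare the numbers before and after the failures start.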

Ulf, who is getting more grey hair by the minute with issues like this :-(
