I had the same memory problem and error message a while back, after four days 
of running, but I had attributed it to a couple of small memory leaks in the 
branch code (since fixed).

Can you say which rrdcached functions you are using? That is, is it used only 
for update, or are you also calling other functions such as last, create, or 
info on a regular (not necessarily frequent) basis?  Knowing this would help 
track down the problem.

Another possibility is that the number of active threads has hit 1024 
(PTHREAD_THREADS_MAX, which can be increased only by recompiling the kernel).  
I don't know rrdcached's internals well enough to say whether it could be 
'leaking' threads. However, since it uses a separate thread for each active 
client connection plus the write threads, I suppose a large number of clients 
might be enough to reach this limit.  To check, run 'ps -L -p <rrdcached PID>' 
and count the threads listed for the rrdcached process.  For comparison, ours 
has 15 threads, and it has been running 1.4.trunk for more than a week at over 
50 updates per second.
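If running 'ps -L' by hand is inconvenient (for example from a monitoring 
script), the same count can be read from /proc on Linux, where each thread of a 
process has one entry under /proc/<pid>/task.  The sketch below assumes Linux 
with /proc mounted; the helper names and the 1024 default (taken from the 
PTHREAD_THREADS_MAX figure above) are illustrative only and are not part of 
rrdcached or RRDtool.

```python
import os


def count_threads(pid: int) -> int:
    """Return the number of threads in process `pid` (Linux only).

    Each thread has one directory under /proc/<pid>/task, matching the
    per-thread rows that `ps -L -p <pid>` prints.
    """
    return len(os.listdir(f"/proc/{pid}/task"))


def near_thread_limit(pid: int, limit: int = 1024, fraction: float = 0.9) -> bool:
    """Hypothetical check: True once the thread count reaches fraction * limit.

    The default limit of 1024 mirrors the PTHREAD_THREADS_MAX figure quoted
    in the message; adjust it to whatever applies on your system.
    """
    return count_threads(pid) >= fraction * limit
```

Pass it the rrdcached PID and sample it periodically: a count that keeps 
climbing while the client load stays flat would point at leaked threads rather 
than simply many connected clients.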

A separate issue: from what I can tell from the code, the rrd client is 
supposed to reconnect to the daemon if the remote daemon restarts and the 
connection drops.  However, this doesn't always seem to happen: I've had to 
restart the MRTG daemon, and you apparently need to restart collectd, whenever 
rrdcached is restarted.
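For illustration, here is a minimal sketch of the reconnect-and-retry behaviour 
described above, written against a generic line-oriented socket connection.  It 
is not the RRDtool client code: the class name, the connect factory, and the 
assumption that every command gets one newline-terminated reply are all 
hypothetical.  On a broken pipe, reset, or EOF it discards the dead socket, 
reconnects once, and resends the command.

```python
import socket
from typing import Callable, Optional


class ReconnectingClient:
    """Hypothetical client: reconnect once and retry if the connection died."""

    def __init__(self, connect: Callable[[], socket.socket]):
        self._connect = connect  # factory returning a freshly connected socket
        self._sock: Optional[socket.socket] = None

    def _drop(self) -> None:
        # Close and forget the current socket so the next call reconnects.
        if self._sock is not None:
            self._sock.close()
            self._sock = None

    def _roundtrip(self, line: str) -> str:
        if self._sock is None:
            self._sock = self._connect()
        sock = self._sock
        try:
            sock.sendall(line.encode() + b"\n")
            buf = b""
            while not buf.endswith(b"\n"):
                chunk = sock.recv(4096)
                if not chunk:  # EOF: the daemon went away
                    raise ConnectionError("peer closed the connection")
                buf += chunk
            return buf.decode().rstrip("\n")
        except OSError:  # covers BrokenPipeError, ConnectionResetError, EOF above
            self._drop()
            raise

    def send_line(self, line: str) -> str:
        try:
            return self._roundtrip(line)
        except OSError:
            # The old connection is dead (e.g. daemon restarted): retry once
            # on a new connection; a second failure propagates to the caller.
            return self._roundtrip(line)
```

Resending is only safe when a duplicate command is harmless: an update that 
reached the daemon before the connection dropped could end up applied twice, or 
rejected because its timestamp is no longer newer than the last update.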

Steve

Steve Shipway
University of Auckland ITS
UNIX Systems Design Lead
[email protected]
Ph: +64 9 373 7599 ext 86487

_______________________________________________
rrd-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
