I had the same memory problem and error message a while back after 4 days of running, but I had put it down to a couple of small memory leaks in the branch code (since fixed).

Can you tell us which rrdcached functions you are using? That is, is it used only for update, or are you also calling other functions such as last, create or info on a regular (not necessarily frequent) basis? This would help track down the problem.

Another possibility is that the number of active threads has hit 1024 (PTHREAD_THREADS_MAX, which can only be raised by recompiling the kernel). I don't know rrdcached's internals well enough to say whether it can 'leak' threads, but since there is a separate thread for each active client connection plus the write threads, I suppose a large number of clients could reach this limit. To check whether this is the cause, run 'ps -L -p <rrdcached PID>' and count the threads listed for the rrdcached process. For comparison, ours shows 15 threads, and it has been running (with 1.4.trunk) for more than a week now at over 50 updates per second.

A separate issue: from what I can tell of the code, the rrd client is supposed to try to reconnect to the daemon if the remote daemon restarts and the connection drops. However, this does not always seem to happen. I've had to restart the MRTG daemon, and apparently you need to restart collectd whenever rrdcached is restarted.

Steve

Steve Shipway
University of Auckland ITS
UNIX Systems Design Lead
Ph: +64 9 373 7599 ext 86487
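To watch the rrdcached thread count over time instead of counting `ps -L -p` output by hand (remember that `ps` prints a header line), here is a minimal sketch in Python. It assumes a Linux host, where every thread of a process has an entry under /proc/<pid>/task; `count_threads` and `watch` are hypothetical helper names for illustration, not part of rrdtool.

```python
import os
import time


def count_threads(pid: int) -> int:
    """Return the number of threads in process `pid`.

    Assumes Linux: each thread of a process has an entry under
    /proc/<pid>/task, the same threads `ps -L -p <pid>` lists one per line.
    """
    return len(os.listdir(f"/proc/{pid}/task"))


def watch(pid: int, interval: float = 60.0, samples: int = 10) -> list:
    """Sample the thread count `samples` times, `interval` seconds apart.

    A count that keeps climbing while client load stays roughly constant
    would suggest threads are not being cleaned up; a flat count suggests
    they are.
    """
    counts = []
    for i in range(samples):
        counts.append(count_threads(pid))
        print(f"{time.strftime('%H:%M:%S')} pid={pid} threads={counts[-1]}")
        if i < samples - 1:
            time.sleep(interval)  # no sleep after the final sample
    return counts
```

For example, `watch(<rrdcached PID>, interval=300, samples=12)` samples once every five minutes for an hour; comparing the first and last counts gives a rough idea of whether threads are accumulating.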
