Oh and the thread count is:

log02 root /home/ulf # ps -L -p 2515 | wc -l
282
So 281 not counting the header of ps.

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Ulf Zimmermann
> Sent: Sunday, November 21, 2010 1:53 AM
> To: 'Steve Shipway'
> Cc: '[email protected]'
> Subject: Re: [rrd-users] rrdcached issues with larger number of clients via network/pthread
>
> I use it via collectd and that should only be doing updates. Graphing
> happens through rrdtool itself, directly on the files. Currently I have
> 275 connections (as per netstat). It runs as:
>
> collectd 2515 1 20 Nov17 ? 16:50:17
>   //opt/rrdtool-1.4.4.002147/bin/rrdcached
>   -p /var/rrdtool/rrdcached/rrdcached.pid -w 600 -z 300 -l 10.21.0.43
>   -p /data/rrdcached/run/rrdcached.pid
>   -l /data/rrdcached/run/rrdcached.sock
>   -j /data/rrdcached/journal -b /data/rrdcached/data
>
> Top shows it as:
>
> PID  USER     PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
> 2515 collectd 15 0  3000m 182m 896 S 19.9 1.1  1010:22 rrdcached
>
> Virt is currently bouncing between 2997 and 3000. It was initially
> around 2,776, I think, after I started the newly compiled rrdcached and
> then restarted all the collectd instances (I need to get something in
> place which does that automatically).
>
> The last few times I have looked, it ran out of memory as far as I can
> tell, failing either to create a new pthread or to mmap a file:
>
> Nov 17 13:37:40 log02 rrdcached[21009]: listen_thread_main:
> pthread_create failed.
> Nov 17 13:39:04 log02 rrdcached[21009]: queue_thread_main: rrd_update_r
> (/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd)
> failed with status -1. (mmaping file
> '/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd':
> Cannot allocate memory)
> Nov 17 13:41:34 log02 rrdcached[21009]: queue_thread_main: rrd_update_r
> (/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd)
> failed with status -1.
> (mmaping file
> '/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd':
> Cannot allocate memory)
> Nov 17 13:47:40 log02 rrdcached[21009]: listen_thread_main:
> pthread_create failed.
>
> I need to figure out what I can do about moving all this to a 64-bit
> machine; this is currently just EL5 i386. Initially I was going to
> install it as 64-bit (the machine has 16GB), but due to issues with rrd
> and its different file formats on i386 and x86_64, I ended up using
> i386. Since then I have moved everything either onto this machine
> locally (collectd and some other collectors) or to collectd/rrdcached
> for remote machines, so I could switch to x86_64, but I would have to
> convert all the files when I do that.
>
> If it weren't also my central syslog server, I would potentially just
> reinstall it.
>
> > -----Original Message-----
> > From: Steve Shipway [mailto:[email protected]]
> > Sent: Sunday, November 21, 2010 1:28 AM
> > To: Ulf Zimmermann
> > Cc: '[email protected]'
> > Subject: RE: [rrd-users] rrdcached issues with larger number of clients via network/pthread
> >
> > I had this same memory problem and error message a while back after 4
> > days of running, but had thought it to be due to a couple of small
> > memory leaks in the branch code (since fixed).
> >
> > Can you indicate which rrdcached functions you are using -- i.e., is
> > it just used for update, or are you also using other functions like
> > last, create, info, etc. on a regular (not necessarily frequent)
> > basis? This would help to track down problems.
> >
> > Another possibility is that the number of active threads has hit 1024
> > (PTHREAD_THREADS_MAX -- this can be increased only by recompiling the
> > kernel).
> > I don't have enough intimate knowledge of rrdcached to tell
> > if it is possible for it to be 'leaking' threads; I suppose that
> > since you have a separate thread for each active client connection,
> > plus the write threads, a large number of clients might cause this to
> > be reached? To tell if this is it, use 'ps -L -p <rrdcached PID>' and
> > count the number of threads for the rrdcached process. For
> > comparison, we have 15 on our server, and it has been running (with
> > 1.4.trunk) for more than a week now with over 50 updates per second.
> >
> > A separate issue is that, from what I can tell of the code, the rrd
> > client is supposed to attempt a re-connect to the daemon in the event
> > of the remote daemon restarting and the connection dying. However it
> > does seem that this doesn't necessarily happen -- I've had to restart
> > the MRTG daemon, and you apparently need to restart collectd when the
> > rrdcached is restarted.
> >
> > Steve
> >
> > Steve Shipway
> > University of Auckland ITS
> > UNIX Systems Design Lead
> > [email protected]
> > Ph: +64 9 373 7599 ext 86487
>
> _______________________________________________
> rrd-users mailing list
> [email protected]
> https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
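
The thread count and the VIRT figure discussed above can be related with a little arithmetic. The Python sketch below is only an illustration: it counts a process's threads from /proc/<pid>/task (one entry per thread, so the same number 'ps -L' shows minus the header) and multiplies a thread count by an assumed per-thread stack size. The 10240 KiB stack size is an assumption and is not stated anywhere in this thread; check 'ulimit -s' in the environment that starts rrdcached, and note that rrdcached may set its own thread stack size. The function names are made up for the example.

```python
import os


def count_threads(pid):
    """Return the number of threads of a process (Linux only).

    /proc/<pid>/task holds one directory per thread.
    """
    return len(os.listdir("/proc/%d/task" % pid))


def stack_reservation_mib(threads, stack_kib):
    """Address space in MiB reserved if every thread gets stack_kib KiB of stack."""
    return threads * stack_kib / 1024.0


if __name__ == "__main__":
    # Assumption: 10240 KiB per thread stack; verify with `ulimit -s` on the host.
    assumed_stack_kib = 10240
    # 281 is the thread count reported for PID 2515 in this thread.
    print(stack_reservation_mib(281, assumed_stack_kib))
    if os.path.isdir("/proc/self/task"):
        # Demonstrate the counter on this script's own process.
        print(count_threads(os.getpid()))
```

With 281 threads and that assumed stack size, the stacks alone would reserve about 2810 MiB of address space. That would be in the same range as the 3000m VIRT that top reports, and close to the roughly 3 GB a 32-bit process on Linux can usually address. If the real stack size is smaller, this explanation does not hold.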
