Re: [rrd-users] rrdcached issues with larger number of clients via network/pthread

Ulf Zimmermann Sun, 21 Nov 2010 01:54:56 -0800

I use it via collectd, which should only be issuing updates. Graphing happens 
through rrdtool itself, directly on the files. Currently I have 275 connections 
(as per netstat). It runs as:


collectd  2515     1 20 Nov17 ?        16:50:17 //opt/rrdtool-1.4.4.002147/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -w 600 -z 300 -l 10.21.0.43 -p /data/rrdcached/run/rrdcached.pid -l /data/rrdcached/run/rrdcached.sock -j /data/rrdcached/journal -b /data/rrdcached/data

Top shows it as:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2515 collectd  15   0 3000m 182m  896 S 19.9  1.1   1010:22 rrdcached

VIRT is currently bouncing between 2997m and 3000m. It was initially around 
2776m, I think, after I started the newly compiled rrdcached and then restarted 
all the collectd instances (I need to get something in place that does that 
automatically).
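
To watch this over time rather than eyeballing top, something like the
following could be run from cron. It is only a sketch: it assumes a Linux
/proc filesystem, and the function names parse_status and sample are made up
for illustration. It reads the same virtual size that top reports as VIRT,
plus the thread count, so the two can be logged side by side. A 32-bit
process typically has only about 3 GB of user address space, so VIRT sitting
near 3000m is worth tracking against the number of threads.

```python
from pathlib import Path


def parse_status(text):
    """Extract VmSize (in kB) and Threads from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    vm_kb = int(fields["VmSize"].split()[0])  # value looks like "3072000 kB"
    threads = int(fields["Threads"])
    return vm_kb, threads


def sample(pid):
    """Read the live values for one process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())
```

For the process above, sample(2515) would return a (VmSize in kB, thread
count) pair that can be appended to a log file with a timestamp.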

The last few times I looked, it had run out of memory as far as I can tell, 
either failing to create a new pthread or failing on mmapping:

Nov 17 13:37:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.
Nov 17 13:39:04 log02 rrdcached[21009]: queue_thread_main: rrd_update_r (/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd) failed with status -1. (mmaping file '/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd': Cannot allocate memory)
Nov 17 13:41:34 log02 rrdcached[21009]: queue_thread_main: rrd_update_r (/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd) failed with status -1. (mmaping file '/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd': Cannot allocate memory)
Nov 17 13:47:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.
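
To see how often each kind of failure shows up, the syslog file could be
tallied with a small script along these lines. This is a sketch, not
something I have run against the real log: it assumes each entry sits on a
single line in the actual file (the wrapping above comes from the mail
client), and the labels pthread_create and mmap_enomem are names chosen here
for illustration.

```python
from collections import Counter


def classify(line):
    """Return a short label for known rrdcached failure lines, else None."""
    if "pthread_create failed" in line:
        return "pthread_create"
    if "rrd_update_r" in line and "Cannot allocate memory" in line:
        return "mmap_enomem"
    return None


def tally(lines):
    """Count failures per label across an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label:
            counts[label] += 1
    return counts
```

Passing an open file object for the syslog file to tally() gives the counts
per failure type, which makes it easier to see whether the two errors always
appear together.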

I need to figure out how to move all this to a 64-bit machine; this is 
currently just EL5 i386. Initially I was going to install it as 64-bit (the 
machine has 16GB), but because the RRD file format differs between i386 and 
x86_64, I ended up using i386. Since then I have moved everything either onto 
this machine locally (collectd and some other collectors) or onto 
collectd/rrdcached for remote machines, so I could switch to x86_64, but I 
would have to convert all the files when I do that.
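
The conversion itself would presumably be the usual rrdtool dump on the old
architecture followed by rrdtool restore on the new one. A rough sketch of
how the commands for a whole tree could be generated is below. The helper
names and the XML staging directory are hypothetical, the output directories
would need to exist before the commands are run, and rrdcached should be
flushed or stopped first so the files on disk are current.

```python
from pathlib import Path


def xml_path(rrd, src_root, xml_root):
    """Map an .rrd under src_root to the matching .xml path under xml_root."""
    rel = Path(rrd).relative_to(src_root)
    return Path(xml_root) / rel.with_suffix(".xml")


def dump_command(rrd, src_root, xml_root, rrdtool="rrdtool"):
    """Command for the i386 host: native RRD -> portable XML."""
    return [rrdtool, "dump", str(rrd), str(xml_path(rrd, src_root, xml_root))]


def restore_command(xml, xml_root, dst_root, rrdtool="rrdtool"):
    """Command for the x86_64 host: XML -> native RRD."""
    rel = Path(xml).relative_to(xml_root)
    target = Path(dst_root) / rel.with_suffix(".rrd")
    return [rrdtool, "restore", str(xml), str(target)]


def plan_dumps(src_root, xml_root):
    """Yield one dump command per .rrd file found below src_root."""
    for rrd in sorted(Path(src_root).rglob("*.rrd")):
        yield dump_command(rrd, src_root, xml_root)
```

Each returned list can be handed to subprocess.run(cmd, check=True), or
printed first to review the plan before touching any files.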

If it weren't also my central syslog server, I would probably just reinstall 
it.



> -----Original Message-----
> From: Steve Shipway [mailto:[email protected]]
> Sent: Sunday, November 21, 2010 1:28 AM
> To: Ulf Zimmermann
> Cc: '[email protected]'
> Subject: RE: [rrd-users] rrdcached issues with larger number of clients
> via network/pthread
> 
> I had this same memory problem and error message a while back after 4
> days of running, but had thought it to be due to a couple of small
> memory leaks in the branch code (since fixed).
> 
> Can you indicate which rrdcached functions you are using -- ie, is it
> just used for update, or are you also using other functions like last,
> create, info, etc on a regular (not necessarily frequent) basis?  This
> would help to track down problems.
> 
> Another possibility is that the number of active threads has hit 1024
> (PTHREAD_THREADS_MAX -- this can be increased only by recompiling the
> kernel).  I don't have enough intimate knowledge of rrdcached to tell
> if it is possible for it to be 'leaking' threads; I suppose that since
> you have a separate thread for each active client connection, plus the
> write threads, a large number of clients might cause this to be
> reached?  To tell if this is it, use 'ps -L -p <rrdcached PID>' and
> count the number of threads for the rrdcached process.  For comparison,
> we have 15 on our server, and it has been running (with 1.4.trunk) for
> more than a week now with over 50 updates per second.
> 
> A separate issue is that, from what I can tell of the code, the rrd
> client is supposed to attempt a re-connect to the daemon in the event
> of the remote daemon restarting and the connection dying.  However it
> does seem that this doesn't necessarily happen -- I've had to restart
> the MRTG daemon, and you apparently need to restart collectd when the
> rrdcached is restarted.
> 
> Steve
> 
> Steve Shipway
> University of Auckland ITS
> UNIX Systems Design Lead
> [email protected]
> Ph: +64 9 373 7599 ext 86487

_______________________________________________
rrd-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
