> > They all become un-stuck at the same time, maybe 20 seconds later, and
> > then the graphs appear very quickly.
>
> The FLUSH commands are waiting to be notified that the file
> has been written out to disk. They block on
> pthread_cond_wait() and don't return until the queue thread
> has written the file out to disk.
>
> What is happening on your system at that time? Are there
> other events which may slow down the I/O?
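The blocking described in the quoted reply is the usual condition-variable pattern. Here is a minimal sketch of it, assuming a single cache entry. The names struct cache_item, queue_thread(), flush_and_wait() and flush_demo() are hypothetical and are not taken from rrd_daemon.c. The FLUSH side waits on a pthread_cond_t until the queue thread marks the file as written and broadcasts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-file cache entry; NOT the real rrdcached structure. */
struct cache_item {
    bool written;             /* set by the queue thread once data is on disk */
    pthread_mutex_t lock;     /* protects 'written' */
    pthread_cond_t flushed;   /* signalled when 'written' becomes true */
};

/* Queue thread: simulate a slow disk write, then wake every waiter. */
static void *queue_thread(void *arg)
{
    struct cache_item *ci = arg;
    struct timespec delay = { 0, 100000000L };  /* 100 ms stand-in for the write */
    nanosleep(&delay, NULL);

    pthread_mutex_lock(&ci->lock);
    ci->written = true;
    pthread_cond_broadcast(&ci->flushed);  /* wake all blocked FLUSH callers */
    pthread_mutex_unlock(&ci->lock);
    return NULL;
}

/* FLUSH handler: does not return until the queue thread reports the write. */
static void flush_and_wait(struct cache_item *ci)
{
    pthread_mutex_lock(&ci->lock);
    while (!ci->written)      /* re-check the predicate: wakeups can be spurious */
        pthread_cond_wait(&ci->flushed, &ci->lock);
    assert(ci->written);      /* postcondition: the file has been written */
    pthread_mutex_unlock(&ci->lock);
}

/* Hypothetical driver: returns true once the simulated flush has completed. */
bool flush_demo(void)
{
    struct cache_item ci = { .written = false };
    pthread_t tid;
    bool done = false;

    pthread_mutex_init(&ci.lock, NULL);
    pthread_cond_init(&ci.flushed, NULL);  /* must be initialised before any wait */

    if (pthread_create(&tid, NULL, queue_thread, &ci) == 0) {
        flush_and_wait(&ci);
        pthread_join(tid, NULL);           /* join makes reading 'written' safe */
        done = ci.written;
    }

    pthread_cond_destroy(&ci.flushed);
    pthread_mutex_destroy(&ci.lock);
    return done;
}
```

Compile with -pthread. In this model, every FLUSH caller waiting on the same entry is released by a single broadcast. That is consistent with all of the waiting commands becoming un-stuck at the same moment once the slow write finishes.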
The I/O is not so bad. I'm watching it with iostat -k 1 -x, and I also have a Ganglia metric module for I/O that gives me a nice graph.

I've experimented with sysctl. These are the values I'm currently using:

vm.dirty_expire_centisecs = 179971
vm.dirty_writeback_centisecs = 35993
vm.dirty_ratio = 90
vm.dirty_background_ratio = 2
vm.max_map_count = 4000000

If I understand correctly, vm.dirty_ratio means nothing should block until 90% of the RAM is taken up by dirty pages. Given that mmap() is being used with MAP_SHARED and I have 8GB of RAM, all the necessary pages should be staying in RAM.

If you can suggest a more appropriate strategy for configuring the cache, it would be very welcome.

> I have seen this behavior before on one of my Linux 2.6.x
> machines. When it has dirtied too many pages, all I/O on the
> system pauses until it has flushed the "write-back" pages out
> to disk. What kind of system are you running?

RHEL5:

Linux xxx 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

> > I'm using r1621 + the patch adding pthread_cond_init(&ci->flushed,
> > NULL);
>
> You should upgrade to at least r1626. Otherwise, you may
> notice some files that are getting flushed by the flush
> process (corresponding to the -f timer) are hanging around in
> queue forever. The bug was introduced in r1588, resolved in r1626.

I've now merged in the changes to rrd_daemon.c from r1626, but I still have the same problem.

There is also a memory leak somewhere (maybe in my striping code, maybe in rrdcached). I've tried to start rrdcached under valgrind, but my large mmap() call fails with EINVAL when running under valgrind. The memory leak could be the cause of the performance issue: the process grows to several gigabytes and the machine starts swapping, which might be reducing the amount of RAM available for caching the mmap() pages.

Can you suggest how to use valgrind, or another tool, in this scenario?
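The sysctl values above are easier to reason about once they are converted to bytes and seconds. The following is a minimal sketch of that arithmetic. dirty_threshold_bytes() and centisecs_to_seconds() are hypothetical helpers, not kernel or rrdtool functions. The sketch assumes the ratios apply to total RAM, whereas the kernel bases its limits on the memory it considers dirtyable and may clamp them further, so the results are nominal upper bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal dirty-page threshold: ratio_percent of ram_bytes.
 * Assumption: the ratio applies to all of RAM. The kernel's real limit
 * is based on dirtyable memory and may be lower. */
uint64_t dirty_threshold_bytes(uint64_t ram_bytes, unsigned ratio_percent)
{
    assert(ratio_percent <= 100);
    return ram_bytes * ratio_percent / 100;   /* integer division truncates */
}

/* The vm.*_centisecs sysctls are expressed in hundredths of a second. */
double centisecs_to_seconds(unsigned long centisecs)
{
    return centisecs / 100.0;
}
```

This assumes "8GB" means 8 × 2^30 bytes. Under that assumption, dirty_threshold_bytes(8ULL << 30, 90) is 7730941132 bytes (about 7.2 GiB), and dirty_threshold_bytes(8ULL << 30, 2) is 171798691 bytes (about 164 MiB). Background writeback is therefore due to start long before the 90% blocking limit is reached.

For the timers, centisecs_to_seconds(179971) is about 1800 s (30 minutes) before a dirty page counts as expired. centisecs_to_seconds(35993) is about 360 s (6 minutes) between writeback wakeups.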
_______________________________________________
rrd-developers mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
