Now that's a big setup. I think the corruption results from the code not handling out-of-memory conditions correctly, so if your version isn't leaking memory then you won't be affected, even if the bug is present in the version you're using. The big problem is the memory leak, and I guess I'll need to learn to use valgrind to track it down.
This points the finger at the newer code (last and info, most likely, since create is rarely done) as the cause of any leaks, though there were a couple of additional changes between 2092 and 2136 that could be to blame. I might try out the -a option; we've not used it yet as it's a new one in 1.4.trunk

Steve

________________________________
Steve Shipway
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 924 6487
Mobile: +64 (0)21 753 189
Email: [email protected]

From: Thorsten von Eicken [mailto:[email protected]]
Sent: Friday, 22 October 2010 3:13 p.m.
To: Steve Shipway
Cc: kevin brintnall; [email protected]; [email protected]
Subject: Re: [rrd-developers] rrdcached use corrupting RRD files (trunk)

Sadly interesting...

As a separate data point, we're running over 100 rrdcached servers, each handling >30k tree nodes and receiving about 3k updates/sec, caching data for ~1 hour so updating files at ~20 updates/sec. Uptime is in months without problems, and we've never seen corruption (knock on wood). We're running 1.4 trunk revision r2092 (randomly picked) on Ubuntu 8.04 (used to run on CentOS 5.2, I believe). We're not seeing any memory leak and are running stable at 800-900MB virtual / 500-600MB rss. We're using TCP sockets and doing updates, fetches and flushes. The command line we use is:

    /usr/bin/rrdcached -w 3600 -z 3600 -f 7200 -t 2 -a 128 -b /rrds/hosts -B -j /rrds/journal -p /var/run/rrdcached/rrdcached.pid -l 10.x.x.x:xxxx

I'm not writing this to contradict you, I'm just wondering what could be different in your set-up that causes the problems. (Oh, that reminds me that the -a 128 made a huge difference for us around memory allocation performance.)

Good luck!
TvE
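For anyone comparing their own load against the figures above, here is a rough back-of-the-envelope sketch in Python. It only restates the numbers quoted in the message (about 3k updates/sec in, ~20 file writes/sec out, -w 3600). The assumption that every cached file is written roughly once per -w interval is mine, not something stated in the thread, so the implied file count is an estimate at best.

```python
# Back-of-the-envelope check of the rrdcached figures quoted above.
# Assumption (not from the thread): every cached file is written to disk
# roughly once per -w interval; -z jitter and explicit flushes are ignored.

incoming_updates_per_sec = 3000   # "about 3k updates/sec" received
file_writes_per_sec = 20          # "~20 updates/sec" reaching the files
write_timeout_sec = 3600          # the -w 3600 option

# Average number of cached updates folded into a single disk write.
updates_per_write = incoming_updates_per_sec / file_writes_per_sec

# Number of distinct files implied if each is written once per timeout.
implied_files = file_writes_per_sec * write_timeout_sec

print(f"updates batched per file write: {updates_per_write:.0f}")
print(f"files implied by the write rate: {implied_files}")
```

Under that assumption the cache folds roughly 150 updates into each disk write. That batching is also why a long cache window keeps a lot of pending data in memory, so memory behaviour matters a great deal at this scale.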
_______________________________________________
rrd-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
