Hi Steve,

On Thu, May 24, 2007 at 02:50:19PM -0400, Steve Hill wrote:
> On May 24, 2007, at 12:51 PM, Mark Plaksin wrote:
>
> > We have about 45k RRDs and our testing so far says the fadvise changes
> > are very nice--thanks! We're also testing local disk (via cciss driver)
> > vs SAN storage. Our current RRD server is pretty crushed io-wise. So
> > far the SAN storage looks like a big win too.
>
> We have a system here with about 65k RRDs, again updated every 5 minutes.
> I will definitely have to take a look at the fadvise changes, since our
> system is currently WAAY overloaded.

If you're on Linux 2.6.5 or better, and you have enough memory to handle
about 1/3 the RRDs you have now (without the patch), then the fadvise
RANDOM patch to rrd_open will likely solve your problem. It avoids
loading unnecessary RRD file blocks/pages into the buffer cache, so the
page replacement algorithm has room to pull in the few additional pages
per RRD file that are typically needed at the aggregation times.

> One thing I have noticed is the periodicity of the load. All our RRDs
> have the same spec. The system happily churns away until it passes one
> of the larger consolidation points and then *bam* the load goes through
> the roof because of the extra IO.

This is a side effect and performance limitation of the RRD design:
all like RRD files with RRAs configured with more than 1 PDP for
consolidation will undergo aggregations at the same times of day, as
offset from zero UTC.

> For our system, I am interested in a way of smoothing out those load
> spikes, since the system becomes periodically unusable. Yes, I do
> appreciate that our updater could be smarter :)
>
> One of the things I thought about was adding some offset to each RRD
> such that the consolidation didn't happen on all RRDs at once, perhaps
> based on the creation time of the RRD...
>
> Comments?

We've considered this as well. Indeed, the synchronized aggregations
have the potential to limit performance for very large RRD systems.
The synchronization is, however, a convenience, and intentional in the
design.

While I think you'll find it unnecessary to address the synchronized
aggregations for only 45K RRD files (or even hundreds of thousands) with
a five-minute update interval, two options are (a) change RRDTool so
that not all files aggregate at the same time or (b) defer/stagger the
updates, perhaps by introducing an independent thread (as others have
suggested: RRDAccelerator, etc.).

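To make the timing concrete, here is a minimal Python sketch of why like
RRD files consolidate together and what a per-file offset under option
(a) might look like. It assumes a consolidated row completes whenever
the update time is a multiple of step * PDPs-per-row counted from the
Unix epoch. The 6-PDP RRA, the function names, and the CRC-based offset
are hypothetical illustrations, not RRDtool code.

```python
import zlib

STEP = 300        # seconds between updates (five minutes, as in this thread)
PDP_PER_ROW = 6   # assumed RRA setting: 6 PDPs per consolidated row (30 min)


def consolidation_due(t, step=STEP, pdp_per_row=PDP_PER_ROW, offset=0):
    """Return True if the update at epoch time t completes a consolidated row.

    With offset=0 the boundaries are identical for every file that shares
    the same step and pdp_per_row, which is what synchronizes the load.
    """
    row_seconds = step * pdp_per_row
    return (t - offset) % row_seconds == 0


def stagger_offset(name, step=STEP, pdp_per_row=PDP_PER_ROW):
    """Hypothetical per-file offset: a whole number of steps chosen by
    hashing the file name, so each file keeps its own fixed boundary."""
    slot = zlib.crc32(name.encode("utf-8")) % pdp_per_row
    return slot * step


def files_due(names, t, staggered):
    """List the files whose consolidation falls on the update at time t."""
    return [
        n for n in names
        if consolidation_due(t, offset=stagger_offset(n) if staggered else 0)
    ]


if __name__ == "__main__":
    names = [f"host{i}.rrd" for i in range(65000)]
    t0 = 1800 * 655574  # some 30-minute boundary since the epoch
    for k in range(PDP_PER_ROW):
        t = t0 + k * STEP
        print(t,
              "synchronized:", len(files_due(names, t, staggered=False)),
              "staggered:", len(files_due(names, t, staggered=True)))
```

With staggered=False every file is due on the same update and none on
the other five. With the offset, each file is due in exactly one of the
six update slots, which spreads the extra IO but also means the rows no
longer share boundaries across files.
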
Method (a) presumably has significant consequences for read (fetch/graph)
semantics since the aggregation times will no longer line up across
files. Method (b) changes the read semantics since the data (across all
RRD files) will not be available in near realtime.

Dave

-- 
[EMAIL PROTECTED] http://net.doit.wisc.edu/~plonka/ Madison, WI
_______________________________________________
rrd-developers mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
