On 02 Oct 2012, at 10:32 PM, Wesley Wyche <[email protected]> wrote:

> Is there anyone that has had success in dealing with large numbers of
> updates per second, and if so, what solution are you using?

What I have done in the past is to "collapse" updates on the fly to reduce 
their physical number, and in turn the number of writes that have to be made to 
RRD files. I modelled it on "sort" followed by "uniq -c", which turned this 
(3 writes):

/foo
/foo
/bar

into this (2 writes):

2: /foo
1: /bar

Or at scale (584951 writes becomes 2 writes):

487264: /foo
97687: /bar
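
In shell terms, the batch version of that collapse is just the classic 
pipeline (the sample input here is made up to match the example above):

    # Batch collapse: count duplicate lines, most frequent first.
    printf '/foo\n/foo\n/bar\n' | sort | uniq -c | sort -rn

which prints "2 /foo" and "1 /bar" instead of three separate lines.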

Obviously "sort" doesn't work on streams, so I came up with a tool that did the 
"uniq -c" part on streamed data. It kept a FIFO of lines: a line that didn't 
already exist in the FIFO went in at the head, and if it did exist a counter 
was incremented instead. Once a line's timeout was reached, it was ejected. 
This "collapsed" data on the fly: increasing the timeout reduced the number of 
lines, and in turn the number of RRD writes. All of this was processed in real 
time, which meant I could drop the date out of the line format and just count 
URLs, over multiple seconds.
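
My tool was written in C with per-line timeouts; as a rough illustration of 
just the counting part, here is an awk approximation that flushes counts every 
4 input lines rather than on a timeout (the input and window size are made up):

    # Sketch of the streaming "uniq -c": accumulate a count per line,
    # flush and reset every 4 input lines. The real tool used per-line
    # timeouts and was written in C; this only shows the collapsing idea.
    printf '/foo\n/bar\n/foo\n/foo\n/bar\n/foo\n/foo\n/bar\n' |
    awk '{ count[$0]++ }
         NR % 4 == 0 { for (u in count) print count[u] ": " u; delete count }'

Eight input lines come out as four "count: URL" lines, i.e. four RRD writes 
instead of eight.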

The storage was multiple RRD files, one per URL, and the bottleneck was the 
number of discrete URLs. Ultimately what mattered was not the magnitude of the 
numbers being written, but rather the physical number of URLs involved. If you 
use something like "cut" you can turn "/foo/subfoo/subsubfoo" into just "/foo" 
and then "collapse" that, reducing the flood of data to a trickle. You lose 
detail in the data, but this can be the difference between "possible" and 
"impossible". One key point: to get real performance you need to do this in a 
language like C. As soon as a scripting language got involved, performance 
dropped by orders of magnitude and it became the bottleneck.
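
Concretely, the "cut" trick looks like this (sample input invented; "-f1,2" 
keeps everything up to the second "/"-separated field, so deeper paths fold 
into their top-level prefix):

    # Truncate URLs to their first path component, then collapse.
    printf '/foo/subfoo/subsubfoo\n/foo/other\n/bar/x\n' |
    cut -d/ -f1,2 | sort | uniq -c | sort -rn

Three distinct URLs become two prefixes, "2 /foo" and "1 /bar", so two RRD 
files and two writes instead of three.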

Depending on the kind of data you're trying to record, rrdcached may or may not 
help. If you have data being saved across tens or hundreds of GBs of RRD files, 
then rrdcached is unlikely to help, as you would need a cache hundreds of GB in 
size. If you have some hot RRD files then rrdcached might work for you, but 
it's not guaranteed; it all depends on the data you're recording.

I found iowait to be the key bottleneck, which something like an SSD might help 
with, assuming you don't wear it out in the process.

Regards,
Graham
--


_______________________________________________
rrd-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
