On Wed, May 2, 2012 at 8:22 AM, Tim Wintle <timwin...@gmail.com> wrote:
> On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote:
>> Tens or a few hundred MB per row seems reasonable. You could do
>> thousands/MB if you wanted to, but that can make things harder to
>> manage.
>
> thanks (Both Aarons)
>
>> Depending on the size of your data, you may find that the overhead of
>> each column becomes significant; far more than the per-row overhead.
>> Since all of my data is just 64-bit integers, I ended up taking a day's
>> worth of values (288/day @ 5min intervals) and storing it as a single
>> column as a vector.
>
> By "vector" do you mean a raw binary array of long ints?

Yep. I've also done a few small optimizations for when an entire day's
data is 0, etc.

> That sounds very nice for reducing overhead - but I'd like to work
> with counters (I was going to rely on them for streaming "real-time"
> updates).

I was going to use counters for aggregates, but I ended up doing all the
work in the client and storing them the same way as individual data
sources. It really depends on what you're counting. Basically, with
counters, if you get an error incrementing them, you have no idea whether
the value changed or not. There are other issues too, which have been
discussed here on the list and should be in the archives. Not a big deal
if you're just counting the number of times people have clicked "Like",
but if you're building network traffic aggregates and you fail to
include, or double count, a 10-slot switch full of 10Gbps ports, your
graphs end up looking really bad!

> Is that why you've got the two CFs described below (to have an archived
> summary and a live version that can have counters), or do you have no
> contention over writes/increments for individual values?

Basically, if I inserted data as a vector as it came in, I'd have to do a
read for every write (read the current vector, then write a new vector
with the new value appended to it). That would destroy performance, hence
the two CFs. Doing it nightly instead is a lot more efficient.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
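
For illustration, here is a minimal sketch of the one-vector-per-day
layout described in this message. It is not the code used in the thread.
It assumes signed big-endian 64-bit values packed with Python's struct
module. The function names, the empty-blob encoding for an all-zero day,
and the dict-shaped input to the nightly rollup (with missing slots
treated as 0) are all hypothetical.

```python
import struct

SAMPLES_PER_DAY = 288  # 24 hours * 12 five-minute samples per hour
DAY_FORMAT = ">%dq" % SAMPLES_PER_DAY  # big-endian signed 64-bit ints


def pack_day(values):
    """Pack one day's samples into a single binary column value."""
    if len(values) != SAMPLES_PER_DAY:
        raise ValueError("expected %d samples" % SAMPLES_PER_DAY)
    if not any(values):
        # Assumed optimization: store nothing for an all-zero day.
        return b""
    return struct.pack(DAY_FORMAT, *values)


def unpack_day(blob):
    """Inverse of pack_day: return the day's samples as a list of ints."""
    if blob == b"":
        return [0] * SAMPLES_PER_DAY
    return list(struct.unpack(DAY_FORMAT, blob))


def nightly_rollup(live_columns):
    """Build the archived vector from the live per-sample columns.

    live_columns is a hypothetical dict mapping slot index (0..287) to the
    value written during the day. Each live write is a plain insert, so
    the read-modify-write happens once per day rather than once per sample.
    """
    values = [live_columns.get(i, 0) for i in range(SAMPLES_PER_DAY)]
    return pack_day(values)
```

A non-zero day packs into 288 * 8 = 2304 bytes in one column, which is
where the saving in per-column overhead comes from.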