Hey Guys,
Before I get to my thoughts on the rowkey design, here's some background
info about the problem we are trying to tackle.
We are producing about 60TB of data a year (uncompressed). Most of this
data is collected continuously from various detectors around our
facility. The vast majority of it is numerical (either scalar or a
1-dimensional array). Detectors are either polled at regular intervals
or send their measurements asynchronously. We are
currently using regular compressed files to store all this information
with headers (metadata) stored in a relational database. Managing
(moving, archiving) this amount of data is slowly becoming a nuisance,
so we are investigating other solutions, including distributed
databases like HBase.
We have a number of requirements based on our users' access patterns -
the most important one being (ridiculously) fast sequential, time-sorted
access for selected metric(s). We also want to decimate the data for
live viewing (visually compressing billions of time-series points into a
manageable size without losing the "shape" of the original dataset).
We are already doing this in our custom middleware server,
but this looks like a problem that could be tackled by MapReduce.
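Just to illustrate what I mean by decimation, here is a simple
per-bucket min/max pass (made-up example code, not our actual
middleware - the class and method names are placeholders):

import java.util.ArrayList;
import java.util.List;

public class Decimator {
    // Collapse a long series into at most 2 points per bucket by
    // keeping each bucket's min and max, which preserves spikes
    // (the "shape"). points[i] = { timestamp, value }, time-sorted.
    static double[][] decimate(double[][] points, int buckets) {
        List<double[]> out = new ArrayList<>();
        int perBucket = Math.max(1, points.length / buckets);
        for (int start = 0; start < points.length; start += perBucket) {
            int end = Math.min(start + perBucket, points.length);
            double[] min = points[start], max = points[start];
            for (int i = start; i < end; i++) {
                if (points[i][1] < min[1]) min = points[i];
                if (points[i][1] > max[1]) max = points[i];
            }
            // Emit the bucket's extremes in time order.
            if (min == max) {
                out.add(min);
            } else if (min[0] < max[0]) {
                out.add(min); out.add(max);
            } else {
                out.add(max); out.add(min);
            }
        }
        return out.toArray(new double[0][]);
    }
}

Each bucket is independent of the others, which is why this feels like
a natural fit for MapReduce.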
Now, back to the subject matter. I examined both
OpenTSDB(http://opentsdb.net/schema.html) and HBaseWD
(http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/)
solutions and although their general ideas look right, neither one of
them looks like a perfect fit for us. The former correctly distributes
writes across the entire cluster, but the sequential, time ordered reads
for the same metric end up being localized to a relatively small number
of regions (with enough collected data over a long period of time the
reads will hit just one, maybe two regions). The latter also distributes
the writes across the entire cluster, but the reads require BUCKET_COUNT
(BC) number of scans and they are almost guaranteed to be out of order
across multiple buckets (they are in the correct relative order within
each bucket).
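To make the comparison concrete, here is roughly how I understand the
two key layouts (heavily simplified - real OpenTSDB keys also carry tag
ids, and HBaseWD does its prefixing through a pluggable distributor, so
treat this as a sketch of the idea, not their actual code):

import java.nio.ByteBuffer;
import java.util.Arrays;

public class KeyLayouts {
    static final int BUCKET_COUNT = 24; // illustrative

    // OpenTSDB-style: metric id first, then a coarse base timestamp,
    // so one metric's rows sort contiguously by time (great for
    // ordered scans, but a long time range for one metric lives in
    // only a few regions).
    static byte[] opentsdbStyleKey(int metricId, int baseTimestamp) {
        return ByteBuffer.allocate(8)
                .putInt(metricId).putInt(baseTimestamp).array();
    }

    // HBaseWD-style: prefix the original key with hash % BUCKET_COUNT.
    // Writes spread across buckets, but reading one metric requires
    // BUCKET_COUNT scans whose results interleave out of order.
    static byte[] hbasewdStyleKey(byte[] originalKey) {
        int bucket = Math.floorMod(Arrays.hashCode(originalKey), BUCKET_COUNT);
        ByteBuffer buf = ByteBuffer.allocate(1 + originalKey.length);
        buf.put((byte) bucket).put(originalKey);
        return buf.array();
    }
}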
I was thinking about a rowkey design that takes another, coarser time
dimension into consideration in addition to the timestamp that follows
the metric name - for example, the hour of the day. This value, ranging
from 0 to 23, would be prefixed to each rowkey (e.g.
7-metric.name-1349585333). On its own this is obviously a terrible
design, because a handful of regions would be overloaded depending on
the hour of the day. However, we can use the metric name's hash modded
with the bucket count (BC, 24 in this case) as a per-metric starting
prefix base, then add the real hour of the day to that base and
subtract BC if the result exceeds BC - in other words,
prefix = (hash(metric) + hour) mod BC. This way all writes are still
distributed evenly across the system ... and so are the reads, assuming
we are reading more than one hour's worth of data, which is almost
always true in our case. We still have to do BC scans if reading 24+
hours of data, but the data within and across the buckets is always
correctly time-sorted. We can also limit the scan count based on the
selected time range (e.g. if someone asks for data for a given metric
between 7am and 10am, we only have to do 3 scans covering those three
full hours).
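Here is a minimal sketch of what I have in mind (the salting formula
is the only real content - the key format, zero-padding widths, and
helper names are all made up for illustration):

import java.util.ArrayList;
import java.util.List;

public class SaltedHourKeys {
    static final int BC = 24; // bucket count = hours per day

    // Per-metric base, rotated by the hour of day and wrapped back
    // into [0, BC). Different metrics start at different bases, so at
    // any given hour the writes are spread across all buckets.
    static int bucket(String metric, long epochSeconds) {
        int base = Math.floorMod(metric.hashCode(), BC);
        int hour = (int) ((epochSeconds / 3600) % 24);
        return (base + hour) % BC;
    }

    // e.g. "07-metric.name-1349585333" (zero-padded so keys sort
    // correctly as strings)
    static String rowkey(String metric, long epochSeconds) {
        return String.format("%02d-%s-%010d",
                bucket(metric, epochSeconds), metric, epochSeconds);
    }

    // One (startKey, stopKeyExclusive) pair per hour touched by
    // [from, to): e.g. 7am-10am -> 3 scans. For ranges past 24 hours
    // this would be collapsed into at most BC scans, one per bucket.
    static List<String[]> scanRanges(String metric, long from, long to) {
        List<String[]> ranges = new ArrayList<>();
        for (long hourStart = (from / 3600) * 3600; hourStart < to;
                hourStart += 3600) {
            int b = bucket(metric, hourStart);
            long lo = Math.max(hourStart, from);
            long hi = Math.min(hourStart + 3600, to);
            ranges.add(new String[] {
                String.format("%02d-%s-%010d", b, metric, lo),
                String.format("%02d-%s-%010d", b, metric, hi)
            });
        }
        return ranges;
    }
}

The stop key reuses the starting hour's bucket prefix with the next
hour's timestamp, so each scan stays inside a single bucket and stops
just past that bucket's last row for the hour.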
I'm a complete newb when it comes to distributed databases, so if I'm
way off on this, please set me straight.
Bartek