Take a look at how OpenTSDB is used at StumbleUpon, as described by Benoit "tsuna" Sigoure
([email protected]) in his
HBaseCon talk "Lessons Learned from OpenTSDB".
His team has done a great job working with time-series data, and he
gave a lot of good advice on handling this kind of data in HBase:
- Wider rows to seek faster
- Use asynchbase + Netty or Finagle (an RPC library created by Twitter
engineers) = performance ++
- Make writes idempotent and independent
    before: start rows at arbitrary points in time
    after: align rows on 10m (then 1h) boundaries
- Store more data per Key/Value
- Compact your data
- Use short family names
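
To make the row-alignment and idempotent-write points above concrete, here is a minimal sketch in Python. It is not OpenTSDB's actual schema: the key layout (metric id followed by a 4-byte base timestamp), the 2-byte offset qualifier, and the names `make_cell` and `put` are assumptions chosen for illustration, and a plain dict stands in for the HBase table.

```python
import struct

ROW_SPAN = 3600  # seconds covered by one row; assumes the 1h alignment


def make_cell(metric_id: bytes, ts: int, value: float, span: int = ROW_SPAN):
    """Map one data point to (row key, column qualifier, cell value).

    Hypothetical layout, for illustration only:
      row key   = metric id + base timestamp aligned on `span` (4 bytes)
      qualifier = offset of the point inside the row (2 bytes)
      value     = the measurement as an 8-byte big-endian double
    """
    base = ts - ts % span          # align the row on a fixed boundary
    offset = ts - base             # position of the point inside the row
    row_key = metric_id + struct.pack(">I", base)
    qualifier = struct.pack(">H", offset)
    return row_key, qualifier, struct.pack(">d", value)


def put(table: dict, metric_id: bytes, ts: int, value: float) -> None:
    """Simulate an HBase put against an in-memory dict of rows."""
    row_key, qualifier, cell = make_cell(metric_id, ts, value)
    table.setdefault(row_key, {})[qualifier] = cell


table = {}
put(table, b"cpu", 1346185260, 0.75)
put(table, b"cpu", 1346185260, 0.75)  # a retry overwrites the same cell
put(table, b"cpu", 1346185261, 0.80)  # same row, next column
```

Because the row key and qualifier depend only on the metric and the timestamp, replaying a write lands on the same cell instead of creating a new one, and all points from the same hour share one wide row.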
Best wishes
On 28/08/2012 20:21, Mohit Anchlia wrote:
With time-series data, how do people deal with scenarios where one might
get multiple events in a millisecond? Using a nanosecond approach seems
tricky. Another option is to take advantage of versions or counters.
10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci