Hi,
we are managing some naturally time-versioned data in HBase. That is,
there are change events that each carry a specific time, and when such
an event is handled, the data in HBase pertaining to exactly that point
in time is updated.
So far we have been using HBase timestamps to model the time dimension.
All columns have an unlimited number of versions. That has worked OK so
far, and HBase's way of providing access to data at a given time or time
range seemed a natural fit.
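To make those read semantics concrete, here is a toy in-memory model of a single cell with unlimited versions. It is plain Java, not the HBase client API, and the class and method names are made up for illustration. Reading "as of" time t returns the newest version with timestamp <= t, which is roughly what a Get restricted with setTimeRange(0, t + 1) returns: the upper bound of an HBase time range is exclusive, and a Get returns one version by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one HBase cell with unlimited versions; not the HBase client API.
public class VersionedCell {
    // Versions keyed by timestamp, in ascending order.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Like a put with an explicit timestamp (the event time). In this model a second
    // write with the same timestamp replaces the first.
    public void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Newest version with timestamp <= asOf, i.e. the state "as of" that time.
    public String getAsOf(long asOf) {
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }

    // All versions with from <= ts < to, newest first (the upper bound is exclusive,
    // as in an HBase time range).
    public List<String> getRange(long from, long to) {
        return new ArrayList<>(versions.subMap(from, true, to, false).descendingMap().values());
    }

    public static void main(String[] args) {
        VersionedCell price = new VersionedCell();
        price.put(100L, "10.00");
        price.put(200L, "12.50");
        System.out.println(price.getAsOf(150L));      // 10.00
        System.out.println(price.getRange(0L, 300L)); // [12.50, 10.00]
    }
}
```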
We are aware of some tricky issues around timestamp handling (in
particular in conjunction with deletes). As we will shortly need to
migrate our HBase-stored data (for other reasons), we are wondering
whether our approach has long-term drawbacks that we should address
now, possibly by redesigning our timestamp handling as well.
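One delete case that matters for back-dated events is described in the HBase reference guide as "deletes mask puts": a tombstone that deletes all versions up to timestamp T also hides puts with a timestamp <= T that are written after the delete, until a major compaction removes the tombstone. The toy model below (hypothetical names, not the HBase API, and it ignores compaction) shows the effect on an event that arrives late.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of "deletes mask puts" for a single cell; not the HBase client API.
public class TombstoneCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Highest "delete all versions <= T" tombstone seen; Long.MIN_VALUE means none.
    private long tombstone = Long.MIN_VALUE;

    public void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Models deleting all versions with timestamp <= ts (cf. Delete.addColumns in the HBase client).
    public void deleteUpTo(long ts) {
        tombstone = Math.max(tombstone, ts);
    }

    // Newest visible version <= asOf. Any version at or below the tombstone is hidden,
    // regardless of whether it was written before or after the delete.
    public String getAsOf(long asOf) {
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        if (e == null || e.getKey() <= tombstone) {
            return null;
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        TombstoneCell cell = new TombstoneCell();
        cell.put(100L, "a");
        cell.deleteUpTo(150L);
        cell.put(120L, "b");                    // back-dated event, written after the delete
        System.out.println(cell.getAsOf(200L)); // null: "b" is masked by the tombstone
        cell.put(160L, "c");
        System.out.println(cell.getAsOf(200L)); // c
    }
}
```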
So my questions are:
* Has anyone had problems using HBase timestamps as the time dimension
of their data (assuming the data has some natural time-based versioning)?
* Is it generally better to model time-based versioning within the
data structure itself (e.g. in the row key), and if so, why?
* If you have used HBase timestamps in a similar way, feedback on how
that worked is welcome as well!
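For comparison, here is a minimal sketch of the row-key alternative from the second question. It assumes a hypothetical key layout: the entity ID, a 0x00 separator, and a reversed timestamp (Long.MAX_VALUE - ts) as 8 big-endian bytes. Because HBase sorts row keys in unsigned byte order, newer versions of an entity sort first, so an "as of t" lookup becomes a short scan starting at the key for (entity, t). A TreeMap with an unsigned byte comparator stands in for the table here; the layout assumes entity IDs contain no 0x00 byte and timestamps are non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Sketch of time versioning in the row key; the key layout is a hypothetical example.
public class ReverseTimestampKeys {
    // Row key: <entityId as UTF-8> 0x00 <Long.MAX_VALUE - ts as 8 big-endian bytes>.
    // Assumes entity IDs contain no 0x00 byte and timestamps are non-negative.
    public static byte[] rowKey(String entityId, long ts) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + 1 + Long.BYTES)
                .put(id)
                .put((byte) 0)
                .putLong(Long.MAX_VALUE - ts)
                .array();
    }

    // Stand-in for a table: rows kept in unsigned lexicographic byte order, as HBase sorts row keys.
    private final TreeMap<byte[], String> rows = new TreeMap<byte[], String>(Arrays::compareUnsigned);

    public void put(String entityId, long ts, String value) {
        rows.put(rowKey(entityId, ts), value);
    }

    // Newest value with timestamp <= asOf: the first row at or after rowKey(entityId, asOf),
    // as long as that row still belongs to the same entity.
    public String getAsOf(String entityId, long asOf) {
        byte[] start = rowKey(entityId, asOf);
        Map.Entry<byte[], String> e = rows.ceilingEntry(start);
        if (e == null) {
            return null;
        }
        byte[] key = e.getKey();
        int prefixLen = start.length - Long.BYTES; // entity ID plus separator
        boolean sameEntity = key.length == start.length
                && Arrays.equals(key, 0, prefixLen, start, 0, prefixLen);
        return sameEntity ? e.getValue() : null;
    }

    public static void main(String[] args) {
        ReverseTimestampKeys table = new ReverseTimestampKeys();
        table.put("sensor-1", 100L, "v100");
        table.put("sensor-1", 200L, "v200");
        table.put("sensor-2", 150L, "x150");
        System.out.println(table.getAsOf("sensor-1", 150L)); // v100
        System.out.println(table.getAsOf("sensor-1", 250L)); // v200
        System.out.println(table.getAsOf("sensor-1", 50L));  // null
    }
}
```

The trade-off in this design is that every version becomes its own row, so the column family's version limits and timestamp-based deletes no longer govern the history; retention and cleanup would have to be handled by the application.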
Thanks,
Henning
- Using HBase timestamps as natural versioning Henning Blohm