Alex,

This might be an interesting use of the time dimension in HBase. Every value in HBase is uniquely represented by a set of coordinates:
- table
- row key
- column family
- column qualifier
- timestamp

So you can have two different values that share all the same coordinates except their timestamp. For your example, that could be:

- table: econ
- row key: "indicatorABC"
- column family: cf1
- column qualifier: "reporting_2011-10-01"

first value:
- timestamp: "2011-11-01 00:00:00.000"
- value: 2

second value:
- timestamp: "2011-12-01 00:00:00.000"
- value: 2.5

I.e., if you load the data so that the timestamps on the values represent the release date, you can model this in a natural way. By default, reads in HBase only give you the latest value, but you can tell a scanner to "time travel" by only reporting values as of an older date; so you could say "tell me what the data would have said on 11/01". (Also, by default HBase only keeps a limited number of historical versions (3), but you can tell it to keep all versions.)

There are some downsides to using the time dimension explicitly like this:

- If you backdate things and also work with deletes, you could get some weird behavior depending on when compaction runs. For example, a delete can mask a later put whose timestamp is older than the delete, until a major compaction removes the delete marker.
- If you have lots of versions of things, the server still has to read over them when you scan, which makes things slower. (This probably doesn't apply if you only have a couple of historical versions of any given value.)

All the usual caveats apply: don't bother with HBase unless you've got some serious size in your data (e.g. TB) and need to support a heavy load of real-time updates and queries. Otherwise, go with something simpler to operate, like a relational database, CouchDB, etc.

Ian

On Feb 6, 2013, at 2:24 PM, Alex Grund wrote:

Hi,

I am new to NoSQL databases and I am wondering how to model a specific case with HBase. What I want to model is economic time series, such as the unemployment rate in a given country. The complicated part is this: values of an economic time series can be revised, but do not have to be.
An example: imagine the time series is published monthly, on the first day of each month, with the value for the previous month, like this:

Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4

(where "release" is the date of release and "reporting" is the month the "value" refers to. Read: "On Dec 1, 2011, the unemployment rate for Nov 2011 was reported to be 1.")

Now imagine that on every release, the value published in the previous release is also revised, like this:

Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5
Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5
Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5
Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4
Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5

Read: on Oct 1, 2011, the unemployment rate was reported to be 3 for Sep, and the revised value for Aug was reported to be 4.5.
The most recent observation (release), ex post, is: [1]

Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5

Since the data is never revised more than one month back, the whole series ex post would look like this: [3]

Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5
Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5
Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5
Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5

Whereas the "known-to-market" series would look like this: [2]

Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4

Those are the series I want to get from the DB. How would you model this with HBase? Is HBase suitable for this application? Or, more generally, a column-oriented DB? Or is a relational approach a better fit?

Thanks!
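Ian's timestamp-as-release-date idea can be tried out without a cluster. The sketch below is a toy, in-memory stand-in for HBase's versioned cells. It is not the HBase client API. Each (row, column family, qualifier) coordinate holds several versions keyed by timestamp. The release date is stored as that timestamp, in epoch milliseconds, the unit HBase timestamps conventionally use.

The names `VersionedTable`, `snapshot`, and `known_to_market` are hypothetical and chosen for illustration. A read "as of" a date returns the newest version whose timestamp is at or before that date. In the Java client, the closest counterpart is a time range on a `Get` or `Scan` (`setTimeRange`). Its upper bound is exclusive, so an inclusive "as of" read would pass the as-of timestamp plus one. A real column family would also need to retain enough versions, as noted above.

The known-to-market series assumes Alex's schedule: each month is first published on the first day of the following month.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Toy, in-memory stand-in for HBase's versioned cells (NOT the HBase client API).
# Hypothetical names: VersionedTable, snapshot, known_to_market.

def to_ms(date_str):
    """'YYYY/MM/DD' -> epoch milliseconds (UTC), used here as the cell timestamp."""
    dt = datetime.strptime(date_str, "%Y/%m/%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def next_month(date_str):
    """First day of the following month, e.g. '2011/12/01' -> '2012/01/01'."""
    dt = datetime.strptime(date_str, "%Y/%m/%d")
    year, month = (dt.year + 1, 1) if dt.month == 12 else (dt.year, dt.month + 1)
    return f"{year:04d}/{month:02d}/01"

class VersionedTable:
    """Maps (row, family, qualifier) -> {timestamp_ms: value}: one cell, many versions."""

    def __init__(self):
        self.cells = defaultdict(dict)

    def put(self, row, family, qualifier, ts, value):
        self.cells[(row, family, qualifier)][ts] = value

    def get(self, row, family, qualifier, as_of=None):
        """Newest version with timestamp <= as_of; newest overall if as_of is None."""
        versions = self.cells.get((row, family, qualifier), {})
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(eligible)] if eligible else None

    def qualifiers(self, row, family):
        return sorted(q for (r, f, q) in self.cells if r == row and f == family)

# Alex's revised example: (release, reporting, value).
RELEASES = [
    ("2011/12/01", "2011/11/01", 1), ("2011/12/01", "2011/10/01", 2.5),
    ("2011/11/01", "2011/10/01", 2), ("2011/11/01", "2011/09/01", 3.5),
    ("2011/10/01", "2011/09/01", 3), ("2011/10/01", "2011/08/01", 4.5),
    ("2011/09/01", "2011/08/01", 4), ("2011/09/01", "2011/07/01", 5.5),
]
ROW, CF, PREFIX = "indicatorABC", "cf1", "reporting_"

table = VersionedTable()
for release, reporting, value in RELEASES:
    # Qualifier encodes the reporting month; timestamp encodes the release date.
    qualifier = PREFIX + reporting.replace("/", "-")
    table.put(ROW, CF, qualifier, to_ms(release), value)

def snapshot(as_of_date=None):
    """Every reporting month as the data stood on as_of_date (default: latest)."""
    as_of = to_ms(as_of_date) if as_of_date else None
    out = {}
    for q in table.qualifiers(ROW, CF):
        value = table.get(ROW, CF, q, as_of)
        if value is not None:
            out[q[len(PREFIX):].replace("-", "/")] = value
    return out

def known_to_market():
    """Each month's value as first published, read as of the next month's release."""
    out = {}
    for q in table.qualifiers(ROW, CF):
        reporting = q[len(PREFIX):].replace("-", "/")
        value = table.get(ROW, CF, q, to_ms(next_month(reporting)))
        if value is not None:
            out[reporting] = value
    return out

print(snapshot())              # ex-post series [3]: latest version of every month
print(snapshot("2011/11/01"))  # what the data would have said on 11/01
print(known_to_market())       # known-to-market series [2]
```

July drops out of the known-to-market series here. Its first release (2011/08/01) is not in the example data; only its later revision is.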
