Why don't you simply identify the six different types of information per number:

- figure name (unemployment)
- month (reporting)
- release date
- figure
- revision date
- revised figure

The key would be: <figure name>_<month>, et voilà.

I strongly advise against "overloading" the timestamping/versioning feature of HBase. You would still have to load the entire series and sort it however you like, but that's not a problem with HBase.

Thinking in ActiveQuant, you would store each of the columns above through its IArchiveWriter. Then you can seamlessly view/chart it in the ActiveQuant Master Server, making it available over CSV and SOAP to your corporate intranet.

Cheers

On Wed, Feb 6, 2013 at 11:01 PM, James Taylor <[email protected]> wrote:

> Another approach would be to use Phoenix
> (http://github.com/forcedotcom/phoenix). You can model your schema as you
> would in the relational world, but you get the horizontal scalability of
> HBase.
>
> James
>
>
> On 02/06/2013 01:49 PM, Michael Segel wrote:
>
>> Overloading the time stamp aka the versions of the cell is really not a
>> good idea.
>>
>> Yeah, I know opinions are like A.... everyone has one. ;-)
>>
>> But you have to be aware that if someone decides to delete some data...
>> well one tombstone marker for the column, goodbye all of the versions of
>> the cell.
>> (Any ideas on a clean easy way to remove that tombstone? ;-)
>>
>> You're better off using other methods of adding dimension to your cell.
>> One that works well is using Avro.
>>
>> When I teach a course on HBase, I do mention about cells in my schema
>> design section of the course. I talk about the ability to use the
>> versioning as a way to add dimension and then tell the students that this
>> really isn't a good idea ...
>>
>> -Just saying...
>>
>> On Feb 6, 2013, at 3:05 PM, Ian Varley <[email protected]> wrote:
>>
>> Alex,
>>>
>>> This might be an interesting use of the time dimension in HBase.
>>> Every value in HBase is uniquely represented by a set of coordinates:
>>>
>>> - table
>>> - row key
>>> - column family
>>> - column qualifier
>>> - timestamp
>>>
>>> So, you can have two different values that have all the same
>>> coordinates, except their timestamp. So for your example, that could be:
>>>
>>> - table: econ
>>> - row key: "indicatorABC"
>>> - column family: cf1
>>> - column qualifier: "reporting_2011-10-01"
>>>
>>> first value:
>>> - timestamp: "2011-11-01 00:00:00.000"
>>> - value: 2
>>>
>>> second value:
>>> - timestamp: "2011-12-01 00:00:00.000"
>>> - value: 2.5
>>>
>>> I.e., if you load the data such that the timestamps on the values
>>> represent the release date, then you can model this in a natural way. By
>>> default, reads in HBase will only give you the latest value, but you can
>>> manually tell a scanner to give you "time travel" by only reporting values
>>> as of an older date; so you could say "tell me what the data would have
>>> said on 11/01".
>>>
>>> (Also, by default, HBase only keeps a limited number of historical
>>> versions (3), but you can tell it to keep all versions.)
>>>
>>> There are some downsides to using the time dimension explicitly like
>>> this:
>>> - If you back date things and also work with deletes, you could get some
>>> weird behavior depending on when compaction runs.
>>> - If you have lots of versions of things, the server still has to read
>>> over these when you scan, which makes things slower. (Probably doesn't
>>> apply if you only have a couple historical versions of any given value.)
>>>
>>> All the usual caveats apply: don't bother with HBase unless you've got
>>> some serious size in your data (e.g. TB) and need to support a heavy load
>>> of real-time updates and queries. Otherwise, go with something simpler to
>>> operate like a relational database, couchdb, etc.
>>>
>>> Ian
>>>
>>> On Feb 6, 2013, at 2:24 PM, Alex Grund wrote:
>>>
>>> Hi,
>>>
>>> I am a newbie in nosql-databases and I am wondering how to model a
>>> specific case with HBase.
>>>
>>> The thing I want to model is economic time series, such as the
>>> unemployment rate in a given country.
>>>
>>> The complicated thing is this: Values of an economic time series can,
>>> but do not have to, be revised.
>>>
>>> An example:
>>>
>>> Imagine the time series is published monthly, on the first day of a
>>> month with the value for the previous month, like this:
>>>
>>> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
>>> Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
>>> Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
>>> Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4
>>>
>>> (where "release" is the date of release and "reporting" is the date of
>>> the month the "value" refers to. Read: "On Dec 1, 2011 the
>>> unemployment rate for Nov 2011 was reported to be "1").
>>>
>>> Now imagine that on every release, the value for the previous month
>>> is revised, like this:
>>>
>>> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
>>> Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5
>>>
>>> Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
>>> Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5
>>>
>>> Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
>>> Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5
>>>
>>> Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4
>>> Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5
>>>
>>> Read: On Oct 1, 2011, the unemployment rate was reported to be "3"
>>> for Sep, and the revised value for Aug was reported to be "4.5".
>>>
>>> The most recent observation (release) ex-post is: [1]
>>> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
>>> Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5
>>>
>>> Since the data is not revised further than one month behind, the whole
>>> series ex-post would look like this: [3]
>>> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
>>> Unemployment; release: 2011/12/01; reporting: 2011/10/01; value: 2.5
>>>
>>> Unemployment; release: 2011/11/01; reporting: 2011/09/01; value: 3.5
>>>
>>> Unemployment; release: 2011/10/01; reporting: 2011/08/01; value: 4.5
>>>
>>> Unemployment; release: 2011/09/01; reporting: 2011/07/01; value: 5.5
>>>
>>> Whereas the "known-to-market" series would look like this: [2]
>>>
>>> Unemployment; release: 2011/12/01; reporting: 2011/11/01; value: 1
>>> Unemployment; release: 2011/11/01; reporting: 2011/10/01; value: 2
>>> Unemployment; release: 2011/10/01; reporting: 2011/09/01; value: 3
>>> Unemployment; release: 2011/09/01; reporting: 2011/08/01; value: 4
>>>
>>> Those are the series I want to get from the db.
>>>
>>>
>>> How would you model this with HBase? Is HBase suitable for that
>>> application? Or in general, a column-oriented DB?
>>>
>>> Or is a relational approach a better fit?
>>>
>>>
>>> Thanks!
>>>
>> The opinions expressed here are mine, while they may reflect a
>> cognitive thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>>
>

--
Ulrich Staudinger, Managing Director and Sr. Software Engineer, ActiveQuant GmbH

P: +41 79 702 05 95
E: [email protected]
http://www.activequant.com

AQ-R user? Join our mailing list:
http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/aqr-user
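
For illustration, the one-row-per-figure-and-month layout suggested at the top of this thread could be sketched roughly as below. This is only a minimal in-memory sketch in Python, not HBase or ActiveQuant code: the Observation class, its field names, the ROWS sample and the two helper functions are all hypothetical names made up for this example. The sample values are the Aug to Nov 2011 figures from Alex's mail; July 2011 is left out because its first-release value is not given in the thread.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One row per figure and reporting month (hypothetical layout)."""
    figure_name: str
    month: date                           # reporting month
    release_date: date                    # date of the first release
    figure: float                         # first-release value
    revision_date: Optional[date] = None  # None if never revised
    revised_figure: Optional[float] = None

    @property
    def row_key(self) -> str:
        # Key scheme from the thread: <figure name>_<month>
        return f"{self.figure_name}_{self.month:%Y-%m}"


# Sample rows taken from the unemployment example in the thread.
ROWS = [
    Observation("unemployment", date(2011, 8, 1), date(2011, 9, 1), 4.0,
                date(2011, 10, 1), 4.5),
    Observation("unemployment", date(2011, 9, 1), date(2011, 10, 1), 3.0,
                date(2011, 11, 1), 3.5),
    Observation("unemployment", date(2011, 10, 1), date(2011, 11, 1), 2.0,
                date(2011, 12, 1), 2.5),
    Observation("unemployment", date(2011, 11, 1), date(2011, 12, 1), 1.0),
]

# A plain dict stands in for the table; a real store would be keyed the same way.
TABLE = {row.row_key: row for row in ROWS}


def known_to_market(rows):
    """First-release values, newest reporting month first (series [2])."""
    ordered = sorted(rows, key=lambda r: r.month, reverse=True)
    return [(r.release_date, r.month, r.figure) for r in ordered]


def ex_post(rows):
    """Latest available value per reporting month, newest first (series [3])."""
    ordered = sorted(rows, key=lambda r: r.month, reverse=True)
    result = []
    for r in ordered:
        if r.revised_figure is not None:
            result.append((r.revision_date, r.month, r.revised_figure))
        else:
            result.append((r.release_date, r.month, r.figure))
    return result


if __name__ == "__main__":
    for release, month, value in known_to_market(TABLE.values()):
        print(f"known-to-market; release: {release}; reporting: {month}; value: {value}")
    for release, month, value in ex_post(TABLE.values()):
        print(f"ex-post; release: {release}; reporting: {month}; value: {value}")
```

Because both the first release and the revision sit in the same row, either series falls out of a single pass over the loaded rows followed by a sort, which matches the "load the entire series and sort it" remark above. The sketch assumes at most one revision per reporting month, as in Alex's example; more revisions would need a different layout.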
