> On 02/06/2013 01:49 PM, Michael Segel wrote:
>
>> Overloading the time stamp aka the versions of the cell is really not a
>> good idea.
>>
>> Fully agree.
> Yeah, I know opinions are like A.... everyone has one. ;-)
>>
>> Yeah, but some people share one.
> But you have to be aware that if someone decides to delete some data...
>> well one tombstone marker for the column, goodbye all of the versions of
>> the cell.
>> (Any ideas on a clean easy way to remove that tombstone? ;-)
>>
>> You're better off using other methods of adding dimension to your cell.
>> One that works well is using Avro.
>>
>>
>
>>> All the usual caveats apply: don't bother with HBase unless you've got
>>> some serious size in your data (e.g. TB) and need to support a heavy load
>>> of real-time updates and queries. Otherwise, go with something simpler to
>>> operate like a relational database, couchdb, etc.
>>>

While this is a valid point if you just want to store the data and work on
it on your own, there are reasons why you would want to choose a data
integration platform (more on this later).

Back to the root discussion. Why don't you simply identify the six different
types of information per number:

- figure name (unemployment)
- month (reporting)
- release date
- figure
- revision date
- revised figure

the key would be: <figure name>_<month>

et voilà.

I strongly advise against "overloading" the timestamping/versioning feature
of HBase. You would still have to load the entire series and sort it by
whatever you like, but that's not a problem with HBase.

<snip>
Thinking in ActiveQuant, you would store each of the columns above through
its IArchiveWriter. Then you can seamlessly view/chart it in the
ActiveQuant Master Server, making it available over CSV and SOAP to your
corporate intranet or to Excel through the AQ plugin.
</snip>

--
Ulrich Staudinger

http://www.activequant.org
Connect online: https://www.xing.com/profile/Ulrich_Staudinger
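
A minimal sketch of the <figure name>_<month> key layout suggested above, in
plain Python. Everything here is an assumption for illustration: a dict stands
in for the HBase table, the column names and the YYYY-MM month format are
invented, and the sample values are made up. It only shows the idea of one row
per figure and month, with the release and revision data held as ordinary
columns instead of cell versions, and the sorting done on the client side.

```python
from datetime import date

# Stand-in for an HBase table: row key -> {column name: value}.
# Hypothetical model only; no HBase client is involved.
table = {}


def row_key(figure_name, month):
    """Build the '<figure name>_<month>' key; month is assumed to be 'YYYY-MM'."""
    return f"{figure_name}_{month}"


def put(figure_name, month, release_date, figure,
        revision_date=None, revised_figure=None):
    """Store the six pieces of information as plain columns of one row."""
    table[row_key(figure_name, month)] = {
        "figure_name": figure_name,
        "month": month,
        "release_date": release_date,
        "figure": figure,
        "revision_date": revision_date,
        "revised_figure": revised_figure,
    }


def load_series(figure_name, sort_by="month"):
    """Emulate a prefix scan over '<figure name>_' and sort on the client."""
    prefix = figure_name + "_"
    rows = [row for key, row in table.items() if key.startswith(prefix)]
    return sorted(rows, key=lambda row: row[sort_by])


# Made-up sample values, not real statistics.
put("unemployment", "2012-12", date(2013, 1, 4), 1.1, date(2013, 2, 1), 1.2)
put("unemployment", "2012-11", date(2012, 12, 7), 1.0, date(2013, 1, 4), 1.1)
put("payrolls", "2012-12", date(2013, 1, 4), 100)

series = load_series("unemployment", sort_by="release_date")
print([row["month"] for row in series])
```

One caveat with this sketch: the prefix match assumes figure names never
contain the "_" separator, otherwise "unemployment_" would also match a figure
called "unemployment_rate".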
