How do you plan to access this file name / file path data? I mean, what are your "Get" patterns?
I would suggest that you search these mailing lists for the several discussions on schema design (there was a very good one in the last month or so on tall tables vs. wide tables and various techniques for optimal reads and writes). http://www.search-hadoop.com is one site that has searchable archives of the HBase mailing lists.

One thing to point out is that if you don't have 1500*2 columns every single day, no space is lost. Only columns that are actually created take space (as this is a column-oriented data store).

Another point is that how you read / access your data is usually the most important consideration (as you are using HBase for low-latency reads of big data). So having more rows and using Scanners may turn out to be faster / easier for you to access this data. Build your schema with this in mind rather than ease of storage.

--Suraj

On Fri, Mar 11, 2011 at 4:10 PM, Rickm <[email protected]> wrote:
> Suraj Varma <svarma.ng@...> writes:
>
>>
>> It is a bit unusual, I think.
>>
>> To begin with, the number of versions is set when you create a
>> ColumnFamily - so, you are signing up for every column in that column
>> family having 1500 versions which you may or may not want.
>>
> Yes, that is correct. In my case it is just one or two columns in the
> column family.
>
>> Secondly, if your goal is to select a specific one of those email
>> addresses, how can you select from these versioned values (e.g. to
>> select the "home" email ... what do you do?)
>
> I'm an HBase newbie and haven't started developing yet; I'm just starting
> the design. But I guess I would have to iterate, searching for the value.
>
> My scenario is this: I will have a row per day and userid. I need to store a
> list of filenames and filepaths (no more than 1500 per day). So instead of
> having 3000 columns I thought of having just 2 columns with 1500 versions.
>
>>
>> A good read on time versioning is:
>> http://outerthought.org/blog/417-ot/version/2 which also points out
>> some gotchas.
> Yeah, I saw that one and it is good, but it doesn't have enough info about
> the way I'm approaching this.
>
>>
>> Finally, I'm always a bit leery (or careful?) towards using features
>> that are not intended to be used in such ways - a lot of things hang
>> off of the hbase cell time versioning (major_compactions, delete
>> markers, replication, etc etc all use the cell's time version to
>> determine state) ... so, using it in unusual ways may bring up some
>> gotchas.
>>
>> It is an interesting question, though - if anyone on the list has
>> tried such things, it would be good to hear about it.
>
> Yeah. If anyone has anything to comment about this approach I would
> appreciate it. It's hard to find HBase docs and I couldn't find any books;
> it's all spread around the internet, and a lot of it is deprecated info too.
>
> Thanks a lot.
>
>
>> --Suraj
>>
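
To make the "more rows plus Scanners" suggestion from the reply a bit more concrete, here is a minimal sketch of a tall layout: one row per file, with the userid and day folded into the row key, so that all files for a given user and day sit next to each other and can be read with a single prefix scan. This is plain Python standing in for a sorted table, not the HBase client API. The key format (userid|yyyymmdd|seq), the "|" separator, the column names, and the names make_row_key, TallTable and scan_prefix are all assumptions made up for illustration, and the sketch assumes user IDs never contain "|".

```python
from bisect import bisect_left


def make_row_key(user_id: str, day: str, seq: int) -> str:
    """Build a hypothetical composite row key: userid|yyyymmdd|seq.

    seq is zero-padded so that lexicographic order matches numeric order.
    """
    return f"{user_id}|{day}|{seq:04d}"


class TallTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, columns):
        # Insert the key at its sorted position the first time it is seen.
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
        self._rows.setdefault(row_key, {}).update(columns)

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then walk forward while keys match.
        idx = bisect_left(self._keys, prefix)
        while idx < len(self._keys) and self._keys[idx].startswith(prefix):
            key = self._keys[idx]
            yield key, self._rows[key]
            idx += 1


# Example: two files for one user on one day, one file on the next day.
table = TallTable()
table.put(make_row_key("user42", "20110311", 0),
          {"filename": "a.txt", "filepath": "/data/a.txt"})
table.put(make_row_key("user42", "20110311", 1),
          {"filename": "b.txt", "filepath": "/data/b.txt"})
table.put(make_row_key("user42", "20110312", 0),
          {"filename": "c.txt", "filepath": "/data/c.txt"})

# Read back only the files for user42 on 2011-03-11.
for key, cols in table.scan_prefix("user42|20110311|"):
    print(key, cols["filename"], cols["filepath"])
```

Including the trailing separator in the scan prefix keeps a scan for one userid from also matching a longer userid that starts with the same characters. Whether this layout beats a wide row or the versioned-cell idea still depends on the read patterns asked about at the top of the reply.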
