How do you plan to access this file name / file path data? In other
words, what are your "Get" patterns?

I would suggest that you search these mailing lists for the several
discussions on schema design (there was a very good one in the last
month or so on tall tables vs. wide tables and various techniques for
optimal reads and writes).
http://www.search-hadoop.com is one site that has searchable archives
of the HBase mailing lists.

One thing to point out is that if you don't have 1500*2 columns every
single day, no space is lost. Only columns that are actually created
take space (as this is a column-oriented data store).
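To illustrate the point about sparse columns, here is a toy sketch in
plain Python. It is not the HBase client API; the dict-of-dicts table,
the row keys, and the column names (name_0001, path_0001, ...) are all
made up for illustration. It only shows that a row "costs" as many
cells as were actually written.

```python
# Toy model of sparse cell storage. This is NOT the HBase API; it is a
# plain dict of dicts used to show that unwritten columns take no space.
table = {}  # row key -> {column qualifier: value}


def put(row, column, value):
    """Store one cell; rows and columns spring into existence on write."""
    table.setdefault(row, {})[column] = value


def stored_cells(row):
    """Number of cells that actually exist for a row."""
    return len(table.get(row, {}))


# Hypothetical row keys (userid + day) and column qualifiers.
put("user42_20110311", "name_0001", "report.pdf")
put("user42_20110311", "path_0001", "/docs/report.pdf")

put("user42_20110312", "name_0001", "a.txt")
put("user42_20110312", "path_0001", "/tmp/a.txt")
put("user42_20110312", "name_0002", "b.txt")
put("user42_20110312", "path_0002", "/tmp/b.txt")

# Each row holds only what was written, not 3000 slots.
print(stored_cells("user42_20110311"))
print(stored_cells("user42_20110312"))
```

A day with one file holds two cells and a day with two files holds
four; nothing is reserved for the other possible columns.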

Another point: how you read / access your data is usually the most
important consideration (as you are using HBase for low-latency reads
of big data). Having more rows and using Scanners may turn out to be
faster / easier for accessing this data. So build your schema with
this in mind rather than ease of storage.
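As a rough sketch of the "more rows plus Scanners" idea, the toy model
below keeps one row per file, with a composite row key of userid, day,
and a sequence number, and reads a whole day with a prefix scan over
the sorted keys. It is plain Python, not the HBase client API; the key
layout, the "|" separator, and the helper names (make_key, put,
scan_prefix) are assumptions made up for illustration.

```python
import bisect

# Sorted list of (row_key, value) pairs. It stands in for the way HBase
# keeps rows ordered by key; this is NOT the HBase API.
rows = []


def make_key(user_id, day, seq):
    """Hypothetical composite row key: userid|yyyymmdd|zero-padded seq."""
    return f"{user_id}|{day}|{seq:04d}"


def put(user_id, day, seq, name, path):
    """Insert one file as its own row, keeping the list sorted by key."""
    bisect.insort(rows, (make_key(user_id, day, seq), (name, path)))


def scan_prefix(prefix):
    """Return all rows whose key starts with prefix, in key order."""
    start = bisect.bisect_left(rows, (prefix,))
    result = []
    for key, value in rows[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        result.append((key, value))
    return result


put("user42", "20110311", 2, "b.txt", "/tmp/b.txt")
put("user42", "20110311", 1, "a.txt", "/tmp/a.txt")
put("user42", "20110312", 1, "c.txt", "/tmp/c.txt")
put("user4", "20110311", 1, "d.txt", "/tmp/d.txt")

# All files for one user on one day, via a single contiguous scan.
day_rows = scan_prefix("user42|20110311|")
for key, (name, path) in day_rows:
    print(key, name, path)
```

Because the keys sort by user and then by day, one user's files for a
day sit next to each other, so reading them is one contiguous scan
rather than a search through versions of a single cell.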

--Suraj

On Fri, Mar 11, 2011 at 4:10 PM, Rickm <[email protected]> wrote:
> Suraj Varma <svarma.ng@...> writes:
>
>>
>> It is a bit unusual, I think.
>>
>> To begin with, the number of versions is set when you create a
>> ColumnFamily - so, you are signing up for every column in that column
>> family having 1500 versions which you may or may not want.
>>
> Yes, that is correct. In my case it is just one or two columns in the
> column family.
>
>> Secondly, if your goal is to select a specific one of those email
>> addresses, how can you select from these versioned values (e.g. to
>> select the "home" email ... what do you do?)
>
> I'm an HBase newbie and haven't started developing yet; I'm just starting
> the design. But I guess I would have to iterate, searching for the value.
>
> My scenario is this: I will have a row per day and userid. I need to store a
> list of filenames and filepaths (no more than 1500 per day). So instead of
> having 3000 columns I thought of having just 2 columns with 1500 versions.
>
>>
>> A good read on time versioning is:
>> http://outerthought.org/blog/417-ot/version/2 which also points out
>> some gotchas.
> Yeah, I saw that one and it is good but not enough info about the way I'm
> approaching this.
>
>>
>> Finally, I'm always a bit leery (or careful?) towards using features
>> that are not intended to be used in such ways - a lot of things hang
>> off of the hbase cell time versioning (major_compactions, delete
>> markers, replication, etc etc all use the cell's time version to
>> determine state) ... so, using it in unusual ways may bring up some
>> gotchas.
>>
>> It is an interesting question, though - if anyone on the list has
>> tried such things, it would be good to hear about it.
>
> Yeah. If anyone has any comments on this approach, I would appreciate them.
> It's hard to find HBase docs and I couldn't find any books; the information
> is spread all over the internet, and a lot of it is out of date too.
>
> Thanks a lot.
>
>
>> --Suraj
>>
>
>
>
>
