Hi Michael,

I'm trying to dive deep into HBase and forget all my RDBMS knowledge,
but sometimes it's difficult not to compare, and I don't yet have the
right way of thinking. The more Amandeep replied yesterday, the
clearer it became, but it seems I still have a LOT to learn.

I will never update a single value in the data I have. I will
update all the columns of a row, or none. When I need to read
them, I usually need to read all of them, or almost all, not just one.
I went with a multiple-column architecture because I built the
application with MySQL first, but the more I read, the more I see that
it's not the right way.

I can have 2 tables.
One keyed by the person ID, with a single CF and a single C, holding
everything in one cell as a JSON record serialized using Avro, as you
are suggesting.
And a second table with row keys like PERSONID_PERSONADDRESS, with a
dummy CF and C just to keep one cell per row.
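
To check my own understanding, here is a minimal Python sketch of that
two-table layout. It does not use any HBase client: SortedTable is a
hypothetical in-memory stand-in that only mimics rows being kept sorted
by key, json stands in for the Avro serialization, and the person IDs
and addresses are made up.

```python
import json
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys (bytes), always sorted
        self._rows = {}   # row key -> single cell value (bytes)

    def put(self, row_key, value=b""):
        # Insert or overwrite the single cell of a row.
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def delete(self, row_key):
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.remove(row_key)

    def prefix_scan(self, prefix):
        # Return every row key starting with `prefix`, in sorted order.
        found = []
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            found.append(self._keys[i])
            i += 1
        return found


def address_key(person_id, address):
    # Composite row key: PERSONID_PERSONADDRESS
    return f"{person_id}_{address}".encode("utf-8")


# Table 1: one row per person, the whole record in a single cell.
persons = SortedTable()
record = {"name": "Jane Doe", "aliases": ["JD"]}  # made-up record
persons.put(b"42", json.dumps(record).encode("utf-8"))

# Table 2: one row per (person, address); the cell value is a dummy.
addresses = SortedTable()
addresses.put(address_key("42", "12 Main St"))
addresses.put(address_key("42", "7 Oak Ave"))
addresses.put(address_key("43", "1 Elm Rd"))

# Reading a person means fetching and decoding one cell.
person_42 = json.loads(persons.get(b"42"))

# Listing a person's addresses is a scan on the "PERSONID_" prefix.
person_42_addresses = [k.decode("utf-8") for k in addresses.prefix_scan(b"42_")]

# Removing one address deletes exactly one row, with no column counter.
addresses.delete(address_key("42", "7 Oak Ave"))
remaining = [k.decode("utf-8") for k in addresses.prefix_scan(b"42_")]
```

With this layout I never have to keep track of how many addresses a
person has. The sketch assumes the person ID itself never contains the
"_" separator.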

In the end, that will meet all my needs, but it will require a bit more
thinking. And it's so far from the initial design! But I think that's
definitely a good solution.

Thanks!

JM

2012/7/3, Michael Segel <[email protected]>:
> Hi,
>
> You're overthinking this.
>
> Take a step back and remember that you can store anything you want as a byte
> stream in a column.
> Literally.
>
> So you have a record that could be a text blob. Store it in one column. Use
> JSON to define its structure and fields.
>
> The only thing that makes it difficult is that you will need to pull out
> everything just to insert or update something.
> So then maybe segment your data into logical blocks. Like a column that
> stores the physical attributes of the person.
> Another column that stores the list of addresses for the person.
> Another column that stores the list of aliases used by the person.
>
> Don't think in relational terms. HBase isn't relational and ER is not the
> best way to model in a NoSQL database.
> Think IMS/COBOL (mainframe) or Dick Pick's Revelation's OS.
>
> The only relationships in HBase are weak relationships between tables.
> Column Families currently have some nasty side effects, so you may want to
> consider how you apply them.
>
> Think in terms of records.
>
> Look at storing data using Avro.
>
> On Jul 2, 2012, at 8:56 PM, Jean-Marc Spaggiari wrote:
>
>> 2012/7/2, Amandeep Khurana <[email protected]>:
>>>> Here are the 2 options now. Both with a new table.
>>>>
>>>> 1) I store the key "personID" and a:a1 to a:an for the addresses.
>>>> 2) I store the key "personID" + "address
>>>>
>>>> In both I will have the same amount of data. In #1 total size will be
>>>> smaller since the key will be stored only once.
>>>>
>>>>
>>>
>>> The size will be the same. The underlying HFile will store 1 row per cell
>>> and the number of cells in both cases is the same.
>>>
>>> However, the first approach with multiple columns for addresses needs you to
>>> keep track of the number and makes updates, deletes, additions complicated
>>> as I highlighted earlier. The second option with putting both things in the
>>> key makes life much easier.
>>>
>>> If the data is primarily being accessed independently, I'd go with option 2.
>>
>> Oh! I see! My misunderstanding comes from my lack of HBase
>> knowledge/reflex. I forgot it was storing the data that way. So I
>> think I will most probably give this 2nd option a try! Thanks for
>> sharing your ideas throughout the day.
>>
>> JM
>>
>
>
