Hi Michael,

I'm trying to dive deeply into HBase and forget all my RDBMS knowledge, but sometimes it's difficult not to compare, and I don't yet have the right way of thinking. The more Amandeep replied yesterday, the clearer it became, but it seems I still have a LOT to learn.
I will never update a single value in the data I have. I will update all the columns of a row, or none of them. When I need to read them, I usually need to read all of them, or almost all, not just one. I moved to a multiple-column architecture because I built the application with MySQL first, but the more I read, the more I see that it's not the right way.

I can have 2 tables. The first has a key made from the person ID, with one single CF and one column holding everything in a single cell, serialized as JSON or with Avro as you are suggesting. The second has rows like PERSONID_PERSONADDRESS, with a dummy CF and column just to hold one cell.

In the end, that will meet all my needs, but it will require a bit more thinking. And it's so far from the initial design! But I think that's definitely a good solution.

Thanks!

JM

2012/7/3, Michael Segel <[email protected]>:
> Hi,
>
> You're over thinking this.
>
> Take a step back and remember that you can store anything you want as a byte
> stream in a column.
> Literally.
>
> So you have a record that could be a text blob. Store it in one column. Use
> JSON to define its structure and fields.
>
> The only thing that makes it difficult is that you will need to pull out
> everything just to insert or update something.
> So then maybe segment your data in to logical blocks. Like a column that
> stores the physical attributes of the person.
> Another column that stores the list of addresses for the person.
> Another column that stores the list of aliases used by the person.
>
> Don't think in relational terms. HBase isn't relational and ER is not the
> best way to model in a NoSQL database.
> Think IMS/COBOL (mainframe) or Dick Pick's Revelation's OS.
>
> The only relationships in HBase are weak relationships between tables.
> Column Families currently have some nasty side effects that you may want to
> consider how you apply them.
>
> Think in terms of records.
>
> Look at storing data using Avro.
>
> On Jul 2, 2012, at 8:56 PM, Jean-Marc Spaggiari wrote:
>
>> 2012/7/2, Amandeep Khurana <[email protected]>:
>>>> Here are the 2 options now. Both with a new table.
>>>>
>>>> 1) I store the key "personID" and a:a1 to a:an for the addresses.
>>>> 2) I store the key "personID" + "address
>>>>
>>>> In both I will have the same amount of data. In #1 total size will be
>>>> smaller since the key will be stored only once.
>>>
>>> The size will be the same. The underlying HFile will store 1 row per cell
>>> and the number of cells in both cases is the same.
>>>
>>> However, the first approach with multiple columns for addresses needs you
>>> to keep track of the number and makes updates, deletes, additions
>>> complicated as I highlighted earlier. The second option with putting both
>>> things in the key makes life much easier.
>>>
>>> If the data is primarily being accessed independently, I'd go with option
>>> 2.
>>
>> Oh! I see! My misunderstanding comes from my lack of HBase
>> knowledge/reflex. I forgot it was storing the data that way. So I
>> think I will most probably give a try to this 2nd option! Thanks for
>> sharing your ideas all over the day.
>>
>> JM
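
For reference, here is a minimal sketch of the two-table layout discussed in this thread. It is only an illustration under stated assumptions: plain Python dicts stand in for the HBase tables (no HBase client is used), and the column names "d:j" and "d:x", the "_" separator, and the sample data are all made up. It shows the person record stored as one JSON cell, one row per PERSONID_ADDRESS in the second table, and how the sorted row keys let a prefix lookup return all addresses of one person.

```python
import json

SEP = "_"  # hypothetical separator between person ID and address in the row key


def person_row(person_id, record):
    """Build the row for the person table: one CF, one column, one JSON cell."""
    value = json.dumps(record, sort_keys=True).encode()
    return person_id.encode(), {b"d:j": value}


def address_row(person_id, address):
    """Build the row for the address table: the key carries the data,
    and a dummy column with an empty value keeps one cell per row."""
    return f"{person_id}{SEP}{address}".encode(), {b"d:x": b""}


def addresses_for(table, person_id):
    """Emulate a prefix scan: HBase keeps rows sorted by key, so all
    addresses of one person are adjacent and share the same key prefix."""
    prefix = f"{person_id}{SEP}".encode()
    return [key[len(prefix):].decode()
            for key in sorted(table) if key.startswith(prefix)]


# Dicts standing in for the two tables (assumption: not a real HBase API).
person_table = {}
address_table = {}

key, cells = person_row("p42", {"name": "Ann", "height_cm": 170})
person_table[key] = cells

for pid, addr in [("p42", "12 Main St"), ("p42", "3 Oak Ave"), ("p421", "9 Elm Rd")]:
    key, cells = address_row(pid, addr)
    address_table[key] = cells

print(addresses_for(address_table, "p42"))
```

Adding or removing an address is then a single put or delete on one row key, with no counter of a:a1 to a:an columns to maintain. The separator matters: without it, the prefix "p42" would also match the rows of "p421".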
