Comparisons are fine. Try not to think of this in terms of rows and columns, but in terms of records. Think of each record as being atomic. Make a list of all the components that make up that record, then combine like components into structures.
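Here is a minimal sketch of this record-first approach in Python. It assumes JSON as the serialization format; the thread also suggests Avro, which is not shown here. The record is broken into structures: physical attributes, an ordered list of addresses with an optional end date, and aliases. Each structure is serialized into the value of one column. The `d:` family, the qualifier names, and the `Person`/`Address` types are hypothetical. The dictionary returned by `to_cells` only stands in for the cells of one HBase row and is not the HBase client API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class Address:
    street: str
    city: str
    lived_from: str                 # ISO date the person moved in
    lived_to: Optional[str] = None  # None means no end date: a current address


@dataclass
class Person:
    person_id: str
    physical: Dict[str, str] = field(default_factory=dict)
    addresses: List[Address] = field(default_factory=list)  # ordered list, no fixed size
    aliases: List[str] = field(default_factory=list)

    def current_addresses(self) -> List[Address]:
        # An address without an end date is treated as current.
        return [a for a in self.addresses if a.lived_to is None]


def to_cells(person: Person) -> Dict[bytes, bytes]:
    """Serialize each structure into its own column value (hypothetical names)."""
    return {
        b"d:physical": json.dumps(person.physical).encode("utf-8"),
        b"d:addresses": json.dumps([asdict(a) for a in person.addresses]).encode("utf-8"),
        b"d:aliases": json.dumps(person.aliases).encode("utf-8"),
    }


def addresses_from_cell(value: bytes) -> List[Address]:
    # Rebuild the ordered address list from the serialized column value.
    return [Address(**d) for d in json.loads(value.decode("utf-8"))]


person = Person(
    person_id="p1",
    physical={"eye_color": "brown"},
    addresses=[
        Address("1 Old Rd", "Springfield", "2001-05-01", "2010-08-31"),
        Address("2 New St", "Springfield", "2010-09-01"),  # no end date: current
    ],
    aliases=["Jay"],
)
cells = to_cells(person)
```

Each structure lives in its own column, so adding an address means reading and rewriting only the addresses cell. The physical attributes and aliases are untouched.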
Take the street address, for example. Add a couple of fields that record when the person lived there; if there is no end date, it must be a current address. You could put the addresses in an array, but arrays imply a fixed size, so an ordered set or list would be more appropriate. Each of these structures then becomes a column.

On Jul 3, 2012, at 7:31 AM, Jean-Marc Spaggiari wrote:

> Hi Michael,
>
> I'm trying to dive deep into HBase and forget all my RDBMS knowledge,
> but sometimes it's difficult not to compare, and I don't yet have
> all the right thinking mechanisms. The more Amandeep replied
> yesterday, the clearer it became, but it seems I still have a LOT to
> learn.
>
> I will never update one single value from the data I have. I will
> update all the columns for one row, or none of them. When I need to read
> them, I usually need to read all of them, or almost all. Not just one.
> I moved to a multiple-column architecture because I built the
> application with MySQL first, but the more I read, the more I see that
> it's not the right way.
>
> I can have 2 tables:
> one with a key made from the person ID, a single CF (column family)
> and a single C (column), with everything in one cell stored as JSON
> output serialized using Avro, like you are suggesting;
> and a second table with rows like PERSONID_PERSONADDRESS, with a dummy
> CF and C just to keep one cell.
>
> In the end, that will meet all my needs, but it will require a bit more
> thinking. And it's so far from the initial design! But I think that's
> definitely a good solution.
>
> Thanks!
>
> JM
>
> 2012/7/3, Michael Segel <[email protected]>:
>> Hi,
>>
>> You're overthinking this.
>>
>> Take a step back and remember that you can store anything you want as a byte
>> stream in a column.
>> Literally.
>>
>> So you have a record that could be a text blob. Store it in one column. Use
>> JSON to define its structure and fields.
>>
>> The only thing that makes it difficult is that you will need to pull out
>> everything just to insert or update something.
>> So then maybe segment your data into logical blocks. Like a column that
>> stores the physical attributes of the person.
>> Another column that stores the list of addresses for the person.
>> Another column that stores the list of aliases used by the person.
>>
>> Don't think in relational terms. HBase isn't relational and ER is not the
>> best way to model in a NoSQL database.
>> Think IMS/COBOL (mainframe) or Dick Pick's Revelation's OS.
>>
>> The only relationships in HBase are weak relationships between tables.
>> Column families currently have some nasty side effects, so you may want to
>> consider carefully how you apply them.
>>
>> Think in terms of records.
>>
>> Look at storing data using Avro.
>>
>> On Jul 2, 2012, at 8:56 PM, Jean-Marc Spaggiari wrote:
>>
>>> 2012/7/2, Amandeep Khurana <[email protected]>:
>>>>> Here are the 2 options now. Both with a new table.
>>>>>
>>>>> 1) I store the key "personID" and a:a1 to a:an for the addresses.
>>>>> 2) I store the key "personID" + "address".
>>>>>
>>>>> In both I will have the same amount of data. In #1 total size will be
>>>>> smaller since the key will be stored only once.
>>>>
>>>> The size will be the same. The underlying HFile will store 1 row per
>>>> cell, and the number of cells in both cases is the same.
>>>>
>>>> However, the first approach with multiple columns for addresses needs you
>>>> to keep track of the number and makes updates, deletes, and additions
>>>> complicated, as I highlighted earlier. The second option, with both things
>>>> in the key, makes life much easier.
>>>>
>>>> If the data is primarily being accessed independently, I'd go with option 2.
>>>
>>> Oh! I see! My misunderstanding comes from my lack of HBase
>>> knowledge/reflexes. I forgot it was storing the data that way. So I
>>> think I will most probably give this 2nd option a try! Thanks for
>>> sharing your ideas throughout the day.
>>>
>>> JM
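A small in-memory model can illustrate Amandeep's point about size and the advantage of option 2. HBase's KeyValue format stores each cell with its full key (row, family, qualifier, timestamp). Putting the address in the row key therefore does not multiply the data: both layouts below hold three cells. `SortedTable` is a hypothetical toy that only mimics HBase's lexicographic row ordering; it is not the HBase client API. The `_` separator (borrowed from the PERSONID_PERSONADDRESS idea) and the `a:a1`/`a:v` column names are likewise assumptions.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Tuple


class SortedTable:
    """Toy in-memory stand-in for an HBase table (NOT the HBase client API).

    Like HBase, it keeps row keys in lexicographic byte order, which is what
    makes prefix scans over composite row keys cheap.
    """

    def __init__(self) -> None:
        self._keys: List[bytes] = []                      # sorted row keys
        self._rows: Dict[bytes, Dict[bytes, bytes]] = {}  # row -> {column: value}

    def put(self, row: bytes, column: bytes, value: bytes) -> None:
        if row not in self._rows:
            insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def delete_row(self, row: bytes) -> None:
        if self._rows.pop(row, None) is not None:
            self._keys.remove(row)

    def prefix_scan(self, prefix: bytes) -> List[Tuple[bytes, Dict[bytes, bytes]]]:
        # Jump to the first key >= prefix, then walk forward while keys share it.
        i = bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out

    def cell_count(self) -> int:
        return sum(len(cols) for cols in self._rows.values())


SEP = b"_"  # hypothetical separator; assumes person IDs never contain it


def person_prefix(person_id: str) -> bytes:
    return person_id.encode("utf-8") + SEP


def address_key(person_id: str, address_id: str) -> bytes:
    return person_prefix(person_id) + address_id.encode("utf-8")


# Option 1: one row per person, one column per address (a:a1 ... a:an).
opt1 = SortedTable()
opt1.put(b"p1", b"a:a1", b"1 Old Rd")
opt1.put(b"p1", b"a:a2", b"2 New St")
opt1.put(b"p10", b"a:a1", b"3 Other Ave")

# Option 2: one row per (person, address), a single cell per row.
opt2 = SortedTable()
opt2.put(address_key("p1", "a1"), b"a:v", b"1 Old Rd")
opt2.put(address_key("p1", "a2"), b"a:v", b"2 New St")
opt2.put(address_key("p10", "a1"), b"a:v", b"3 Other Ave")

cells_before = (opt1.cell_count(), opt2.cell_count())  # same number of cells
p1_before = [k for k, _ in opt2.prefix_scan(person_prefix("p1"))]
opt2.delete_row(address_key("p1", "a2"))  # removes one address, nothing else
p1_after = [k for k, _ in opt2.prefix_scan(person_prefix("p1"))]
```

The separator matters. A scan on the bare prefix `p1` would also return the rows for `p10`, while `p1_` returns only the addresses of person `p1`. In option 2, removing an address deletes exactly one row. Option 1 would require tracking which `a:aN` qualifier holds that address.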
