I am also a beginner, so I would like to ask you something about the method you proposed. HBase is column-oriented. This means (as far as I know from databases) that it stores its data column by column and not row by row. If we use the schema you suggested, then whenever we want some of the documents for a single word we will have to access many columns, and I think this will cost us a lot. I think the locality of the data is lost with this schema.
I repeat that I am a beginner so please correct me if I am wrong.

Regards,
Panagiotis

> Date: Sat, 23 Apr 2011 11:25:47 +0200
> Subject: Re: HBase - Column family
> From: [email protected]
> To: [email protected]
>
> That's how I would do it:
> What's nice in HBase is that you can store all the data for one of
> your keywords in a single row.
> Create a column family "doc_id".
> Now, for each word, you create one row.
> In this row, for each matching document you create one column (that's
> the gotcha compared to a RDB design).
> The name of the column is the doc id. The column's cell content is the weight.
>
> So, following your example you'd get:
>
> row id | column-family:column....
> HELLO  | doc_id:2 | doc_id:3 | doc_id:4
>
> and column values:
> doc_id:2 | doc_id:3 | doc_id:4
> 12       | 45       | 36
>
> HTH,
>
> Bernd
>
>
> On Sat, Apr 23, 2011 at 09:56, JohnJohnGa <[email protected]> wrote:
> > Hi, I'm a beginner in HBase. I need to design my table. I want to play
> > with the following information:
> >
> > At the date XX-XX-XXXX, the word 'HELLO' is in document 2,3,4 and the
> > weight of each doc is 12,45,36 - My raw data: doc:D title:'i like
> > potatoes',weight:W,date:D
> >
> > I created a table with, row: word, column:date, value:doc But I can't
> > store multiple row with the same date, for the same word.
> >
> > Can we create multiple column families for a table? What can be the best
> > way to design the schema?
> >
> > Thanks a lot
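To check that I read the proposed schema correctly, here is a small sketch of it in Python. It is only a toy in-memory model, not the HBase client API: the class `ToyTable` and its methods `put`, `get_row` and `get_columns` are names I made up for illustration. The only things taken from the quoted message are the column family name "doc_id", the row key 'HELLO', and the doc ids and weights.

```python
from collections import defaultdict

FAMILY = "doc_id"  # column family name taken from the quoted message


class ToyTable:
    """In-memory stand-in for a single table (hypothetical, NOT the HBase API)."""

    def __init__(self):
        # row key -> {"family:qualifier" -> cell value}
        self._rows = defaultdict(dict)

    def put(self, row, qualifier, value):
        # One column per matching document; the cell holds the weight.
        self._rows[row][f"{FAMILY}:{qualifier}"] = value

    def get_row(self, row):
        # All documents (columns) stored for one word (row).
        return dict(self._rows.get(row, {}))

    def get_columns(self, row, qualifiers):
        # Only some of the documents for one word; missing ones are skipped.
        cells = self._rows.get(row, {})
        wanted = (f"{FAMILY}:{q}" for q in qualifiers)
        return {name: cells[name] for name in wanted if name in cells}


def build_example():
    # The example from the thread: 'HELLO' is in documents 2, 3, 4
    # with weights 12, 45, 36.
    table = ToyTable()
    for doc_id, weight in [(2, 12), (3, 45), (4, 36)]:
        table.put("HELLO", doc_id, weight)
    return table
```

With this model, `build_example().get_row("HELLO")` gives every document for the word, and `get_columns("HELLO", [3])` gives just one of them. It says nothing about how HBase lays the cells out on disk, which is exactly the part I am unsure about.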
