I think two separate tables can work, because users usually fetch the file
info, while the blob of a specific file is fetched rarely.
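
A rough sketch of the access pattern I have in mind, with plain Python dicts standing in for the two HBase tables. The names files_info, files_data, put_file, scan_info and get_blob are made up for illustration and are not HBase API; a real client would issue puts, scans and gets against two tables that share the same row key.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# Hypothetical in-memory stand-ins for two tables sharing one row key.
files_info: Dict[str, Dict[str, str]] = {}  # small metadata, read often
files_data: Dict[str, bytes] = {}           # large PDF blobs, read rarely


def put_file(row_key: str, info: Dict[str, str], blob: bytes) -> None:
    """Write metadata and blob under the same row key, one write per table."""
    files_info[row_key] = dict(info)
    files_data[row_key] = blob


def scan_info(
    predicate: Callable[[Dict[str, str]], bool],
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Scan only the info table, in row-key order, without touching blobs."""
    for row_key in sorted(files_info):
        info = files_info[row_key]
        if predicate(info):
            yield row_key, info


def get_blob(row_key: str) -> Optional[bytes]:
    """Second lookup: fetch the blob only for rows the scan selected."""
    return files_data.get(row_key)
```

The cost is the second lookup per file that is actually opened; in exchange, scans over the info never read the blobs.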

On Tue, 20 Mar 2012 04:32:48 -0500
Michael Segel <[email protected]> wrote:

> Yes, 
> 
> Currently, if one of the column families causes a split, then all of the column 
> families get split. So if you are dealing with a large blob, you're going to 
> shoot yourself in the foot. 
> 
> Are you filtering on any of the values in the 'info' family? 
> If not, you could try creating a serialized record (Avro is one example) for 
> the info data, 
> and then store the data in a single column family where one column contains 
> the info record and the other column contains the blob. 
> 
> Or you could use two tables with the same row key. But that would mean two 
> get()s... Having said that, if you were doing a table scan, you'd want to scan 
> the info column and, based on the results, fetch back the blob.
> 
> HTH
> 
> -Mike
> 
> On Mar 20, 2012, at 3:56 AM, Laxman wrote:
> 
> > Do we see any problem with the below schema?
> > 
> >      family "info":
> >          "info:pg" - keeps page number
> >          "info:id" - sender ID
> >          "info:nm" - pdf name
> >          "info:prop_name" - column to hold property name
> >          "info:prop_value" - column to hold property value
> >      family "data":
> >          "data:blob" - blob of pdf file
> > 
> > --
> > Regards,
> > Laxman
> >> -----Original Message-----
> >> From: Konrad Tendera [mailto:[email protected]]
> >> Sent: Monday, March 19, 2012 8:22 PM
> >> To: [email protected]
> >> Subject: Rows vs. Columns
> >> 
> >> Hello,
> >> 
> >> I'm designing a schema for my use case and I'm considering which will
> >> be better: rows or columns. Here's what I need - my schema currently
> >> looks like this (it will be used for storing small PDF files or
> >> single pages of a larger document)
> >> table files:
> >>     family "info":
> >>         "info:pg" - keeps page number
> >>         "info:id" - sender ID
> >>         "info:nm" - pdf name
> >>         ***
> >>     family "data":
> >>         "data:blob" - blob of pdf file
> >> 
> >> Now let's get back to ***: each user can add multiple additional
> >> properties ("name" - "value"), but let's assume that every user will be
> >> so creative that no two names will be the same. I don't know how to solve
> >> this problem: should each "name" be a new column ("info:name"), or should I
> >> try to do it as described here:
> >> http://hbase.apache.org/book.html#schema.smackdown.rowscols and make
> >> a new row for each property?
> >> 
> >> K.
> > 
> > 
> 


-- 
Konrad Tendera
