I think two separate tables can work here, because users usually fetch only the file info; the blob of a specific file is fetched rarely.
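To make the access pattern concrete, here is a rough sketch in Python. It uses plain dicts as stand-ins for the two tables, so it is not the HBase client API, and the names files_info, files_data, put_file, find_rows and get_blob are made up for illustration. It also assumes that each user property becomes its own column next to pg, id and nm.

```python
# In-memory stand-ins for two HBase tables that share one row key.
# These dicts only model the access pattern; they are NOT the HBase client API.
files_info = {}   # row key -> small metadata columns (pg, id, nm, user properties)
files_data = {}   # row key -> pdf blob


def put_file(row_key, page, sender_id, name, blob, props=None):
    """Write metadata and blob under the same row key, one write per table."""
    info = {"pg": page, "id": sender_id, "nm": name}
    info.update(props or {})          # assumption: one column per user property
    files_info[row_key] = info
    files_data[row_key] = blob


def find_rows(predicate):
    """Scan only the small info table; blobs are never touched here."""
    return [key for key, info in sorted(files_info.items()) if predicate(info)]


def get_blob(row_key):
    """Second lookup, issued only when a specific file is really needed."""
    return files_data[row_key]


# Hypothetical sample rows.
put_file("doc1-p1", 1, "alice", "report.pdf", b"%PDF-1.4 page one", {"color": "red"})
put_file("doc1-p2", 2, "alice", "report.pdf", b"%PDF-1.4 page two")
put_file("doc2-p1", 1, "bob", "invoice.pdf", b"%PDF-1.4 invoice")

alice_rows = find_rows(lambda info: info["id"] == "alice")   # info-only scan
first_blob = get_blob(alice_rows[0])                         # blob fetched on demand
```

The cost is the second lookup Mike mentioned, but it is paid only for the rare requests that actually need the pdf, while the frequent info reads stay on the small table.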

On Tue, 20 Mar 2012 04:32:48 -0500 Michael Segel <[email protected]> wrote:

> Yes,
>
> Currently if one of the column family causes a split, then all of the column
> families get split. So if you are dealing with a large blob, you're going to
> shoot yourself in the foot.
>
> Are you filtering on any of the values in the 'info' family?
> If not, you could try creating a serialized record. (AVRO is an example) for
> the info data,
> and then store the data in a single column family where one column contains
> the info rec and the other column contains the blob.
>
> Or you could use two tables with the same row key. But that would mean two
> get()s... having said that if you were doing a table scan, you'd want to scan
> the info column and based on the results, you would fetch back the blob.
>
> HTH
>
> -Mike
>
> On Mar 20, 2012, at 3:56 AM, Laxman wrote:
>
> > Do we see any problem with the below schema?
> >
> > family "info":
> > "info:pg" - keeps page number
> > "info:id" - sender ID
> > "info:nm" - pdf name
> > "info:prop_name" - column to hold property name
> > "info:prop_value" - column to hold property value
> > family "data":
> > "data:blob" - blob of pdf file
> >
> > --
> > Regards,
> > Laxman
> >> -----Original Message-----
> >> From: Konrad Tendera [mailto:[email protected]]
> >> Sent: Monday, March 19, 2012 8:22 PM
> >> To: [email protected]
> >> Subject: Rows vs. Columns
> >>
> >> Hello,
> >>
> >> I'm designing some schema for my use case and I'm considering what will
> >> be better: rows or columns. Here's what I need - my schema actually
> >> looks like this (it will be used for keeping not large pdf files or
> >> single pages of larger document)
> >> table files:
> >> family "info":
> >> "info:pg" - keeps page number
> >> "info:id" - sender ID
> >> "info:nm" - pdf name
> >> ***
> >> family "data":
> >> "data:blob" - blob of pdf file
> >>
> >> Now let's get back to ***: each user can add multiple of additional
> >> properties ("name" - "value"), but let's assume that every user will be
> >> so creative that there won't be two same names. I don't know how solve
> >> this problem: each "name" will be new column ("info:name") or I should
> >> try to do this like it is said here:
> >> http://hbase.apache.org/book.html#schema.smackdown.rowscols and make
> >> new
> >> row for earch property?
> >>
> >> K.
> >
>
>

--
Konrad Tendera
