Thanks St.Ack

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Monday, September 12, 2011 11:02 PM
To: [email protected]
Subject: Re: Using multiple column families

It depends on how you access the table. Three to four column families may be
an appropriate schema if you mostly access individual cfs. It's when you do
cross-cf accesses that things can slow down (if most of your accesses get all
of the data, then just have one cf). Multiple cfs that are all active at the
same time can also make the server's internal accounting a little messy. We've
not spent much time studying and optimizing for this case, e.g. multi-cf
flushing, compacting, and querying. Because of this, query times can be slower.

St.Ack

On Mon, Sep 12, 2011 at 12:05 AM, Stuti Awasthi <[email protected]> wrote:
> Hi,
>
> I am also looking for an answer to a similar question. In my scenario we will
> have petabytes of data to handle. Currently I am working with a schema that
> has 3-4 column families. What are the major issues we could face if we have
> multiple column families?
>
> I have read that each column family is stored in separate HFiles in the
> regionserver, and that searching by row id and column family is therefore
> useful, as the client will only go to the HFiles for that specific column
> family.
> If we have a flat table structure, then we will end up having more tables,
> with data duplicated across them because of the dependencies between them.
>
> Please suggest.
>
>
> -----Original Message-----
> From: Imran M Yousuf [mailto:[email protected]]
> Sent: Saturday, September 10, 2011 6:55 AM
> To: [email protected]
> Subject: Re: Using multiple column families
>
> Hi J-D,
>
> Thanks for your feedback.
>
> (replies inline)
> On Sat, Sep 10, 2011 at 5:39 AM, Jean-Daniel Cryans <[email protected]>
> wrote:
>> 20k rows? If this is your only use case, you don't need HBase :)
>>
>
> It's one of several others.
>
>> If it's 20k rows times a gazillion columns per row, then I would
>> recommend flattening out the rows instead.
>>
>
> Well, our guess at the moment is that there would not be more than 500 cells
> per family to start with.
>
>> If it's just one small table among others, then you probably won't be
>> bothered by the multiple families.
>>
>
> We actually have many other tables which are flattened out to a single column
> family, and this is one table for which we are using more than one column
> family.
>
> Thanks once again.
>
> Imran
>
>> J-D
>>
>> On Thu, Sep 8, 2011 at 10:07 PM, Imran M Yousuf <[email protected]> wrote:
>>> Hi,
>>>
>>> Firstly, I have read on the mailing list before that having more than
>>> one column family is not recommended. I am more interested in knowing
>>> whether it is a problem in my use case as well.
>>>
>>> I have a strong entity, and it has 6 weak entities, all with a
>>> one-to-many relationship to the strong entity. Furthermore, they are
>>> all loaded in a mutually exclusive manner, i.e. if A is the strong
>>> entity and its weak entities are P, Q, R, S, T, U, then no 2 weak
>>> entities are accessed at once. Moreover, their lifecycles are
>>> independent of each other. In my current implementation I have one
>>> column family for the strong entity and one for each weak entity.
>>> So for a given row I only load one column family at a time. The
>>> obvious advantages are that
>>> - deleting the strong entity automatically deletes the weak entities,
>>>   as they are all in a single row, and
>>> - deleting all weak entities of a specific kind is as simple as
>>>   deleting all cells in that column family for the row.
>>> Our assumption (already well above what we actually expect) is that we
>>> will not have more than 20k rows in that table. Under these
>>> circumstances, how bad is it to have 7 column families?
>>>
>>> We would be glad if you would kindly share your thoughts and feedback on
>>> this issue.
>>>
>>> Thank you,
>>>
>>> --
>>> Imran M Yousuf
>>> Entrepreneur & CEO
>>> Smart IT Engineering Ltd.
>>> Dhaka, Bangladesh
>>> Twitter: @imyousuf - http://twitter.com/imyousuf
>>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>>> Mobile: +880-1711402557
