It depends on how you access the table. Three to four column families may be an appropriate schema if you are mostly accessing individual cfs. It's when you do cross-cf accesses that things can slow down (if most of your accesses fetch all of the data, then just have one cf). Multiple cfs, if all are active at the same time, can also make the server's internal accounting a little messy. We've not spent much time studying and optimizing for this case, e.g. multi-cf flushing, compacting, and querying. Because of this, query times can be slower.
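To make the access-pattern point concrete, here is a rough toy model in Python. It is not HBase code: the `ToyTable` class, its methods, and the `stores_touched` counter are all made up for illustration. It only assumes what is said in this thread, that each column family is kept in its own store, so a read limited to one cf consults one store while a read of the whole row consults every store.

```python
from collections import defaultdict


class ToyTable:
    """Toy model (hypothetical, not the HBase API): one store per column family."""

    def __init__(self, families):
        # Each column family gets its own independent store: row -> {qualifier: value}.
        self.stores = {cf: defaultdict(dict) for cf in families}
        # Counts how many per-family stores reads have had to consult so far.
        self.stores_touched = 0

    def put(self, row, cf, qualifier, value):
        self.stores[cf][row][qualifier] = value

    def get(self, row, families=None):
        # With no families given, the read has to visit every store.
        wanted = list(self.stores) if families is None else list(families)
        result = {}
        for cf in wanted:
            self.stores_touched += 1
            cells = self.stores[cf].get(row)
            if cells:
                result[cf] = dict(cells)
        return result


table = ToyTable(["a", "p", "q"])
table.put("row1", "a", "name", "strong entity")
table.put("row1", "p", "item1", "weak entity")

single = table.get("row1", families=["p"])  # consults only the "p" store
whole = table.get("row1")                   # consults all three stores
```

In this sketch the single-cf read touches one store and the whole-row read touches three. If nearly every read looks like the second one, the split into families buys nothing, which is the case for using a single cf.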
St.Ack

On Mon, Sep 12, 2011 at 12:05 AM, Stuti Awasthi <[email protected]> wrote:
> Hi,
>
> I am also looking answer for similar question. In my scenario we will be
> having petabytes of data to handle. Currently I am working with schema which
> has 3-4 column family with them. What the major issues we can face if we have
> multiple column family.
>
> I have read that each column family will be stored as separate Hfile in
> regionserver and if we search by row-id and column family that will be useful
> as client will go to Hfile for specific column family.
> If we have flat table structure then we will land up either having more
> tables with data replication because of the data dependencies on each other.
>
> Please suggest
>
>
> -----Original Message-----
> From: Imran M Yousuf [mailto:[email protected]]
> Sent: Saturday, September 10, 2011 6:55 AM
> To: [email protected]
> Subject: Re: Using multiple column families
>
> Hi J-D,
>
> Thanks for your feedback.
>
> (replies inline)
> On Sat, Sep 10, 2011 at 5:39 AM, Jean-Daniel Cryans <[email protected]>
> wrote:
>> 20k rows? If this is your only use case, you don't need HBase :)
>>
>
> Its one of several others
>
>> If it's 20k rows times a gazillion columns per row, then I would
>> recommend flattening out the rows instead.
>>
>
> Well, our guess is at the moment their would not be more than 500 cells per
> family to start with.
>
>> If it's just one small table among others, then you probably won't be
>> bothered by the multiple families.
>>
>
> We actually have many other tables which are flattened out to a single column
> family and this is one table for which we are using more than 1 column family.
>
> Thanks once again.
>
> Imran
>
>> J-D
>>
>> On Thu, Sep 8, 2011 at 10:07 PM, Imran M Yousuf <[email protected]> wrote:
>>> Hi,
>>>
>>> Firstly, I have read in the mailing list before that having more than
>>> 1 column family is not recommended.
>>> I am more interested to know
>>> whether it is a problem in my use case as well or not.
>>>
>>> I have a strong entitly and it has 6 weak entities all with 1-to-many
>>> cardinal relationship to the strong entity. Furthermore, they are all
>>> loaded in mutually exclusive manner, i.e. if A is strong entity and
>>> its weak entities are P, Q, R, S, T, U in that case no 2 weak
>>> entities are accessed at once. Moreover their lifecycles are
>>> independent of each other. My current implementation is I have one
>>> column family for the strong entity and one for each weak entities.
>>> So for a given row I only load one column family at a time. The
>>> obvious advantages are that
>>> - deleting strong entity automatically deletes the weak entities as
>>> they are a single row, delete all of a kind weak entity for a
>>> specific weak entity is as simple as deleting all cells in a column
>>> family for a row. Our assumption (pretty high than what we expect) is
>>> that we will not have more than 20k rows in that table. Under these
>>> circumstance how bad is it to have 7 column families?
>>>
>>> We would be glad if you would kindly share thoughts and feedback on this
>>> issue.
>>>
>>> Thank you,
>>>
>>> --
>>> Imran M Yousuf
>>> Entrepreneur & CEO
>>> Smart IT Engineering Ltd.
>>> Dhaka, Bangladesh
>>> Twitter: @imyousuf - http://twitter.com/imyousuf
>>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>>> Mobile: +880-1711402557
>>>
>>
