"So things are fine as long as all CFs have roughly the same size. But if you have one that gets a lot of data and many others that are smaller, we'd end up with a lot of unnecessary and small store files from the smaller CFs."
This is true. I am not very sure of other reasons. We anyway ensure cross-CF atomicity within a single row.

Regards
Ram

On Mon, Apr 8, 2013 at 10:09 AM, lars hofhansl <[email protected]> wrote:

> I think the main problem is that all CFs have to be flushed if one gets
> large enough to require a flush.
> (Does anyone remember why exactly that is? And do we still need that now
> that the memstoreTS is stored in the HFiles?)
>
> So things are fine as long as all CFs have roughly the same size. But if
> you have one that gets a lot of data and many others that are smaller, we'd
> end up with a lot of unnecessary and small store files from the smaller CFs.
>
> Anything else known that is bad about many column families?
>
> -- Lars
>
> ________________________________
> From: Andrew Purtell <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, April 7, 2013 3:52 PM
> Subject: Re: schema design: rows vs wide columns
>
> Is there a pointer to evidence/experiment backed analysis of this question?
> I'm sure there is some basis for this text in the book but I recommend we
> strike it. We could replace it with YCSB or LoadTestTool driven latency
> graphs for different workloads maybe. Although that would also be a big
> simplification of 'schema design' considerations, it would not be so
> starkly lacking background.
>
> On Sunday, April 7, 2013, Ted Yu wrote:
>
> > From http://hbase.apache.org/book.html#number.of.cfs :
> >
> > HBase currently does not do well with anything above two or three column
> > families so keep the number of column families in your schema low.
> >
> > Cheers
> >
> > On Sun, Apr 7, 2013 at 3:04 PM, Stack <[email protected]> wrote:
> >
> > > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[email protected]> wrote:
> > >
> > > > With regard to number of column families, 3 is the recommended
> > > > maximum.
> > >
> > > How did you come up w/ the number '3'? Is it a 'hard' 3? Or does it
> > > depend? If the latter, on what does it depend?
> > > Thanks,
> > > St.Ack
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
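
To make the flush argument in this thread concrete, here is a toy model of the behaviour Lars describes: when one CF's memstore grows large enough to need a flush, every CF in the region is flushed with it. This is only a sketch, not HBase code. The function name, the CF names, the byte counts and the threshold are all made up for illustration, and real HBase flush triggering is more involved.

```python
def simulate_flushes(write_bytes_per_round, flush_threshold, rounds):
    """Toy model of region-wide flushing (hypothetical, not the HBase API).

    write_bytes_per_round maps a CF name to the bytes it receives each round.
    When ANY CF's memstore reaches flush_threshold, ALL CFs are flushed,
    and each non-empty memstore becomes one store file of that size.
    """
    memstore = {cf: 0 for cf in write_bytes_per_round}
    store_files = {cf: [] for cf in write_bytes_per_round}

    for _ in range(rounds):
        # Apply one round of writes to every CF's memstore.
        for cf, nbytes in write_bytes_per_round.items():
            memstore[cf] += nbytes

        # One large memstore forces a flush of every CF in the region.
        if any(size >= flush_threshold for size in memstore.values()):
            for cf in memstore:
                if memstore[cf] > 0:
                    store_files[cf].append(memstore[cf])
                memstore[cf] = 0

    return store_files


# Assumed workload: one CF takes 16x the write volume of the other two.
skewed = simulate_flushes(
    {"big": 16, "small_a": 1, "small_b": 1},
    flush_threshold=128,
    rounds=64,
)

for cf, files in skewed.items():
    print(cf, "files:", len(files), "sizes:", files)
```

Under these assumed numbers, each CF ends up with the same number of store files, but the files of the two small CFs are a small fraction of the size of the big CF's files. That is the "lot of unnecessary and small store files" concern raised above.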
