bq. Maybe we can explain why there is some impacts, or what to consider?

The above would be covered in the JIRA.
Thanks

On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> Can we add more details than just changing the maximum CF number? Maybe we
> can explain why there is some impacts, or what to consider?
>
> JM
>
> 2013/4/16 Ted Yu <[email protected]>
>
> > If there is no objection, I will create a JIRA to increase the maximum
> > number of column families described here:
> >
> > http://hbase.apache.org/book.html#number.of.cfs
> >
> > Cheers
> >
> > On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <[email protected]> wrote:
> >
> > > For the record, the refGuide mentions potential issues of CF lumpiness
> > > that you mentioned:
> > >
> > > http://hbase.apache.org/book.html#number.of.cfs
> > >
> > > 6.2.1. Cardinality of ColumnFamilies
> > >
> > > Where multiple ColumnFamilies exist in a single table, be aware of the
> > > cardinality (i.e., number of rows).
> > > If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
> > > rows, ColumnFamilyA's data will likely be spread across many, many
> > > regions (and RegionServers). This makes mass scans for ColumnFamilyA
> > > less efficient.
> > >
> > > ... anything that needs to be updated/added for this?
> > >
> > > On 4/8/13 12:39 AM, "lars hofhansl" <[email protected]> wrote:
> > >
> > > > I think the main problem is that all CFs have to be flushed if one gets
> > > > large enough to require a flush.
> > > > (Does anyone remember why exactly that is? And do we still need that now
> > > > that the memstoreTS is stored in the HFiles?)
> > > >
> > > > So things are fine as long as all CFs have roughly the same size. But if
> > > > you have one that gets a lot of data and many others that are smaller,
> > > > we'd end up with a lot of unnecessary and small store files from the
> > > > smaller CFs.
> > > >
> > > > Anything else known that is bad about many column families?
> > > >
> > > > -- Lars
> > > >
> > > > ________________________________
> > > > From: Andrew Purtell <[email protected]>
> > > > To: "[email protected]" <[email protected]>
> > > > Sent: Sunday, April 7, 2013 3:52 PM
> > > > Subject: Re: schema design: rows vs wide columns
> > > >
> > > > Is there a pointer to evidence/experiment backed analysis of this
> > > > question?
> > > > I'm sure there is some basis for this text in the book but I recommend we
> > > > strike it. We could replace it with YCSB or LoadTestTool driven latency
> > > > graphs for different workloads maybe. Although that would also be a big
> > > > simplification of 'schema design' considerations, it would not be so
> > > > starkly lacking background.
> > > >
> > > > On Sunday, April 7, 2013, Ted Yu wrote:
> > > >
> > > > > From http://hbase.apache.org/book.html#number.of.cfs :
> > > > >
> > > > > HBase currently does not do well with anything above two or three column
> > > > > families so keep the number of column families in your schema low.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Sun, Apr 7, 2013 at 3:04 PM, Stack <[email protected]> wrote:
> > > > >
> > > > > > On Sun, Apr 7, 2013 at 11:58 AM, Ted <[email protected]> wrote:
> > > > > >
> > > > > > > With regard to number of column families, 3 is the recommended
> > > > > > > maximum.
> > > > > >
> > > > > > How did you come up w/ the number '3'? Is it a 'hard' 3? Or does it
> > > > > > depend? If the latter, on what does it depend?
> > > > > > Thanks,
> > > > > > St.Ack
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > (via Tom White)
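
To make the flush concern quoted above concrete, here is a rough toy model in Python. It is not HBase code and does not use any HBase API: the function name, the CF names, the write rates and the threshold are all made up for illustration, and the trigger is simplified to "any one CF's memstore reaches the threshold", following Lars's description. It only shows how one busy CF can force many tiny store files out of the quiet CFs in the same region.

```python
from collections import Counter


def simulate_flushes(write_rates, flush_threshold, ticks):
    """Toy model of one region holding several column families (CFs).

    write_rates maps a hypothetical CF name to bytes written per tick.
    Simplifying assumption: as soon as any single CF's memstore reaches
    flush_threshold, every CF with buffered data is flushed, and each
    flushed CF produces one store file.
    Returns the list of (cf_name, file_size) store files created.
    """
    memstore = Counter()
    store_files = []
    for _ in range(ticks):
        # Apply one tick of writes to every CF's memstore.
        for cf, rate in write_rates.items():
            memstore[cf] += rate
        # One large CF is enough to flush all of them.
        if max(memstore.values()) >= flush_threshold:
            for cf, size in memstore.items():
                if size > 0:
                    store_files.append((cf, size))
            memstore.clear()
    return store_files


# Hypothetical workload: one busy CF and two quiet ones.
rates = {"big": 100, "small_a": 1, "small_b": 1}
files = simulate_flushes(rates, flush_threshold=1000, ticks=100)

file_counts = Counter(cf for cf, _ in files)
bytes_written = Counter()
for cf, size in files:
    bytes_written[cf] += size

for cf in rates:
    print(cf, "files:", file_counts[cf], "avg size:", bytes_written[cf] / file_counts[cf])
```

In this made-up workload every CF ends up with the same number of store files, but the files from the quiet CFs are a hundredth the size of the busy CF's files. If the CFs had similar write rates, the files would be similar in size, which matches the "fine as long as all CFs have roughly the same size" remark.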
