bq. does not do well with anything above two or three column families

Current HBase releases, such as 0.98.x, would do better than the above.
5 column families should be accommodated.

Cheers

On Tue, Aug 19, 2014 at 3:06 PM, Wei Liu <wei....@stellarloyalty.com> wrote:

> We are doing schema design for our application. One thing we are not so
> clear about is multiple column families (more than 3, probably 5 - 8) vs
> multiple tables. In our use case, we will have the same number of rows in
> all these column families, but some column families may be modified more
> often than others, and some column families will have more columns than
> others (thousands vs several).
>
> The reason we are thinking about multiple column families is that it
> probably can give us better performance if we need to do a search with data
> from multiple column families. For example, search for a row with value X
> in column family A and with value Y in column family B.
>
> On the other hand, we saw the following paragraph in the user guide which
> is scary to us:
> "HBase currently does not do well with anything above two or three column
> families so keep the number of column families in your schema low.
> Currently, flushing and compactions are done on a per Region basis so if
> one column family is carrying the bulk of the data bringing on flushes, the
> adjacent families will also be flushed though the amount of data they carry
> is small. When many column families exist the flushing and compaction
> interaction can make for a bunch of needless i/o loading (To be addressed by
> changing flushing and compaction to work on a per column family basis). For
> more information on compactions, see Section 9.7.6.7, “Compaction”
> <http://hbase.apache.org/book.html#compaction>."
>
> Can anyone please shed some light on this topic? Thanks in advance.
>
> Thanks,
> Wei
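
To make the quoted user-guide paragraph concrete, here is a minimal sketch of region-level flushing as a toy model in plain Java. It is not HBase code and uses no HBase API: the class `RegionFlushSketch`, the family names "big" and "small", the byte counts, and the flush threshold are all hypothetical. The only behavior it models is the one the guide describes: when the region's total buffered data reaches a threshold, every column family in the region is flushed, including ones holding very little data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of per-region flushing; illustrative only, not HBase code. */
class RegionFlushSketch {

    /** Per-family counters collected during the simulation. */
    static final class FamilyStats {
        long memstoreBytes; // bytes currently buffered in memory
        int filesWritten;   // number of files produced by flushes
        long bytesFlushed;  // total bytes written to those files

        long averageFileBytes() {
            return filesWritten == 0 ? 0 : bytesFlushed / filesWritten;
        }
    }

    /**
     * Runs the given number of write rounds. In each round every family
     * receives its configured number of bytes. Whenever the region-wide
     * total reaches the threshold, all families with buffered data are
     * flushed together (the per-region behavior the guide describes).
     */
    static Map<String, FamilyStats> simulate(Map<String, Long> bytesPerRound,
                                             long flushThresholdBytes,
                                             int rounds) {
        Map<String, FamilyStats> stats = new LinkedHashMap<>();
        for (String family : bytesPerRound.keySet()) {
            stats.put(family, new FamilyStats());
        }
        for (int round = 0; round < rounds; round++) {
            // Buffer this round's writes in each family's memstore.
            for (Map.Entry<String, Long> e : bytesPerRound.entrySet()) {
                stats.get(e.getKey()).memstoreBytes += e.getValue();
            }
            // The flush decision looks at the region total, not at one family.
            long regionTotal = 0;
            for (FamilyStats s : stats.values()) {
                regionTotal += s.memstoreBytes;
            }
            if (regionTotal >= flushThresholdBytes) {
                for (FamilyStats s : stats.values()) {
                    if (s.memstoreBytes > 0) {
                        s.filesWritten++;
                        s.bytesFlushed += s.memstoreBytes;
                        s.memstoreBytes = 0;
                    }
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Hypothetical workload: one heavy family and one light family.
        Map<String, Long> writes = new LinkedHashMap<>();
        writes.put("big", 1000L);
        writes.put("small", 10L);

        Map<String, FamilyStats> result = simulate(writes, 10_000L, 100);
        for (Map.Entry<String, FamilyStats> e : result.entrySet()) {
            FamilyStats s = e.getValue();
            System.out.println(e.getKey() + ": files=" + s.filesWritten
                    + ", avgFileBytes=" + s.averageFileBytes());
        }
    }
}
```

With these made-up numbers, the heavy family triggers every flush, yet the light family is flushed just as often and ends up with the same number of files, each about a hundredth of the size. Those many small files are the extra compaction work the guide warns about. How much this matters on a real cluster depends on the HBase version and workload, which this model does not attempt to capture.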