One use case that applies to my tables: I have a table in which one set of columns holds data that is always processed by MR (MapReduce) jobs, while other, rather large columns are generally accessed only through a UI. By separating those into two column families, MR jobs that do a full table scan on the MR column family run more efficiently, because the scans don’t read any data from the ‘ui’ column family.
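A rough way to see why this helps: HBase keeps each column family in its own store, so a scan restricted to one family never has to read the other family's files. The sketch below is a hypothetical, simplified in-memory Python model of that idea. `ToyTable`, `put` and `scan` are illustrative names only, not HBase's API. With the real Java client, the equivalent restriction is made by calling `Scan.addFamily(...)` on the scan that the MR job uses.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical toy model (not HBase's API): each column family gets its
    own store, loosely mirroring how HBase keeps a separate store per family."""

    def __init__(self):
        # family -> row key -> qualifier -> value (bytes)
        self.stores = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value):
        self.stores[family][row][qualifier] = value

    def scan(self, families=None):
        """Full-table scan limited to `families` (all families if None).
        Returns (rows, bytes_read); bytes_read counts only the values in
        the family stores that were actually visited."""
        wanted = families if families is not None else list(self.stores)
        rows = defaultdict(dict)
        bytes_read = 0
        for family in wanted:
            for row, cells in self.stores.get(family, {}).items():
                for qualifier, value in cells.items():
                    rows[row][f"{family}:{qualifier}"] = value
                    bytes_read += len(value)
        return dict(rows), bytes_read


# Small "mr" cells processed by batch jobs, large "ui" cells read by a UI.
table = ToyTable()
for i in range(3):
    row = f"row{i}"
    table.put(row, "mr", "count", b"42")
    table.put(row, "ui", "html", b"x" * 10_000)

mr_rows, mr_only = table.scan(families=["mr"])  # what the MR job would do
_, everything = table.scan()                    # scan across all families
print(mr_only, everything)  # prints: 6 30006
```

In this model the family-restricted scan touches 6 bytes instead of 30,006. That mirrors the effect described above, though real savings depend on HBase's actual storage layout, block caching and compression.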
> On Jun 22, 2017, at 8:44 AM, Alexander Ilyin wrote:
>
> Hi,
>
> A general question regarding column families. It is said in the doc that HBase doesn't do well with more than 2-3 column families because flushing and compactions are done on a per region basis which should be addressed in the future: http://hbase.apache.org/book.html#number.of.cfs
>
> Is it still the case in new versions of HBase or there were some improvements on this?
>
> I also don't understand why using several column families might be useful even if data access is column scoped. Why can't we just create several tables instead? Row key is stored with every cell anyway and it's possible to filter by column when querying.
>
> In general, I don't see when it might make sense to have more than one column family in a table with current limitations.
>
> Thanks in advance.
