One use case from my own tables: I have a table where one set of columns
is always processed by MR jobs, while other, rather large columns are
generally only accessed through a UI. By separating these into two column
families, MR jobs that do a full table scan on the MR column family run
more efficiently, because the scans don't read data from the 'ui' column
family.
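
To make the reasoning concrete, here is a toy sketch in Python. It is not
HBase code: the ToyTable class, the bytes_for_scan helper, and the 'mr' and
'ui' family names are all made up for illustration. It only models the idea
that each column family is stored separately, so a scan limited to one
family never touches the other family's data. In the real Java client the
equivalent restriction is, as far as I know, Scan.addFamily() on the Scan
handed to the MR job.

```python
class ToyTable:
    """Toy model of a table whose column families live in separate stores."""

    def __init__(self, families):
        # One independent store per family: family -> {row: {qualifier: value}}
        self.stores = {family: {} for family in families}
        self.bytes_read = 0  # counts value bytes touched by scans

    def put(self, row, family, qualifier, value):
        self.stores[family].setdefault(row, {})[qualifier] = value

    def scan(self, families=None):
        """Yield (row, cells) in row order, reading only the selected families."""
        selected = list(self.stores) if families is None else list(families)
        rows = sorted({row for f in selected for row in self.stores[f]})
        for row in rows:
            cells = {}
            for f in selected:
                for qualifier, value in self.stores[f].get(row, {}).items():
                    self.bytes_read += len(value)
                    cells[(f, qualifier)] = value
            yield row, cells


def bytes_for_scan(table, families=None):
    """Run a full scan and return how many value bytes it had to read."""
    table.bytes_read = 0
    for _ in table.scan(families):
        pass
    return table.bytes_read


if __name__ == "__main__":
    table = ToyTable(["mr", "ui"])
    for i in range(3):
        table.put(f"row{i}", "mr", "count", b"12345678")   # small MR column
        table.put(f"row{i}", "ui", "blob", b"x" * 10_000)  # large UI column
    print("mr only:", bytes_for_scan(table, ["mr"]))
    print("all families:", bytes_for_scan(table))
```

With three rows, the scan restricted to 'mr' touches 24 bytes while the
unrestricted scan touches 30024, which is the effect I see on a much larger
scale in the MR jobs.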

> On Jun 22, 2017, at 8:44 AM, Alexander Ilyin <[email protected]> wrote:
> 
> Hi,
> 
> A general question regarding column families. It is said in the doc that
> HBase doesn't do well with more than 2-3 column families because flushing
> and compactions are done on a per region basis which should be addressed in
> the future: http://hbase.apache.org/book.html#number.of.cfs
> 
> Is this still the case in newer versions of HBase, or have there been
> improvements?
> 
> I also don't understand why using several column families might be useful
> even if data access is column scoped. Why can't we just create several
> tables instead? Row key is stored with every cell anyway and it's possible
> to filter by column when querying.
> 
> In general, I don't see when it might make sense to have more than one
> column family in a table with current limitations.
> 
> Thanks in advance.
