Brian, Ted, thank you for your answers. Ted, could you point out the HBase version in which per-column-family flush first appeared?

On Thu, Jun 22, 2017 at 4:06 PM, Ted Yu <[email protected]> wrote:

> bq. HBase doesn't do well with more than 2-3 column families
>
> The above is out of date - we have per column family flush which would
> reduce the number of small hfiles.
>
> bq. Why can't we just create several tables instead?
>
> Currently hbase doesn't provide transaction across region boundary. This
> means with more than one table, burden is on application code to
> achieve transaction where needed.
> Since the multiple tables tend to have same row key design as you
> mentioned, region servers carry more regions, increasing load on assignment
> manager / balancer, etc.
>
> Cheers
>
> On Thu, Jun 22, 2017 at 5:44 AM, Alexander Ilyin <[email protected]>
> wrote:
>
> > Hi,
> >
> > A general question regarding column families. It is said in the doc that
> > HBase doesn't do well with more than 2-3 column families because flushing
> > and compactions are done on a per region basis which should be addressed in
> > the future: http://hbase.apache.org/book.html#number.of.cfs
> >
> > Is it still the case in new versions of HBase or there were some
> > improvements on this?
> >
> > I also don't understand why using several column families might be useful
> > even if data access is column scoped. Why can't we just create several
> > tables instead? Row key is stored with every cell anyway and it's possible
> > to filter by column when querying.
> >
> > In general, I don't see when it might make sense to have more than one
> > column family in a table with current limitations.
> >
> > Thanks in advance.
