Otis,

Perhaps your biggest issue will be the need to disable the table to add a new CF. So, effectively, you need to bring down the application to move in a new tenant.

Another thing with multiple CFs is that if one CF tends to get disproportionately more data, you will get a lot of region splitting, and the other CFs will end up with very small HFiles for each region.

I think the only reasonable use of multiple CFs is if you really need row-level atomicity across CFs. Otherwise just use multiple tables.

On Thu, Mar 17, 2011 at 2:30 AM, Otis Gospodnetic < [email protected]> wrote:

> Hi,
>
> My Q is around the suggested or maximum number of CFs per table (see
> http://hbase.apache.org/book/schema.html#number.of.cfs )
>
> Consider the following use-case.
> * A multi-tenant system.
> * All tenants write data to the same table.
> * Tenants have different data retention policies.
>
> For the above use case I thought one could then just have different CFs
> with different TTLs because Stack suggested relying on HBase's ability to
> purge old rows by applying CF-specific TTLs:
> http://search-hadoop.com/m/VAeb52cvWHV.
> These CFs would have the same set of columns, just different TTLs. Then
> tenants who want to keep only last 1 month's worth of data go to the CF
> where TTL=1 month, tenants who want to keep last 6 months of data go to CF
> where TTL=6 months, and so on. However, tenants are not going to be evenly
> distributed - there will be more tenants with shorter data retention
> periods, which means the CFs where these tenants have their data will grow
> faster.
>
> If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs
> correctly, the advice is not to have more than 2-3 CFs per table?
> And what happens if I have say 6 CFs per table?
>
> Again if I read the above page correctly, the problem is that uneven data
> distribution will mean that whenever 1 of my CFs needs to be flushed, the
> remaining 5 CFs will also get flushed at the same time, and this may (or
> will?) trigger compaction for all CFs' files creating a sudden IO hit?
>
> Is there a good solution for this problem?
> Should one then have 6 different tables, each with just 1 CF instead of
> having 1 table with 6 CFs?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
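
As a rough sketch of the multiple-tables suggestion above (one single-CF table per retention policy instead of one CF per policy), the small Python helper below routes each tenant to a table by its retention period and prints the HBase shell `create` commands for those tables. The retention tiers, the `events_ttl_<N>d` table naming, the CF name `d`, and the tenant names are all made up for illustration; only the idea of a per-table TTL comes from the thread. TTL is expressed in seconds, which is how the HBase shell expects it.

```python
# Hypothetical retention tiers in days; the thread only mentions
# "1 month" and "6 months" as examples.
RETENTION_DAYS = [30, 180]

SECONDS_PER_DAY = 24 * 60 * 60


def table_for(retention_days):
    """Return the (made-up) table name for one retention tier."""
    return "events_ttl_%dd" % retention_days


def create_statement(retention_days, family="d"):
    """Build an HBase shell 'create' command for a single-CF table.

    The CF-level TTL is given in seconds.
    """
    ttl_seconds = retention_days * SECONDS_PER_DAY
    return "create '%s', {NAME => '%s', TTL => %d}" % (
        table_for(retention_days), family, ttl_seconds)


def route(tenant_retention, tenant):
    """Pick the table a tenant writes to, based on its retention policy."""
    days = tenant_retention[tenant]
    if days not in RETENTION_DAYS:
        raise ValueError("no table for a %d-day retention policy" % days)
    return table_for(days)


if __name__ == "__main__":
    # Hypothetical tenants and their retention periods in days.
    tenants = {"acme": 30, "globex": 180}

    for days in RETENTION_DAYS:
        print(create_statement(days))
    for name in sorted(tenants):
        print("%s -> %s" % (name, route(tenants, name)))
```

With this layout, a tenant with a new retention period means creating one more table rather than altering an existing one, and each table flushes, compacts, and splits on its own, so a fast-growing short-retention tier does not drag the others along.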
