Hi,

My question is about the suggested (or maximum) number of CFs per table (see http://hbase.apache.org/book/schema.html#number.of.cfs ).
Consider the following use case:

* A multi-tenant system.
* All tenants write data to the same table.
* Tenants have different data retention policies.

For this use case I thought one could simply use different CFs with different TTLs, because Stack suggested relying on HBase's ability to purge old rows by applying CF-specific TTLs: http://search-hadoop.com/m/VAeb52cvWHV. These CFs would have the same set of columns, just different TTLs. Tenants who want to keep only the last 1 month's worth of data would go to the CF with TTL=1 month, tenants who want to keep the last 6 months of data would go to the CF with TTL=6 months, and so on.

However, tenants are not going to be evenly distributed: there will be more tenants with shorter data retention periods, which means the CFs holding their data will grow faster.

If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs correctly, the advice is to have no more than 2-3 CFs per table? And what happens if I have, say, 6 CFs per table? Again, if I read that page correctly, the problem is that with uneven data distribution, whenever 1 of my CFs needs to be flushed, the remaining 5 CFs will also be flushed at the same time, and this may (or will?) trigger compaction of all CFs' files, creating a sudden IO hit?

Is there a good solution for this problem? Should one then have 6 different tables, each with just 1 CF, instead of 1 table with 6 CFs?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
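A toy model may make the flush concern above more concrete. It is plain Java, not HBase code, and it only sketches the behaviour under an assumption: that flushing is done per region, as the linked book page describes, so once the region's total memstore reaches a threshold every CF holding buffered data is written out as a new file. It compares that with each CF flushing only on its own data (roughly the one-CF-per-table alternative). The class name, the per-CF write rates, and the threshold are all hypothetical. The model ignores other flush triggers and does not model compactions at all.

```java
import java.util.Arrays;

/**
 * Toy model of memstore flushing for one region with several column families.
 * Not HBase code: sizes are abstract bytes and compactions are not modelled.
 */
public class RegionFlushToyModel {

    /** Store files written and total bytes flushed, per column family. */
    static final class Result {
        final long[] files;
        final long[] flushedBytes;

        Result(long[] files, long[] flushedBytes) {
            this.files = files;
            this.flushedBytes = flushedBytes;
        }
    }

    /**
     * @param bytesPerTick   bytes written to each CF per tick (hypothetical skewed load)
     * @param ticks          number of ticks to simulate
     * @param flushThreshold memstore size that triggers a flush
     * @param regionWide     true: when the region's total memstore reaches the threshold,
     *                       every CF with buffered data is flushed (assumed per-region flushing);
     *                       false: a CF is flushed only when its own memstore reaches the
     *                       threshold (roughly one CF per table)
     */
    static Result simulate(long[] bytesPerTick, int ticks, long flushThreshold, boolean regionWide) {
        int n = bytesPerTick.length;
        long[] memstore = new long[n];
        long[] files = new long[n];
        long[] flushed = new long[n];

        for (int t = 0; t < ticks; t++) {
            // Buffer this tick's writes in each CF's memstore.
            for (int i = 0; i < n; i++) {
                memstore[i] += bytesPerTick[i];
            }
            if (regionWide) {
                // One busy CF can push the region over the threshold; all CFs flush together.
                if (Arrays.stream(memstore).sum() >= flushThreshold) {
                    for (int i = 0; i < n; i++) {
                        flush(i, memstore, files, flushed);
                    }
                }
            } else {
                // Each CF flushes independently, only when it has enough data of its own.
                for (int i = 0; i < n; i++) {
                    if (memstore[i] >= flushThreshold) {
                        flush(i, memstore, files, flushed);
                    }
                }
            }
        }
        return new Result(files, flushed);
    }

    /** Writes CF i's memstore out as one new store file, if it holds any data. */
    private static void flush(int i, long[] memstore, long[] files, long[] flushed) {
        if (memstore[i] == 0) {
            return;
        }
        files[i]++;
        flushed[i] += memstore[i];
        memstore[i] = 0;
    }

    private static void report(String label, Result r) {
        System.out.println(label + ":");
        for (int i = 0; i < r.files.length; i++) {
            String avg = r.files[i] == 0 ? "-" : String.valueOf(r.flushedBytes[i] / r.files[i]);
            System.out.printf("  cf%d: %d files, avg file size %s%n", i, r.files[i], avg);
        }
    }

    public static void main(String[] args) {
        // Hypothetical load: most writes go to the short-TTL CF (cf0).
        long[] load = {600, 200, 100, 50, 30, 20};
        report("region-wide flush", simulate(load, 100, 10_000L, true));
        report("per-CF flush", simulate(load, 100, 10_000L, false));
    }
}
```

With this hypothetical load and a threshold of 10,000, region-wide flushing writes 10 files for every CF over 100 ticks, so the smallest CF ends up with 10 files averaging 200 bytes each. Per-CF flushing writes 5, 2 and 1 files for the three busiest CFs and none yet for the other three. Many tiny files in the small CFs is exactly what would later have to be compacted.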
