Patrick,
Perhaps I misunderstood Otis' design. I thought he'd create the CFs based on retention duration, so you could have one CF each for daily, weekly, monthly, annual, and indefinite retention. You'd set up the table once with all of those CFs, and then write each tenant's data to one and only one of those buckets.

The only time you'd have a problem is when a tenant switches their retention policy. Even then, you could move that tenant's data that is still in the old CF into the new one, so that you still only query one CF for their data.

With respect to your discussion of region splits: are you saying that if one CF splits, then all of the CFs are affected and split as well?

Thx,
-Mike

> Date: Thu, 17 Mar 2011 11:26:35 -0400
> Subject: Re: Suggested and max number of CFs per table
> From: [email protected]
> To: [email protected]
> CC: [email protected]
>
> Otis,
>
> Perhaps your biggest issue will be the need to disable the table to add a
> new CF. So effectively you need to bring down the application to move in a
> new tenant.
>
> Another thing with multiple CFs is that if one CF tends to get
> disproportionally more data, you will get a lot of region splitting, and the
> other CFs will have HFiles for a region that are very small.
>
> I think the only reasonable use of CFs is if you really need row-level
> atomicity across CFs. Otherwise just use multiple tables.
>
>
> On Thu, Mar 17, 2011 at 2:30 AM, Otis Gospodnetic <
> [email protected]> wrote:
>
> > Hi,
> >
> > My Q is around the suggested or maximum number of CFs per table (see
> > http://hbase.apache.org/book/schema.html#number.of.cfs )
> >
> > Consider the following use-case.
> > * A multi-tenant system.
> > * All tenants write data to the same table.
> > * Tenants have different data retention policies.
> >
> > For the above use case I thought one could then just have different CFs
> > with different TTLs, because Stack suggested relying on HBase's ability to
> > purge old rows by applying CF-specific TTLs:
> > http://search-hadoop.com/m/VAeb52cvWHV.
> > These CFs would have the same set of columns, just different TTLs. Then
> > tenants who want to keep only last 1 month's worth of data go to the CF
> > where TTL=1 month, tenants who want to keep last 6 months of data go to CF
> > where TTL=6 months, and so on. However, tenants are not going to be evenly
> > distributed - there will be more tenants with shorter data retention
> > periods, which means the CFs where these tenants have their data will grow
> > faster.
> >
> > If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs
> > correctly, the advice is not to have more than 2-3 CFs per table?
> > And what happens if I have, say, 6 CFs per table?
> >
> > Again, if I read the above page correctly, the problem is that uneven data
> > distribution will mean that whenever 1 of my CFs needs to be flushed, the
> > remaining 5 CFs will also get flushed at the same time, and this may (or
> > will?) trigger compaction for all CFs' files, creating a sudden IO hit?
> >
> > Is there a good solution for this problem?
> > Should one then have 6 different tables, each with just 1 CF, instead of
> > having 1 table with 6 CFs?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
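To make the bucket-per-duration layout from the top of this message concrete, here is a minimal Java sketch. It is only a reading of the idea, not Otis' actual design: the class, enum, family names, and durations (a "month" taken as 30 days, a year as 365) are hypothetical, and it models the routing in plain Java rather than calling the HBase client API. TTLs are expressed in seconds, the unit HBase uses for column family TTLs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one column family per retention duration, created once,
// with each tenant routed to exactly one family.
public class RetentionBuckets {

    // Hypothetical buckets; names and durations are illustrative assumptions.
    enum Bucket {
        DAILY("daily", 24L * 3600),
        WEEKLY("weekly", 7L * 24 * 3600),
        MONTHLY("monthly", 30L * 24 * 3600),
        ANNUAL("annual", 365L * 24 * 3600),
        // -1 marks "no TTL": in HBase you'd leave this family at its default (no expiry).
        INDEFINITE("indefinite", -1L);

        final String family;
        final long ttlSeconds;

        Bucket(String family, long ttlSeconds) {
            this.family = family;
            this.ttlSeconds = ttlSeconds;
        }
    }

    // The full family -> TTL schema, set up once when the table is created.
    static Map<String, Long> schema() {
        Map<String, Long> families = new LinkedHashMap<>();
        for (Bucket b : Bucket.values()) {
            families.put(b.family, b.ttlSeconds);
        }
        return families;
    }

    // Hypothetical tenant -> retention policy lookup.
    private final Map<String, Bucket> tenantPolicy = new LinkedHashMap<>();

    void setPolicy(String tenant, Bucket bucket) {
        tenantPolicy.put(tenant, bucket);
    }

    // The single family that all reads and writes for this tenant target.
    String familyFor(String tenant) {
        Bucket b = tenantPolicy.get(tenant);
        if (b == null) {
            throw new IllegalArgumentException("unknown tenant: " + tenant);
        }
        return b.family;
    }

    // Switching policy is the awkward case: existing cells stay in the old family
    // until they are copied over (or expire), so return the family to migrate from.
    String changePolicy(String tenant, Bucket newBucket) {
        String oldFamily = familyFor(tenant);
        tenantPolicy.put(tenant, newBucket);
        return oldFamily;
    }

    public static void main(String[] args) {
        RetentionBuckets router = new RetentionBuckets();
        router.setPolicy("acme", Bucket.MONTHLY);
        router.setPolicy("globex", Bucket.DAILY);
        // {daily=86400, weekly=604800, monthly=2592000, annual=31536000, indefinite=-1}
        System.out.println(schema());
        System.out.println(router.familyFor("acme"));                   // monthly
        System.out.println(router.changePolicy("acme", Bucket.ANNUAL)); // monthly
        System.out.println(router.familyFor("acme"));                   // annual
    }
}
```

Each tenant maps to exactly one family, so reads stay single-family; the only awkward path is `changePolicy`, whose return value is where that tenant's existing data still lives. Note that this still puts five families in one table, so on its own it does not address the flush, compaction, and split concerns raised in the quoted messages.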
