Keep in mind that BigTable can have a large number of CFs because they also have Locality Groups. HBase has a 1:1 mapping of CF -> Locality Group.
I don't know for sure, but I imagine most of these BT tables have a very small number of locality groups, even if they have 20+ CFs. Would be nice to extract CFs from LGs in HBase some day, if anyone has a month free ;-)

-Todd

On Mon, Jun 13, 2011 at 2:44 PM, Jason Rutherglen <[email protected]> wrote:
> > Table 2 provides some actual CF/table numbers. One of the crawl tables has
> > 16 CFs and one of the Google Base tables had 29 CFs
>
> What's Google doing in BigTable that enables so many CFs?
>
> Is the cost in HBase the seek to each individual key in the CFs, or is
> it the cost of loading each block into RAM (?), which could be
> alleviated through bypassing the block cache and accessing the blocks
> as if they're local.
>
> On Mon, Jun 13, 2011 at 2:35 PM, Leif Wickland <[email protected]> wrote:
> > Thanks for replying, J-D.
> >
> > My interpretation is that they try to keep that number low, from page 2:
> >> "It is our intent that the number of distinct column families in a
> >> table be small (in the hundreds at most)"
> >
> > Table 2 provides some actual CF/table numbers. One of the crawl tables has
> > 16 CFs and one of the Google Base tables had 29 CFs.
> >
> >> Could you just store that in the same family?
> >
> > Yup. I could. There would be a little weirdness to it, but I think it's
> > doable. It seems like that's the consensus suggestion.
> >
> >> Row locking is rarely a good idea, it doesn't scale and they currently
> >> aren't persisted anywhere except the RS memory (so if it dies...).
> >> Using a single family might be better for you.
> >
> > Thanks for the pointer.
> >
> > Leif

--
Todd Lipcon
Software Engineer, Cloudera
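[Editor's note: to make the CF/locality-group distinction above concrete, here is a toy Python sketch of the read-cost intuition. It is not real HBase or BigTable code; the CF counts, names, and the "one store opened per locality group touched" model are illustrative assumptions only.]

```python
# Toy model (not real HBase/BigTable code): assume a read must open one
# underlying store per locality group it touches. HBase pins each column
# family to its own locality group; BigTable can map many CFs to one LG.
def stores_touched(cf_to_lg, cfs_read):
    """Count distinct locality-group stores a read of the given CFs opens."""
    return len({cf_to_lg[cf] for cf in cfs_read})

cfs = [f"cf{i}" for i in range(16)]       # e.g. a 16-CF table, as in Table 2

hbase = {cf: cf for cf in cfs}            # HBase: 1:1 CF -> locality group
bigtable = {cf: "lg0" for cf in cfs}      # BT: all 16 CFs share one group

print(stores_touched(hbase, cfs))         # 16 stores opened
print(stores_touched(bigtable, cfs))      # 1 store opened
```

Under this (simplified) model, a full-row read on the HBase side touches 16 stores while the BigTable side touches one, which is the intuition behind wanting to decouple CFs from locality groups.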
