Keep in mind that BigTable can have a large number of CFs because they also
have Locality Groups. HBase has a 1:1 mapping of CF -> Locality Group.
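
That distinction can be sketched with a toy model (pure illustration, not
any real Bigtable or HBase API): in Bigtable many CFs can map onto a few
locality groups (file sets), while HBase forces the mapping to be 1:1, so
every CF is its own store.

```python
# Toy model of the CF -> locality-group mapping described above.
# Names ("hot"/"cold", cf0..cf15) are made up for illustration.

def storage_files(cf_to_lg):
    """Return the distinct on-disk file groups implied by a CF->LG mapping."""
    return set(cf_to_lg.values())

# Bigtable-style: 16 CFs, but only 2 locality groups backing them.
bigtable = {f"cf{i}": ("hot" if i < 4 else "cold") for i in range(16)}

# HBase-style: the mapping is forced to be 1:1, one store per CF.
hbase = {f"cf{i}": f"cf{i}" for i in range(16)}

print(len(storage_files(bigtable)))  # 2
print(len(storage_files(hbase)))     # 16
```

So a Bigtable table with 16 CFs might only touch two file groups on a read,
whereas the same schema in HBase touches sixteen stores.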

I don't know for sure, but I imagine most of these BT tables have a very
small number of locality groups, even if they have 20+ CFs.

Would be nice to extract CFs from LGs in HBase some day, if anyone has a
month free ;-)

-Todd

On Mon, Jun 13, 2011 at 2:44 PM, Jason Rutherglen <[email protected]> wrote:

> > Table 2 provides some actual CF/table numbers.  One of the crawl tables
> > has 16 CFs and one of the Google Base tables had 29 CFs
>
> What's Google doing in BigTable that enables so many CFs?
>
> Is the cost in HBase the seek to each individual key in the CFs, or is
> it the cost of loading each block into RAM?  The latter could be
> alleviated by bypassing the block cache and accessing the blocks as if
> they were local.
>
> On Mon, Jun 13, 2011 at 2:35 PM, Leif Wickland <[email protected]>
> wrote:
> > Thanks for replying, J-D.
> >
> > My interpretation is that they try to keep that number low, from page 2:
> >>
> >> "It is our intent that the number of distinct column families in a
> >> table be small (in the hundreds at most)"
> >>
> >
> > Table 2 provides some actual CF/table numbers.  One of the crawl tables
> > has 16 CFs and one of the Google Base tables had 29 CFs.
> >
> >
> >> Could you just store that in the same family?
> >>
> >
> > Yup.  I could.  There would be a little weirdness to it, but I think it's
> > doable.  It seems like that's the consensus suggestion.
> >
> >
> >> Row locking is rarely a good idea; it doesn't scale, and row locks
> >> currently aren't persisted anywhere except the RS memory (so if it
> >> dies...).  Using a single family might be better for you.
> >
> >
> > Thanks for the pointer.
> >
> > Leif
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
