Todd,

Can you define what a Locality Group is for HBase and how it would function? E.g., it sounds like the same thing as a column family, and it's not clear how one would benefit from using an LG.

Jason

On Mon, Jun 13, 2011 at 8:45 PM, Todd Lipcon <[email protected]> wrote:
> Keep in mind that BigTable can have a large number of CFs because they also
> have Locality Groups. HBase has a 1:1 mapping of CF -> Locality Group.
>
> I don't know for sure, but I imagine most of these BT tables have a very
> small number of locality groups, even if they have 20+ CFs.
>
> Would be nice to extract CFs from LGs in HBase some day, if anyone has a
> month free ;-)
>
> -Todd
>
> On Mon, Jun 13, 2011 at 2:44 PM, Jason Rutherglen <[email protected]> wrote:
>
>> > Table 2 provides some actual CF/table numbers. One of the crawl
>> > tables has 16 CFs and one of the Google Base tables had 29 CFs
>>
>> What's Google doing in BigTable that enables so many CFs?
>>
>> Is the cost in HBase the seek to each individual key in the CFs, or is
>> it the cost of loading each block into RAM (?), which could be
>> alleviated through bypassing the block cache and accessing the blocks
>> as if they're local.
>>
>> On Mon, Jun 13, 2011 at 2:35 PM, Leif Wickland <[email protected]>
>> wrote:
>> > Thanks for replying, J-D.
>> >
>> > My interpretation is that they try to keep that number low, from page 2:
>> >>
>> >> "It is our intent that the number of distinct column families in a
>> >> table be small (in the hundreds at most)"
>> >>
>> >
>> > Table 2 provides some actual CF/table numbers. One of the crawl
>> > tables has 16 CFs and one of the Google Base tables had 29 CFs.
>> >
>> >> Could you just store that in the same family?
>> >>
>> >
>> > Yup. I could. There would be a little weirdness to it, but I think it's
>> > doable. It seems like that's the consensus suggestion.
>> >
>> >> Row locking is rarely a good idea, it doesn't scale and they currently
>> >> aren't persisted anywhere except the RS memory (so if it dies...).
>> >> Using a single family might be better for you.
>> >
>> > Thanks for the pointer.
>> >
>> > Leif
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
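
To make the CF vs. LG distinction in Todd's reply concrete, here is a toy Python sketch. It is only a model under stated assumptions, not HBase or BigTable code: the function names, the family names, and the idea of counting "stores touched" per read are all made up for illustration. It assumes each locality group is backed by its own set of files, so a read has to open one store per group that holds a requested family.

```python
from collections import defaultdict


def group_families(families, lg_of=None):
    """Assign each column family to a locality group.

    With lg_of=None every family becomes its own group, which models
    the 1:1 CF -> LG mapping described for HBase. Passing a dict models
    the BigTable-style case where several CFs share one group.
    (Hypothetical helper, not a real HBase/BigTable API.)
    """
    groups = defaultdict(list)
    for cf in families:
        groups[lg_of[cf] if lg_of else cf].append(cf)
    return dict(groups)


def stores_touched(groups, wanted):
    """Count the groups (separately stored file sets) that a read of
    the `wanted` families has to open in this toy model."""
    return sum(
        1 for members in groups.values()
        if any(cf in wanted for cf in members)
    )


# Hypothetical family names, chosen only for the example.
families = ["anchor", "language", "checksum", "contents"]

# 1:1 mapping: four families, four groups.
one_to_one = group_families(families)

# Shared mapping: small metadata families in one group, bulky contents apart.
shared = group_families(families, {
    "anchor": "meta",
    "language": "meta",
    "checksum": "meta",
    "contents": "content",
})

read = {"language", "checksum"}
print(len(one_to_one), stores_touched(one_to_one, read))
print(len(shared), stores_touched(shared, read))
```

In this model the benefit of an LG is that many logical CFs can exist without each one adding another store to flush, compact, and seek in, while large or rarely read families can still be kept in a separate group.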
