Jason: See the BigTable paper.

The HBase architecture is missing a tier from what is described in the
BigTable paper. How LGs would function is how CFs currently work in HBase.
In BigTable, they have added functionality where they can group their
notion of CFs into an LG. Apparently they can change their schema as they go
to change which CFs are in which LGs (out in their SSTable files, I'd
imagine, not every entry is from the same CF, unlike in our HFiles).

St.Ack

On Wed, Jun 22, 2011 at 10:13 AM, Jason Rutherglen <[email protected]> wrote:
> Todd,
>
> Can you define what a Locality Group is for HBase and how it would
> function? Eg, it sounds like the same thing as a column-family, and
> it's not clear how one would benefit from the usage of an LG.
>
> Jason
>
> On Mon, Jun 13, 2011 at 8:45 PM, Todd Lipcon <[email protected]> wrote:
>> Keep in mind that BigTable can have a large number of CFs because they also
>> have Locality Groups. HBase has a 1:1 mapping of CF -> Locality Group.
>>
>> I don't know for sure, but I imagine most of these BT tables have a very
>> small number of locality groups, even if they have 20+ CFs.
>>
>> Would be nice to extract CFs from LGs in HBase some day, if anyone has a
>> month free ;-)
>>
>> -Todd
>>
>> On Mon, Jun 13, 2011 at 2:44 PM, Jason Rutherglen <[email protected]> wrote:
>>
>>> > Table 2 provides some actual CF/table numbers. One of the crawl tables has
>>> > 16 CFs and one of the Google Base tables had 29 CFs
>>>
>>> What's Google doing in BigTable that enables so many CFs?
>>>
>>> Is the cost in HBase the seek to each individual key in the CFs, or is
>>> it the cost of loading each block into RAM (?), which could be
>>> alleviated through bypassing the block cache and accessing the blocks
>>> as if they're local.
>>>
>>> On Mon, Jun 13, 2011 at 2:35 PM, Leif Wickland <[email protected]> wrote:
>>> > Thanks for replying, J-D.
>>> >
>>> > My interpretation is that they try to keep that number low, from page 2:
>>> >>
>>> >> "It is our intent that the number of distinct column families in a
>>> >> table be small (in the hundreds at most)"
>>> >>
>>> >
>>> > Table 2 provides some actual CF/table numbers. One of the crawl tables has
>>> > 16 CFs and one of the Google Base tables had 29 CFs.
>>> >
>>> >
>>> >> Could you just store that in the same family?
>>> >>
>>> >
>>> > Yup. I could. There would be a little weirdness to it, but I think it's
>>> > doable. It seems like that's the consensus suggestion.
>>> >
>>> >
>>> >> Row locking is rarely a good idea, it doesn't scale and they currently
>>> >> aren't persisted anywhere except the RS memory (so if it dies...).
>>> >> Using a single family might be better for you.
>>> >
>>> >
>>> > Thanks for the pointer.
>>> >
>>> > Leif
>>> >
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

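For anyone skimming the thread later, here is a rough sketch of the difference being discussed: in HBase each CF acts as its own locality group, while in BigTable several CFs can share one. This is only a toy model and not code from either system. The function names, the CF names (cf0..cf19), and the 20-CFs-into-3-LGs grouping are all made up for illustration, and "one file per locality group per flush" is a simplification.

```python
from collections import defaultdict


def hbase_layout(column_families):
    """HBase-style layout: every CF is its own locality group (1:1)."""
    return {cf: [cf] for cf in column_families}


def bigtable_layout(cf_to_lg):
    """BigTable-style layout: several CFs may share one locality group."""
    groups = defaultdict(list)
    for cf, lg in cf_to_lg.items():
        groups[lg].append(cf)
    return dict(groups)


def files_per_flush(layout):
    """Simplifying assumption: one on-disk file per locality group per flush."""
    return len(layout)


# Hypothetical table with 20 column families.
cfs = [f"cf{i}" for i in range(20)]

# HBase: 20 CFs means 20 groups, each flushed and compacted separately.
hbase = hbase_layout(cfs)

# BigTable: the same 20 CFs assigned (arbitrarily here) to 3 locality groups.
bigtable = bigtable_layout({cf: f"lg{i % 3}" for i, cf in enumerate(cfs)})

print(files_per_flush(hbase))     # 20
print(files_per_flush(bigtable))  # 3
```

Under this model the number of CFs can grow without the number of separately stored groups growing with it, which is the point Todd makes about BigTable tables with 20+ CFs.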