HBase currently keeps a single META region (it does not split it). ROOT holds the location of that META region, and META holds only a few rows for each table. See also the MetaScanner class.
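
To make the lookup concrete, here is a minimal sketch in Python. It is not HBase's client code, only a toy model. It assumes META rows are keyed by table name and then region start key, kept in sorted order (as I understand the catalog section of the HBase book), so the row covering a given table and row key can be found with a single sorted lookup rather than by iterating. The META_ROWS contents, the server names, and the locate_region function are all made up for illustration, and the sketch ignores region end keys and region ids.

```python
from bisect import bisect_right

# Hypothetical stand-in for the single META region.
# Each entry is ((table, region_start_key), region_server), sorted by key.
META_ROWS = sorted([
    (("ABC", ""), "rs1.example:60020"),
    (("ABC", "100"), "rs2.example:60020"),
    (("ABC", "200"), "rs3.example:60020"),
    (("XYZ", ""), "rs1.example:60020"),
])
META_KEYS = [key for key, _ in META_ROWS]


def locate_region(table, row_key):
    """Return (start_key, server) for the region that should hold row_key."""
    # Find the last META row whose key is <= (table, row_key).
    idx = bisect_right(META_KEYS, (table, row_key)) - 1
    # If nothing precedes the key, or the closest row belongs to another
    # table, this toy META has no region for the requested table.
    if idx < 0 or META_KEYS[idx][0] != table:
        raise KeyError(f"no region found for table {table!r}")
    (_, start_key), server = META_ROWS[idx]
    return start_key, server


if __name__ == "__main__":
    # Row 123 of table ABC falls in the region that starts at "100".
    print(locate_region("ABC", "123"))
```

In this model the client needs one read of ROOT to learn where META is, one read of META to find the region, and then it can cache that location for later requests.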

On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma <[email protected]> wrote:
> Dong,
>
> Some more thoughts, after reading data structure for HRegionInfo =>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html,
> start key and end key looks informative which we could leverage,
>
> - I am not sure if we could leverage this information (stored as part of
> value in table ROOT) to find which META region may contains region server
> information for row-key 123 of data table ABC;
> - But I think unfortunately the information is stored in value of table
> ROOT, other than key field of table ROOT, so that we have to iterate each
> row in ROOT table one by one to figure out which META region server to
> access.
>
> Not sure if I get the points. Please feel free to correct me.
>
> regards,
> Lin
>
> On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma <[email protected]> wrote:
>
>> Doug, very informative document. Thanks a lot!
>>
>> I read through it and have some thoughts,
>>
>> - Supposing at the beginning, client side cache for region information is
>> empty, and the client wants to GET row-key 123 from table ABC;
>> - The client will read from ROOT table at first. But unfortunately, ROOT
>> table only contains region information for META table (please correct me if
>> I am wrong), but not region information for real data table (e.g. table
>> ABC);
>> - Does the client have to call each META region server one by one, in
>> order to find which META region contains information for region owner of
>> row-key 123 of data table ABC?
>>
>> BTW: I think if there is a way to expose information about what range of
>> table/region each META region contains from .META. region key, it will be
>> better to save time to iterate META region server one by one. Please feel
>> free to correct me if I am wrong.
>>
>> regards,
>> Lin
>>
>>
>> On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil
>> <[email protected]>wrote:
>>
>>>
>>> For further information about the catalog tables and region-regionserver
>>> assignment, see this...
>>>
>>> http://hbase.apache.org/book.html#arch.catalog
>>>
>>>
>>> On 8/19/12 7:36 AM, "Lin Ma" <[email protected]> wrote:
>>>
>>> >Thank you Stack, especially for the smart 6 round trip guess for the
>>> >puzzle. :-)
>>> >
>>> >1. "Yeah, we client cache's locations, not the data." -- does it mean for
>>> >each client, it will cache all location information of a HBase cluster,
>>> >i.e. which physical server owns which region? Supposing each region has
>>> >128M bytes, for a big cluster (P-bytes level), total data size / 128M is
>>> >not a trivial number, not sure if any overhead to client?
>>> >2. A bit confused by what do you mean "not the data"? For the client cached
>>> >location information, it should be the data in table METADATA, which is
>>> >region / physical server mapping data. Why you say not data (do you mean
>>> >real content in each region)?
>>> >
>>> >regards,
>>> >Lin
>>> >
>>> >On Sun, Aug 19, 2012 at 12:40 PM, Stack <[email protected]> wrote:
>>> >
>>> >> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma <[email protected]> wrote:
>>> >> > Hello guys,
>>> >> >
>>> >> > I am referencing the Big Table paper about how a client locates a tablet.
>>> >> > In section 5.1 Tablet location, it is mentioned that client will cache all
>>> >> > tablet locations, I think it means client will cache root tablet in
>>> >> > METADATA table, and all other tablets in METADATA table (which means client
>>> >> > cache the whole METADATA table?). My question is, whether HBase implements
>>> >> > in the same or similar way? My concern or confusion is, supposing each
>>> >> > tablet or region file is 128M bytes, it will be very huge space (i.e.
>>> >> > memory footprint) for each client to cache all tablets or region files of
>>> >> > METADATA table. Is it doable or feasible in real HBase clusters? Thanks.
>>> >> >
>>> >>
>>> >> Yeah, we client cache's locations, not the data.
>>> >>
>>> >>
>>> >> > BTW: another confusion from me is in the paper of Big Table section 5.1
>>> >> > Tablet location, it is mentioned that "If the client's cache is stale, the
>>> >> > location algorithm could take up to six round-trips, because stale cache
>>> >> > entries are only discovered upon misses (assuming that METADATA tablets do
>>> >> > not move very frequently).", I do not know how the 6 times round trip time
>>> >> > is calculated, if anyone could answer this puzzle, it will be great. :-)
>>> >> >
>>> >>
>>> >> I'm not sure what the 6 is about either. Here is a guesstimate:
>>> >>
>>> >> 1. Go to cached location for a server for a particular user region,
>>> >> but server says that it does not have a region, the client location is
>>> >> stale
>>> >> 2. Go back to client cached meta region that holds user region w/ row
>>> >> we want, but its location is stale.
>>> >> 3. Go to root location, to find new location of meta, but the root
>>> >> location has moved.... what the client has is stale
>>> >> 4. Find new root location and do lookup of meta region location
>>> >> 5. Go to meta region location to find new user region
>>> >> 6. Go to server w/ user region
>>> >>
>>> >> St.Ack
>>> >>
>>>
>>

-- Harsh J
