RE: Incoming Row Distribution Strategy/Algorithm Among Region Servers?

Buttler, David Wed, 15 Jun 2011 15:42:09 -0700

To expand on the first point: a region has a maximum size, set either by 
default or in hbase-site.xml.  When a region grows past that size, HBase 
automatically triggers a split (you can also manually trigger a split for a 
specific region, or for all regions, from the shell or web UI).  If you dig 
through the code enough you will discover that it splits on the midkey: for 
the region, that is the middle key of the index.  Relevant code at the end.


So the start and end keys are chosen as you might expect: from the key 
distribution that is actually encountered.  HBase does not try to infer your 
scheme for generating keys.

Dave



    /*
     * @return File midkey.  Inexact.  Operates on block boundaries.  Does
     * not go into blocks.
     */
    byte [] midkey() throws IOException {
      int pos = ((this.count - 1)/2);              // middle of the index
      if (pos < 0) {
        throw new IOException("HFile empty");
      }
      return this.blockKeys[pos];
    }
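
To make the "splits on the midkey" point concrete, here is a minimal sketch, 
not the actual HBase code.  BlockIndexSketch and its string keys are 
hypothetical stand-ins for the block index (real keys are byte arrays).  It 
only shows that the middle index entry follows the data, not the key space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a block index: one first-key per block, sorted.
final class BlockIndexSketch {
    private final List<String> blockKeys;

    BlockIndexSketch(List<String> sortedBlockKeys) {
        this.blockKeys = new ArrayList<>(sortedBlockKeys);
    }

    // Same arithmetic as the quoted midkey(): the middle entry of the index.
    // Emptiness is checked directly here, because (0 - 1) / 2 evaluates to 0
    // under Java integer division, so a "pos < 0" test would not catch it.
    String midkey() {
        if (blockKeys.isEmpty()) {
            throw new IllegalStateException("index empty");
        }
        return blockKeys.get((blockKeys.size() - 1) / 2);
    }

    public static void main(String[] args) {
        // Keys are clustered near "user000x" with one outlier at the end.
        BlockIndexSketch skewed = new BlockIndexSketch(Arrays.asList(
            "user0001", "user0002", "user0003", "user0004", "zzzz"));
        // The split point lands inside the cluster, not halfway to "zzzz".
        System.out.println(skewed.midkey()); // prints user0003
    }
}
```

With skewed keys the split point stays inside the dense range, which is the 
sense in which the boundaries come from the data rather than from any 
assumption about how keys are generated.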


-----Original Message-----
From: Christopher Tarnas [mailto:[email protected]] On Behalf Of Chris Tarnas
Sent: Wednesday, June 15, 2011 10:59 AM
To: [email protected]
Subject: Re: Incoming Row Distribution Strategy/Algorithm Among Region Servers?

There are a few ways:

1) Dynamically as data is added. You start with one region and all data goes 
there. When a region grows too big, it gets split in half. So if a region had 
keys 1-10 we now have 1-5 and 5-10.

2) Manually at table creation. You can specify your regions ahead of time if 
you have a good handle on the data distribution. 
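
Both options can be pictured with a small sketch.  This is an illustration 
under assumptions, not HBase's implementation: RegionTableSketch, addRegion, 
split, and locate are hypothetical names, keys are plain strings compared 
lexicographically, and server assignment after a split is made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region table: region start key -> name of the hosting server.
// A region covers keys from its start key up to the next region's start key.
final class RegionTableSketch {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    // Option 2: regions declared ahead of time, one call per start key.
    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // Option 1: an existing region is cut at splitKey; the upper half becomes
    // a new region (the server chosen for it is an assumption of this sketch).
    void split(String splitKey, String serverForUpperHalf) {
        if (serverByStartKey.floorKey(splitKey) == null) {
            throw new IllegalArgumentException("no region contains " + splitKey);
        }
        serverByStartKey.put(splitKey, serverForUpperHalf);
    }

    // A row belongs to the region with the greatest start key <= row key.
    String locate(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        return region == null ? null : region.getValue();
    }

    public static void main(String[] args) {
        RegionTableSketch table = new RegionTableSketch();
        table.addRegion("0", "A");
        table.addRegion("1", "B");
        table.addRegion("4", "C");
        System.out.println(table.locate("3")); // prints B

        table.split("7", "D");                 // region starting at "4" is split at "7"
        System.out.println(table.locate("5")); // prints C
        System.out.println(table.locate("8")); // prints D
    }
}
```

A sorted map keeps the lookup logarithmic in the number of regions, and a 
split is just one more start key in the map.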

-chris


On Jun 15, 2011, at 10:47 AM, Shuja Rehman wrote:

> Yes, I understand this, but my question was: who defines the start
> and stop keys for a region server? Do you see my point?
> 
> On Wed, Jun 15, 2011 at 9:53 PM, Doug Meil 
> <[email protected]>wrote:
> 
>> This is briefly covered in the client architecture overview...
>> 
>> http://hbase.apache.org/book.html#client
>> 
>> ... the gist is that, as David describes, the client talks directly to the
>> RegionServers, and knows the start/end keys available.
>> 
>> -----Original Message-----
>> From: Buttler, David [mailto:[email protected]]
>> Sent: Wednesday, June 15, 2011 12:28 PM
>> To: [email protected]
>> Subject: RE: Incoming Row Distribution Strategy/Algorithm Among Region
>> Servers?
>> 
>> Seems pretty simple to me, but I am probably glossing over details:
>> You insert a row with key '3'
>> 
>> HBase has regions (format: start key, end key): (0,1), (1,4), (4,10).  Assume
>> three region servers A, B, C, each holding the corresponding region.
>> 
>> Your client gets the location of the region server holding the meta data
>> (from zookeeper) and asks for the region server that is responsible for key
>> '3'.  It caches this information so that it doesn't have to ask again for
>> awhile.  It then sends the insert statement to that region server.
>> 
>> Asking for the region server that contains key '3' is probably a simple
>> binary search, but I haven't looked it up. The client could likely easily
>> hold the entire list of regions to region server mappings in memory and do
>> the binary search locally.
>> 
>> Dave
>> 
>> 
>> -----Original Message-----
>> From: Shuja Rehman [mailto:[email protected]]
>> Sent: Wednesday, June 15, 2011 4:25 AM
>> To: [email protected]
>> Subject: Incoming Row Distribution Strategy/Algorithm Among Region Servers?
>> 
>> Hi,
>> 
>> I am wondering if anybody could let me know how HBase directs an incoming
>> row to a particular region server.  What is the exact algorithm used
>> to distribute incoming rows to particular region servers?  Can I get
>> detailed information or a flow diagram about this?  E.g. Row1 -> Some
>> Algorithm -> RegionServerX, where the details of "Some Algorithm" are needed.
>> 
>> Thanks
>> 
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>> 
> 
> 
> 
> -- 
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
