Maybe the first point could be expanded on: a region has a maximum size
(either the default, or whatever you set in hbase-site.xml). When a region
grows past that size, HBase automatically triggers a split (you can also
manually trigger a split for a specific region, or for all regions, from the
shell or web UI). If you dig through the code far enough you will discover
that it splits on the midkey: for the region, the middle key of the index.
The relevant code is at the end.
So the start and end keys are chosen the way you might expect: from the key
distribution that is actually encountered. HBase does not try to infer your
scheme for generating keys.
Dave
/*
 * @return File midkey. Inexact. Operates on block boundaries. Does
 * not go into blocks.
 */
byte [] midkey() throws IOException {
  int pos = ((this.count - 1)/2); // middle of the index
  if (pos < 0) {
    throw new IOException("HFile empty");
  }
  return this.blockKeys[pos];
}
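
To see what that index arithmetic does in practice, here is a minimal,
self-contained sketch. The class MidkeySketch, its String keys, and the sample
keys are all made up for illustration; this is not HBase's actual index class,
only the same (count - 1) / 2 arithmetic applied to a sorted array of
block-start keys. One thing I noticed while writing it: Java integer division
truncates toward zero, so (0 - 1) / 2 evaluates to 0, not to a negative
number. The sketch therefore checks for an empty index directly.

```java
// Hypothetical stand-in for a block index; not HBase's real class.
class MidkeySketch {
    // First key of each block, assumed to be in sorted order.
    private final String[] blockKeys;

    MidkeySketch(String... blockKeys) {
        this.blockKeys = blockKeys;
    }

    // Same arithmetic as the quoted midkey(): pick the middle index entry.
    String midkey() {
        if (blockKeys.length == 0) {
            throw new IllegalStateException("index empty");
        }
        int pos = (blockKeys.length - 1) / 2; // middle of the index
        return blockKeys[pos];
    }

    public static void main(String[] args) {
        MidkeySketch odd = new MidkeySketch("a", "c", "f", "k", "p");
        MidkeySketch even = new MidkeySketch("a", "c", "f", "k");
        System.out.println(odd.midkey());  // f
        System.out.println(even.midkey()); // c
    }
}
```

With five blocks the split point is the third block's key; with four it is the
second. Either way the boundary comes from whatever keys were written, which
is the point made above: nothing here knows how the keys were generated.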
-----Original Message-----
From: Christopher Tarnas [mailto:[email protected]] On Behalf Of Chris Tarnas
Sent: Wednesday, June 15, 2011 10:59 AM
To: [email protected]
Subject: Re: Incoming Row Distribution Strategy/Algorithm Among Region Servers?
There are a few ways:
1) Dynamically, as data is added. You start with one region and all data goes
there. When a region grows too big, it gets split in half. So if a region had
keys 1-10, we now have 1-5 and 5-10.
2) Manually at table creation. You can specify your regions ahead of time if
you have a good handle on the data distribution.
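
A rough sketch of how region boundaries and row routing fit together, using
the region layout from David's example quoted further down ((0,1), (1,4),
(4,10) on servers A, B, C). Everything in it is an assumption made for
illustration: the class RegionLookupSketch, the integer keys (real row keys
are byte arrays), the server names, and the use of a sorted map in place of
the real meta lookup. End keys are not modeled; each region simply runs up to
the next start key.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: integer keys and server names are illustrative assumptions.
class RegionLookupSketch {
    // Region start key -> name of the server hosting that region.
    private final TreeMap<Integer, String> regionsByStartKey = new TreeMap<>();

    // Pre-defining regions (option 2) is just adding boundaries up front.
    void addRegion(int startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String locate(int rowKey) {
        Map.Entry<Integer, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no region covers key " + rowKey);
        }
        return region.getValue();
    }

    // A split (option 1) adds a boundary: keys >= midKey go to the new region.
    void split(int midKey, String serverForUpperHalf) {
        regionsByStartKey.put(midKey, serverForUpperHalf);
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion(0, "A");
        table.addRegion(1, "B");
        table.addRegion(4, "C");
        System.out.println(table.locate(3)); // B

        table.split(7, "D"); // region starting at 4 is split at key 7
        System.out.println(table.locate(5)); // C
        System.out.println(table.locate(8)); // D
    }
}
```

The floorEntry call does the same job as the binary search David guesses at:
find the last region whose start key is not greater than the row key.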
-chris
On Jun 15, 2011, at 10:47 AM, Shuja Rehman wrote:
> Yes, I understand this, but my question was: who defines the start and
> stop keys for a region server? Do you see my point?
>
> On Wed, Jun 15, 2011 at 9:53 PM, Doug Meil
> <[email protected]> wrote:
>
>> This is briefly covered in the client architecture overview...
>>
>> http://hbase.apache.org/book.html#client
>>
>> ... the gist is that as David describes the client talks directly to the
>> RegionServers, and knows the start/end keys available.
>>
>> -----Original Message-----
>> From: Buttler, David [mailto:[email protected]]
>> Sent: Wednesday, June 15, 2011 12:28 PM
>> To: [email protected]
>> Subject: RE: Incoming Row Distribution Strategy/Algorithm Among Region
>> Servers?
>>
>> Seems pretty simple to me, but I am probably glossing over details:
>> You insert a row with key '3'
>>
>> HBase has regions (format start key, end key): (0,1), (1,4), (4,10). Assume
>> three region servers A, B, C, each holding the corresponding region.
>>
>> Your client gets the location of the region server holding the meta data
>> (from zookeeper) and asks for the region server that is responsible for key
>> '3'. It caches this information so that it doesn't have to ask again for
>> a while. It then sends the insert statement to that region server.
>>
>> Asking for the region server that contains key '3' is probably a simple
>> binary search, but I haven't looked it up. The client could likely easily
>> hold the entire list of regions to region server mappings in memory and do
>> the binary search locally.
>>
>> Dave
>>
>>
>> -----Original Message-----
>> From: Shuja Rehman [mailto:[email protected]]
>> Sent: Wednesday, June 15, 2011 4:25 AM
>> To: [email protected]
>> Subject: Incoming Row Distribution Strategy/Algorithm Among Region Servers?
>>
>> Hi,
>>
>> I am wondering if anybody can tell me how HBase routes an input row to a
>> particular region server. What is the exact algorithm used to distribute
>> incoming rows across region servers? Can I get detailed information or a
>> flow diagram for this? E.g. Row1 -> Some Algorithm -> RegionServerX, where
>> the details of "Some Algorithm" are what I need.
>>
>> Thanks
>>
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>>
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>