On Thu, Jun 17, 2010 at 11:58 AM, Daniel Einspanjer <[email protected]> wrote:

>  Here is an example of a region split with both daughters being assigned to
> the same region server.  Is this expected?
>
> 2010-06-17 08:34:53,060 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276776160508:
> Daughters;
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647,
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 from
> cm-hadoop14.mozilla.org,60020,1276560962019; 1 of 1
> 2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 to
> cm-hadoop15.mozilla.org,60020,1276778868841
> 2010-06-17 08:34:54,316 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 to
> cm-hadoop15.mozilla.org,60020,1276778868841
> 2010-06-17 08:34:55,432 INFO
> org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 from
> cm-hadoop15.mozilla.org,60020,1276778868841;
> 1 of 1
> 2010-06-17 08:34:55,432 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 open
> on 10.2.72.74:60020
> 2010-06-17 08:34:55,436 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647 in
> region .META.,,1 with startcode=1276778868841,
> server=10.2.72.74:60020
> 2010-06-17 08:34:56,044 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN:
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 from
> cm-hadoop15.mozilla.org,60020,1276778868841;
> 1 of 1
> 2010-06-17 08:34:56,044 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 open
> on 10.2.72.74:60020
> 2010-06-17 08:34:56,048 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 in
> region .META.,,1 with startcode=1276778868841,
> server=10.2.72.74:60020
>
>
>
> On 6/17/10 11:42 AM, Daniel Einspanjer wrote:
>
>>  Currently, in our production cluster, almost all of the traffic for a day
>> ends up assigned to a single RS and that causes the load on that machine to
>> be too high.
>>
>> With our last release, we salted our rowkeys so that rather than starting
>> with the date:
>> 100617<guid>
>> they now start with the first letter of the guid followed by the date:
>> e100617<guid_that_starts_with_e>
>>
>> When I look at the region assignments though, I see a single server
>> assigned the following regions:
>> 0100617...
>> 1100617...
>> 2100617...
>> 3100617...
>> 4100617...
>> ...
>> d100617...
>> e100617...
>> f100617...
>>
>> Is there anything we can do to try to get the cluster to shuffle this up
>> some more?
>> We are getting compaction times in the minutes (one I saw was over 12
>> minutes) and this causes our clients to time out and shut down which causes
>> production outages.
>>
>> -Daniel
>>
>
Here comes a stone-age, stopgap suggestion. If you shut down the region
server, its regions would be reassigned to other servers, but there is a
period of time during which those regions are inaccessible, so that is never
good.
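
For reference, the salting scheme described in the quoted message can be
sketched roughly as below. This is only an illustration: the function names
and the yymmdd date string are assumptions, and the sample GUIDs are the ones
that appear in the split log above. It shows that the salt spreads one day's
keys over up to 16 key prefixes; it does not by itself decide which region
server ends up hosting the resulting regions.

```python
from collections import Counter


def unsalted_rowkey(date_yymmdd: str, guid: str) -> str:
    """Old layout: every key for a given day shares the same date prefix."""
    return date_yymmdd + guid


def salted_rowkey(date_yymmdd: str, guid: str) -> str:
    """New layout: first character of the GUID, then the date, then the GUID."""
    return guid[0] + date_yymmdd + guid


def prefix_histogram(rowkeys):
    """Count keys per leading character, i.e. per salt bucket."""
    return Counter(key[0] for key in rowkeys)


if __name__ == "__main__":
    # GUIDs taken from the region names in the log above.
    guids = [
        "2700f355-1d02-485a-90d9-0e8182100617",
        "2b7ec9f5-dcad-4c98-9dc5-969532100617",
    ]
    for guid in guids:
        print(salted_rowkey("100617", guid))
```

Both sample GUIDs start with "2", so both keys fall under the same salt
prefix, which matches the two daughter regions seen in the log.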
