Here is an example of a region split where both daughters are
assigned to the same region server. Is this expected?
2010-06-17 08:34:53,060 INFO
org.apache.hadoop.hbase.master.ServerManager: Processing
MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS:
crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276776160508:
Daughters;
crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647,
crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
from cm-hadoop14.mozilla.org,60020,1276560962019; 1 of 1
2010-06-17 08:34:54,316 INFO
org.apache.hadoop.hbase.master.RegionManager: Assigning region
crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
to cm-hadoop15.mozilla.org,60020,1276778868841
2010-06-17 08:34:54,316 INFO
org.apache.hadoop.hbase.master.RegionManager: Assigning region
crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
to cm-hadoop15.mozilla.org,60020,1276778868841
2010-06-17 08:34:55,432 INFO
org.apache.hadoop.hbase.master.ServerManager: Processing
MSG_REPORT_OPEN:
crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
from cm-hadoop15.mozilla.org,60020,1276778868841;
1 of 1
2010-06-17 08:34:55,432 INFO
org.apache.hadoop.hbase.master.RegionServerOperation:
crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
open on 10.2.72.74:60020
2010-06-17 08:34:55,436 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
crash_reports,21006172700f355-1d02-485a-90d9-0e8182100617,1276788891647
in region .META.,,1 with startcode=1276778868841,
server=10.2.72.74:60020
2010-06-17 08:34:56,044 INFO
org.apache.hadoop.hbase.master.ServerManager: Processing
MSG_REPORT_OPEN:
crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
from cm-hadoop15.mozilla.org,60020,1276778868841;
1 of 1
2010-06-17 08:34:56,044 INFO
org.apache.hadoop.hbase.master.RegionServerOperation:
crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
open on 10.2.72.74:60020
2010-06-17 08:34:56,048 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647
in region .META.,,1 with startcode=1276778868841,
server=10.2.72.74:60020
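
To check how often daughters land on the same server, the "Assigning region" lines in the master log can be tallied per host. This is a rough sketch only: it assumes the log format shown above, and the function name regions_by_server is made up for illustration.

```python
import re
from collections import defaultdict

# Matches "Assigning region <region> to <host>,<port>,<startcode>" and
# captures the region name and the hostname (the text before the first comma).
ASSIGN_RE = re.compile(r"Assigning region\s+(\S+)\s+to\s+([^,\s]+)")


def regions_by_server(log_text):
    """Group assigned region names by the host they were assigned to."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    joined = " ".join(log_text.split())
    result = defaultdict(list)
    for region, host in ASSIGN_RE.findall(joined):
        result[host].append(region)
    return dict(result)
```

Any host that appears with both daughters of one split is a case like the one above.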
On 6/17/10 11:42 AM, Daniel Einspanjer wrote:
Currently, in our production cluster, almost all of the traffic for a
day ends up assigned to a single RS and that causes the load on that
machine to be too high.
With our last release, we salted our rowkeys so that rather than
starting with the date:
100617<guid>
they now start with the first letter of the guid followed by the date:
e100617<guid_that_starts_with_e>
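
As a minimal sketch of the key layout described above (the function name salted_rowkey is hypothetical, and the real client code may build the key differently), the salted key is the first character of the guid, then the date, then the full guid:

```python
def salted_rowkey(date_yymmdd: str, guid: str) -> str:
    """Build a salted rowkey: first guid character + date + full guid.

    Assumes the date is already formatted as YYMMDD (e.g. "100617")
    and that the guid is a non-empty string.
    """
    if not guid:
        raise ValueError("guid must not be empty")
    salt = guid[0]  # one character, so at most 16 prefixes for a hex guid
    return salt + date_yymmdd + guid


# Example using a guid taken from the region names in the log above.
key = salted_rowkey("100617", "2700f355-1d02-485a-90d9-0e8182100617")
```

With a hex guid this gives 16 possible prefixes per day, which matches the 0100617... through f100617... regions listed below.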
When I look at the region assignments though, I see a single server
assigned the following regions:
0100617...
1100617...
2100617...
3100617...
4100617...
...
d100617...
e100617...
f100617...
Is there anything we can do to try to get the cluster to shuffle this
up some more?
We are seeing compaction times in the minutes (one I saw took over 12
minutes), which causes our clients to time out and shut down, resulting
in production outages.
-Daniel