After about a minute everything appears to have been moved, since the
master only prints out lines like:

2014-08-15 16:40:56,945 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=8 regions=126
average=15.75 mostloaded=16 leastloaded=15

Within the first minute, though, after adding the new nodes, I see a lot of
activity like this:

2014-08-15 16:40:49,142 INFO org.apache.hadoop.hbase.master.HMaster:
balance
hri=usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.,
src=172-19-152-247,60020,1408131968632,
dest=172-19-149-231,60020,1408135240788
2014-08-15 16:40:49,142 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
region
usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.
(offlining)
2014-08-15 16:40:49,142 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x147db34a2c80000 Creating unassigned node for
e3c3e82d52f54692a70e7b59aeb56b31 in a CLOSING state
2014-08-15 16:40:49,181 DEBUG org.apache.hadoop.hbase.master.ServerManager:
New connection to 172-19-152-247,60020,1408131968632
2014-08-15 16:40:49,189 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
172-19-152-247,60020,1408131968632 for region
usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.
2014-08-15 16:40:49,189 INFO org.apache.hadoop.hbase.master.HMaster:
balance
hri=usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.,
src=172-19-152-241,60020,1408131968630,
dest=172-19-149-89,60020,1408135238164
2014-08-15 16:40:49,189 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
region
usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.
(offlining)
2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x147db34a2c80000 Creating unassigned node for
1fe73c5f470b78e7e9d2a0691659967a in a CLOSING state
2014-08-15 16:40:49,223 DEBUG org.apache.hadoop.hbase.master.ServerManager:
New connection to 172-19-152-241,60020,1408131968630
2014-08-15 16:40:49,223 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_OPENED, server=172-19-149-231,60020,1408135240788,
region=7b08e35b1fb9f445d91a90ae89bb401a
2014-08-15 16:40:49,224 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED
event for
othertable,user1342640501535615655,1408128955190.7b08e35b1fb9f445d91a90ae89bb401a.
from 172-19-149-231,60020,1408135240788; deleting unassigned node
2014-08-15 16:40:49,224 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x147db34a2c80000 Deleting existing unassigned node for
7b08e35b1fb9f445d91a90ae89bb401a that is in expected state
RS_ZK_REGION_OPENED
2014-08-15 16:40:49,225 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
172-19-152-241,60020,1408131968630 for region
usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.
2014-08-15 16:40:49,225 INFO org.apache.hadoop.hbase.master.HMaster:
balance
hri=usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc.,
src=172-19-152-241,60020,1408131968630,
dest=172-19-149-89,60020,1408135238164
2014-08-15 16:40:49,225 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
region
usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc.
(offlining)
2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x147db34a2c80000 Creating unassigned node for
3eeb1e68835cbb98c9f796d437fe7ebc in a CLOSING state
2014-08-15 16:40:49,281 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x147db34a2c80000 Successfully deleted unassigned node for
region 7b08e35b1fb9f445d91a90ae89bb401a in expected state
RS_ZK_REGION_OPENED

Thanks,
Eric



On Fri, Aug 22, 2014 at 2:24 PM, Ted Yu <[email protected]> wrote:

> Suppose you add new nodes at time T, can you check master log between T and
> T+45(min) to see how long region movements took ?
>
> Cheers
>
>
> On Fri, Aug 22, 2014 at 11:19 AM, Eric K <[email protected]> wrote:
>
> > I'm experimenting with HBase (0.94.4) and Hadoop (1.0.4) runtime
> balancers
> > on just a very tiny 4 node cluster and finding that performance is bad
> for
> > an hour after adding new nodes, even though all the data is supposedly
> > offloaded within a few minutes.
> >
> > Using YCSB-generated requests, I load the database with about 18 GB worth
> > of data across 12 million records.  The keyspace is initially pre-split
> to
> > have ~30 regions per node with single replication of data.  Then I hit
> > HBase with read requests from a set of clients so that there are 2000
> > requests outstanding, and new requests are immediately made after replies
> > are received.  While these requests are running, after about 2 minutes, I
> > double the nodes from 4 to 8, add the new node information to the slaves
> > and regionservers files, start up the new datanode and regionserver
> > processes, and call the hdfs balancer with the smallest possible
> threshold
> > of 1.  The hbase.balancer.period is also set to 10 seconds so as to
> respond
> > fast to new nodes.  The dfs.bandwidth.bandwidthPerSec is set to 8 MB/s
> per
> > node, but I have also tried higher numbers that don't bottleneck the
> > offload rate and gotten similar results.
> >
> > The expected response is that about half the data and regions are
> offloaded
> > to the new nodes in the next few minutes, and the logs (and hdfsadmin
> > reports) confirm that this is indeed happening.  However, I'm seeing the
> > throughput drop from 3000 to 500 requests/sec when the nodes are added,
> > with latencies jumping from ~1s to 4 or 5 seconds, and this poor
> > performance persists for almost 45 minutes, when it abruptly gets better
> to
> > 4000 requests/sec and 500ms latencies.
> >
> > I'd appreciate any ideas as to what could be causing that 45 minute
> > performance delay, and how I can debug this further.  If I add the new
> > nodes and balance onto them before starting up any read requests,
> > performance is much better from the start.
> >
> > Thanks,
> > Eric
> >
>

Reply via email to