After about a minute everything appears to have been moved, since the master only prints out lines like:
2014-08-15 16:40:56,945 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=8 regions=126 average=15.75 mostloaded=16 leastloaded=15 Within the first minute, though, after adding the new nodes, I see a lot of activity like this: 2014-08-15 16:40:49,142 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31., src=172-19-152-247,60020,1408131968632, dest=172-19-149-231,60020,1408135240788 2014-08-15 16:40:49,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31. (offlining) 2014-08-15 16:40:49,142 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Creating unassigned node for e3c3e82d52f54692a70e7b59aeb56b31 in a CLOSING state 2014-08-15 16:40:49,181 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to 172-19-152-247,60020,1408131968632 2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 172-19-152-247,60020,1408131968632 for region usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31. 2014-08-15 16:40:49,189 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a., src=172-19-152-241,60020,1408131968630, dest=172-19-149-89,60020,1408135238164 2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a. (offlining) 2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Creating unassigned node for 1fe73c5f470b78e7e9d2a0691659967a in a CLOSING state 2014-08-15 16:40:49,223 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to 172-19-152-241,60020,1408131968630 2014-08-15 16:40:49,223 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=172-19-149-231,60020,1408135240788, region=7b08e35b1fb9f445d91a90ae89bb401a 2014-08-15 16:40:49,224 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for othertable,user1342640501535615655,1408128955190.7b08e35b1fb9f445d91a90ae89bb401a. from 172-19-149-231,60020,1408135240788; deleting unassigned node 2014-08-15 16:40:49,224 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Deleting existing unassigned node for 7b08e35b1fb9f445d91a90ae89bb401a that is in expected state RS_ZK_REGION_OPENED 2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 172-19-152-241,60020,1408131968630 for region usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a. 2014-08-15 16:40:49,225 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc., src=172-19-152-241,60020,1408131968630, dest=172-19-149-89,60020,1408135238164 2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc. (offlining) 2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Creating unassigned node for 3eeb1e68835cbb98c9f796d437fe7ebc in a CLOSING state 2014-08-15 16:40:49,281 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Successfully deleted unassigned node for region 7b08e35b1fb9f445d91a90ae89bb401a in expected state RS_ZK_REGION_OPENED Thanks, Eric On Fri, Aug 22, 2014 at 2:24 PM, Ted Yu <[email protected]> wrote: > Suppose you add new nodes at time T, can you check master log between T and > T+45(min) to see how long region movements took ? > > Cheers > > > On Fri, Aug 22, 2014 at 11:19 AM, Eric K <[email protected]> wrote: > > > I'm experimenting with HBase (0.94.4) and Hadoop (1.0.4) runtime > balancers > > on just a very tiny 4 node cluster and finding that performance is bad > for > > an hour after adding new nodes, even though all the data is supposedly > > offloaded within a few minutes. > > > > Using YCSB-generated requests, I load the database with about 18 GB worth > > of data across 12 million records. The keyspace is initially pre-split > to > > have ~30 regions per node with single replication of data. Then I hit > > HBase with read requests from a set of clients so that there are 2000 > > requests outstanding, and new requests are immediately made after replies > > are received. While these requests are running, after about 2 minutes, I > > double the nodes from 4 to 8, add the new node information to the slaves > > and regionservers files, start up the new datanode and regionserver > > processes, and call the hdfs balancer with the smallest possible > threshold > > of 1. The hbase.balancer.period is also set to 10 seconds so as to > respond > > fast to new nodes. The dfs.bandwidth.bandwidthPerSec is set to 8 MB/s > per > > node, but I have also tried higher numbers that don't bottleneck the > > offload rate and gotten similar results. > > > > The expected response is that about half the data and regions are > offloaded > > to the new nodes in the next few minutes, and the logs (and hdfsadmin > > reports) confirm that this is indeed happening. However, I'm seeing the > > throughput drop from 3000 to 500 requests/sec when the nodes are added, > > with latencies jumping from ~1s to 4 or 5 seconds, and this poor > > performance persists for almost 45 minutes, when it abruptly gets better > to > > 4000 requests/sec and 500ms latencies. > > > > I'd appreciate any ideas as to what could be causing that 45 minute > > performance delay, and how I can debug this further. If I add the new > > nodes and balance onto them before starting up any read requests, > > performance is much better from the start. > > > > Thanks, > > Eric > > >
