Was the last 'Successfully deleted unassigned node' log entry close to T+45 (min)?
If so, it means that the AssignmentManager was busy assigning these regions during this period.
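If grepping the master log is awkward, you can also watch this from a client by polling the master for regions in transition while the moves are happening. A rough, untested sketch against the 0.94 client API (the class name and polling interval are placeholders of mine):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RitWatcher {
  // Prints how many regions are still in transition, so you can see when the
  // reassignment kicked off by the balancer actually finishes.
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      long start = System.currentTimeMillis();
      while (true) {
        ClusterStatus status = admin.getClusterStatus();
        int rit = status.getRegionsInTransition().size();
        long elapsed = (System.currentTimeMillis() - start) / 1000;
        System.out.println(elapsed + "s: " + rit + " region(s) in transition");
        if (rit == 0) break;
        Thread.sleep(10000);   // poll every 10 seconds
      }
    } finally {
      admin.close();
    }
  }
}

If that count stays above zero for most of the 45 minutes, the AssignmentManager is the place to look.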
Cheers

On Fri, Aug 22, 2014 at 1:50 PM, Eric K <[email protected]> wrote:

> After about a minute everything appears to have been moved, since the master only prints out lines like:
>
> 2014-08-15 16:40:56,945 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=8 regions=126 average=15.75 mostloaded=16 leastloaded=15
>
> Within the first minute, though, after adding the new nodes, I see a lot of activity like this:
>
> 2014-08-15 16:40:49,142 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31., src=172-19-152-247,60020,1408131968632, dest=172-19-149-231,60020,1408135240788
> 2014-08-15 16:40:49,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31. (offlining)
> 2014-08-15 16:40:49,142 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Creating unassigned node for e3c3e82d52f54692a70e7b59aeb56b31 in a CLOSING state
> 2014-08-15 16:40:49,181 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to 172-19-152-247,60020,1408131968632
> 2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 172-19-152-247,60020,1408131968632 for region usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.
> 2014-08-15 16:40:49,189 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a., src=172-19-152-241,60020,1408131968630, dest=172-19-149-89,60020,1408135238164
> 2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a. (offlining)
> 2014-08-15 16:40:49,189 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Creating unassigned node for 1fe73c5f470b78e7e9d2a0691659967a in a CLOSING state
> 2014-08-15 16:40:49,223 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to 172-19-152-241,60020,1408131968630
> 2014-08-15 16:40:49,223 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=172-19-149-231,60020,1408135240788, region=7b08e35b1fb9f445d91a90ae89bb401a
> 2014-08-15 16:40:49,224 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for othertable,user1342640501535615655,1408128955190.7b08e35b1fb9f445d91a90ae89bb401a. from 172-19-149-231,60020,1408135240788; deleting unassigned node
> 2014-08-15 16:40:49,224 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Deleting existing unassigned node for 7b08e35b1fb9f445d91a90ae89bb401a that is in expected state RS_ZK_REGION_OPENED
> 2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 172-19-152-241,60020,1408131968630 for region usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.
> 2014-08-15 16:40:49,225 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc., src=172-19-152-241,60020,1408131968630, dest=172-19-149-89,60020,1408135238164
> 2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc. (offlining)
> 2014-08-15 16:40:49,225 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Creating unassigned node for 3eeb1e68835cbb98c9f796d437fe7ebc in a CLOSING state
> 2014-08-15 16:40:49,281 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x147db34a2c80000 Successfully deleted unassigned node for region 7b08e35b1fb9f445d91a90ae89bb401a in expected state RS_ZK_REGION_OPENED
>
> Thanks,
> Eric
>
>
> On Fri, Aug 22, 2014 at 2:24 PM, Ted Yu <[email protected]> wrote:
>
> > Suppose you add new nodes at time T, can you check the master log between T and T+45 (min) to see how long the region movements took?
> >
> > Cheers
> >
> >
> > On Fri, Aug 22, 2014 at 11:19 AM, Eric K <[email protected]> wrote:
> >
> > > I'm experimenting with the HBase (0.94.4) and Hadoop (1.0.4) runtime balancers on a very tiny 4-node cluster, and finding that performance is bad for an hour after adding new nodes, even though all the data is supposedly offloaded within a few minutes.
> > >
> > > Using YCSB-generated requests, I load the database with about 18 GB of data across 12 million records. The keyspace is initially pre-split into ~30 regions per node, with single replication of data. Then I hit HBase with read requests from a set of clients so that there are 2000 requests outstanding, and new requests are issued immediately after replies are received. While these requests are running, after about 2 minutes, I double the nodes from 4 to 8, add the new node information to the slaves and regionservers files, start up the new datanode and regionserver processes, and call the HDFS balancer with the smallest possible threshold of 1. The hbase.balancer.period is also set to 10 seconds so as to respond quickly to new nodes. The dfs.balance.bandwidthPerSec is set to 8 MB/s per node, but I have also tried higher values that don't bottleneck the offload rate and gotten similar results.
> > >
> > > The expected behavior is that about half the data and regions are offloaded to the new nodes in the next few minutes, and the logs (and dfsadmin reports) confirm that this is indeed happening. However, I'm seeing throughput drop from 3000 to 500 requests/sec when the nodes are added, with latencies jumping from ~1 s to 4 or 5 seconds, and this poor performance persists for almost 45 minutes, when it abruptly improves to 4000 requests/sec and 500 ms latencies.
> > >
> > > I'd appreciate any ideas as to what could be causing that 45-minute performance delay, and how I can debug this further. If I add the new nodes and balance onto them before starting up any read requests, performance is much better from the start.
> > >
> > > Thanks,
> > > Eric
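One more thought on the balancer settings described above: the HBase balancer can also be switched off and triggered by hand from a client, which makes it easier to tell the region moves apart from the HDFS block moves when timing the slowdown. Another rough, untested sketch against the 0.94 client API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class BalancerToggle {
  // Turns automatic balancing off before the new nodes join, then re-enables it
  // and asks the master for a single balancing pass once HDFS has finished moving blocks.
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      boolean wasOn = admin.balanceSwitch(false);   // returns the previous state
      System.out.println("Balancer was " + (wasOn ? "on" : "off") + ", now off");

      // ... start the new regionservers and let the HDFS balancer finish ...

      admin.balanceSwitch(true);
      boolean ran = admin.balancer();               // request one balancing pass now
      System.out.println("Balancer run requested: " + ran);
    } finally {
      admin.close();
    }
  }
}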
