Hi,

Someone told me that the rebalance condition in my previous email doesn't seem
accurate. I then found the correct condition in the Javadoc:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/master/LoadBalancer.html
--------------
Cluster-wide load balancing will occur only when there are no regions in
transition and according to a fixed period of a time using balanceCluster(Map).
--------------

So a region rebalance will be triggered after a certain period of time, even on
a cluster that isn't very busy. Although I was wrong about the condition, I
still think that checking the number of regions assigned to each server is a
good place to start to understand what's happening on your cluster.
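For example, the shell's "status" command gives a quick per-server view (just a
rough sketch; the exact output format differs between HBase versions):

  $ hbase shell
  # 'simple' prints one line per region server, including the number of
  # online regions and its request load, so idle servers stand out.
  hbase> status 'simple'
  # 'detailed' additionally lists every region assigned to each server.
  hbase> status 'detailed'

A server that reports zero (or very few) regions and almost no requests isn't
really taking part in the workload.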
--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Jan 24, 2011, at 1:36 PM, Tatsuya Kawano <[email protected]> wrote:

> Hi,
>
> Your 4th and 5th region servers are probably getting no regions to serve.
> Check the number of regions assigned to, and the number of requests
> processed by, those servers (via Ganglia or the HBase status screen).
>
> HBase won't rebalance regions when you add servers. It does assign regions
> to the new server, but only when existing regions on the other region
> servers get split. Your workload only updates existing records, so the
> existing regions won't get split very often.
>
> I don't know a good way to work around this, so I hope others on this list
> have a better idea. But you can manually split a region via the HBase
> shell, and disabling / enabling a region may also reassign it to the 4th or
> 5th server.
>
> Hope this helps,
> Tatsuya
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
>
> On Jan 24, 2011, at 6:41 AM, Thibault Dory <[email protected]> wrote:
>
>> Hello,
>>
>> I'm currently testing the performance of HBase for a specific test case.
>> I have downloaded ~20000 articles from Wikipedia and I want to test the
>> performance of reads/writes and of MapReduce.
>> I'm using HBase 0.20.6 and Hadoop 0.20.2 on a cluster of Ubuntu-powered
>> servers connected with Gigabit Ethernet.
>>
>> My test works like this:
>> - I start with 3 physical servers, used like this: for Hadoop, 1 namenode
>> and 3 datanodes; for HBase, 1 master and 3 regionservers.
>> - I insert all the articles, one article per row, with two cells: ID and
>> article.
>> - I start 3 threads from another machine, reading and updating articles
>> randomly (I simply append the string "1" to the end of the article), and
>> I measure the time needed for all the operations to finish.
>> - I build a reverse search index using two phases of MapReduce and
>> measure the time needed to compute it.
>> - Then I add a new server, on which I start a datanode and a
>> regionserver, and I start the benchmark again with 4 threads.
>> - I repeat these steps until I reach the last available server (8 in
>> total).
>>
>> I keep the total number of operations constant, and appending "1" to an
>> article does not change its size much.
>>
>> The problem is the kind of results I'm seeing. I believed that the time
>> needed to perform the read/write operations would decrease as I add new
>> servers to the cluster, but I'm experiencing exactly the opposite.
>> Moreover, the more requests I make, the slower the cluster becomes, for a
>> constant size.
>> For example, here are the results in seconds that I get on my cluster
>> just after the first insertion, with 3 nodes and 10000 operations (20% of
>> which update the articles):
>>
>> Individual times: [30.338116567, 24.402751402, 25.650858953, 27.699796324,
>> 26.589869283, 33.909433157, 52.538378122, 48.0114018, 47.149348721,
>> 42.825791078]
>> Then one minute after this run ends, everything else staying the same:
>> Individual times: [58.181552147, 48.931155328, 62.509309199, 57.198395723,
>> 63.267397201, 54.160937835, 57.635454167, 64.780292628, 62.762390414,
>> 61.381563914]
>> And finally five minutes after the last run ends, everything else staying
>> the same:
>> Individual times: [56.852388792, 58.011768345, 63.578745601, 68.008043323,
>> 79.545419247, 87.183962628, 88.1989561, 94.532923849, 99.852569437,
>> 102.355709259]
>>
>> It seems quite clear that the time needed to perform the same amount of
>> operations is rising fast.
>>
>> When I add servers to the cluster, the time needed to perform the
>> operations keeps rising. Here are the results for 4 servers using the
>> same methodology as above:
>> Immediately after the new server is added:
>> Individual times: [86.224951713, 80.777746425, 84.814954717, 93.07842057,
>> 83.348558502, 90.037499401, 106.799544002, 98.122952552, 97.057614119,
>> 94.277285461]
>> One minute after the last test:
>> Individual times: [94.633454698, 101.250176482, 99.945406887,
>> 101.754011832, 106.882328108, 97.808320021, 97.050036703, 95.844557847,
>> 97.931572694, 92.258327247]
>> Five minutes after the last test:
>> Individual times: [98.188162512, 96.332809905, 93.598184149, 93.552745204,
>> 96.905860067, 102.149408296, 101.545412423, 105.377292242, 108.855117219,
>> 110.429000567]
>>
>> The times needed to compute the reverse search index with MapReduce are
>> rising too:
>> 3 nodes
>> Results: [106.604148815, 104.829340323, 101.986450167, 102.871575842,
>> 102.177574017]
>> 4 nodes
>> Results: [120.451610507, 115.007344179, 115.075212636, 115.146883431,
>> 114.216465299]
>> 5 nodes
>> Results: [139.563445944, 132.933993434, 134.117730658, 132.927127084,
>> 132.041046308]
>>
>> I don't think this behaviour is normal; I should see the time needed to
>> complete the same amount of work decrease as I add more servers to the
>> cluster. Unless this is because my cluster is too small?
>> I should say that all the servers in the cluster seem to use an equal
>> amount of CPU while the test is running, so it looks like all of them are
>> working and none of them is left without data to store.
>>
>> What do you think? Where did I screw up to get these kinds of results
>> with HBase?
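To make the workaround from my previous mail (quoted above) a bit more
concrete, here is roughly what the manual split and the disable / enable trick
look like in the HBase shell. This is only a sketch: 'articles' is a
placeholder table name, and the exact command set varies between HBase
versions, so check "help" inside the shell.

  $ hbase shell
  # Force a split; the resulting daughter regions become candidates for
  # assignment to the lightly loaded region servers.
  hbase> split 'articles'
  # Disabling and then re-enabling a table closes its regions and has the
  # master assign them again, which may land some of them on the new servers.
  hbase> disable 'articles'
  hbase> enable 'articles'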
