Hi,

Someone told me that the rebalance condition in my previous email doesn't seem
accurate. I then found the correct condition in the Javadoc:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/master/LoadBalancer.html
--------------
Cluster-wide load balancing will occur only when there are no regions in
transition and according to a fixed period of a time using balanceCluster(Map).
--------------

So a region rebalance will be triggered after a certain period of time, even on
a cluster that isn't very busy. Although I was wrong about the condition, I
still think that checking the number of regions assigned to each server is a
good place to start to understand what's happening on your cluster.
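For example, the shell's "status" command gives a quick per-server view (just a
rough sketch; the exact output format differs between HBase versions):

  $ hbase shell
  # 'simple' prints one line per region server, including the number of
  # online regions and its request load, so idle servers stand out.
  hbase> status 'simple'
  # 'detailed' additionally lists every region assigned to each server.
  hbase> status 'detailed'

A server that reports zero (or very few) regions and almost no requests isn't
really taking part in the workload.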
--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Jan 24, 2011, at 1:36 PM, Tatsuya Kawano <[email protected]> wrote:

> Hi,
>
> Your 4th and 5th region servers are probably getting no regions to serve.
> Check the number of regions assigned to, and the number of requests
> processed by, those servers (via Ganglia or the HBase status screen).
>
> HBase won't rebalance regions when you add servers. It does assign regions
> to the new server, but only when existing regions on the other region
> servers get split. Your workload only updates existing records, so the
> existing regions won't get split very often.
>
> I don't know a good way to work around this, so I hope others on this list
> have a better idea. But you can manually split a region via the HBase
> shell, and disabling / enabling a region may also reassign it to the 4th or
> 5th server.
>
> Hope this helps,
> Tatsuya
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
>
> On Jan 24, 2011, at 6:41 AM, Thibault Dory <[email protected]> wrote:
>
>> Hello,
>>
>> I'm currently testing the performance of HBase for a specific test case.
>> I have downloaded ~20000 articles from Wikipedia and I want to test the
>> performance of reads/writes and of MapReduce.
>> I'm using HBase 0.20.6 and Hadoop 0.20.2 on a cluster of Ubuntu-powered
>> servers connected with Gigabit Ethernet.
>>
>> My test works like this:
>> - I start with 3 physical servers, used like this: for Hadoop, 1 namenode
>> and 3 datanodes; for HBase, 1 master and 3 regionservers.
>> - I insert all the articles, one article per row, with two cells: ID and
>> article.
>> - I start 3 threads from another machine, reading and updating articles
>> randomly (I simply append the string "1" to the end of the article), and
>> I measure the time needed for all the operations to finish.
>> - I build a reverse search index using two phases of MapReduce and
>> measure the time needed to compute it.
>> - Then I add a new server, on which I start a datanode and a
>> regionserver, and I start the benchmark again with 4 threads.
>> - I repeat these steps until I reach the last available server (8 in
>> total).
>>
>> I keep the total number of operations constant, and appending "1" to an
>> article does not change its size much.
>>
>> The problem is the kind of results I'm seeing. I believed that the time
>> needed to perform the read/write operations would decrease as I add new
>> servers to the cluster, but I'm experiencing exactly the opposite.
>> Moreover, the more requests I make, the slower the cluster becomes, for a
>> constant size.
>> For example, here are the results in seconds that I get on my cluster
>> just after the first insertion, with 3 nodes and 10000 operations (20% of
>> which update the articles):
>>
>> Individual times: [30.338116567, 24.402751402, 25.650858953, 27.699796324,
>> 26.589869283, 33.909433157, 52.538378122, 48.0114018, 47.149348721,
>> 42.825791078]
>> Then one minute after this run ends, everything else staying the same:
>> Individual times: [58.181552147, 48.931155328, 62.509309199, 57.198395723,
>> 63.267397201, 54.160937835, 57.635454167, 64.780292628, 62.762390414,
>> 61.381563914]
>> And finally five minutes after the last run ends, everything else staying
>> the same:
>> Individual times: [56.852388792, 58.011768345, 63.578745601, 68.008043323,
>> 79.545419247, 87.183962628, 88.1989561, 94.532923849, 99.852569437,
>> 102.355709259]
>>
>> It seems quite clear that the time needed to perform the same amount of
>> operations is rising fast.
>>
>> When I add servers to the cluster, the time needed to perform the
>> operations keeps rising. Here are the results for 4 servers using the
>> same methodology as above:
>> Immediately after the new server is added:
>> Individual times: [86.224951713, 80.777746425, 84.814954717, 93.07842057,
>> 83.348558502, 90.037499401, 106.799544002, 98.122952552, 97.057614119,
>> 94.277285461]
>> One minute after the last test:
>> Individual times: [94.633454698, 101.250176482, 99.945406887,
>> 101.754011832, 106.882328108, 97.808320021, 97.050036703, 95.844557847,
>> 97.931572694, 92.258327247]
>> Five minutes after the last test:
>> Individual times: [98.188162512, 96.332809905, 93.598184149, 93.552745204,
>> 96.905860067, 102.149408296, 101.545412423, 105.377292242, 108.855117219,
>> 110.429000567]
>>
>> The times needed to compute the reverse search index with MapReduce are
>> rising too:
>> 3 nodes
>> Results: [106.604148815, 104.829340323, 101.986450167, 102.871575842,
>> 102.177574017]
>> 4 nodes
>> Results: [120.451610507, 115.007344179, 115.075212636, 115.146883431,
>> 114.216465299]
>> 5 nodes
>> Results: [139.563445944, 132.933993434, 134.117730658, 132.927127084,
>> 132.041046308]
>>
>> I don't think this behaviour is normal; I should see the time needed to
>> complete the same amount of work decrease as I add more servers to the
>> cluster. Unless this is because my cluster is too small?
>> I should say that all the servers in the cluster seem to use an equal
>> amount of CPU while the test is running, so it looks like all of them are
>> working and none of them is left without data to store.
>>
>> What do you think? Where did I screw up to get these kinds of results
>> with HBase?
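To make the workaround from my previous mail (quoted above) a bit more
concrete, here is roughly what the manual split and the disable / enable trick
look like in the HBase shell. This is only a sketch: 'articles' is a
placeholder table name, and the exact command set varies between HBase
versions, so check "help" inside the shell.

  $ hbase shell
  # Force a split; the resulting daughter regions become candidates for
  # assignment to the lightly loaded region servers.
  hbase> split 'articles'
  # Disabling and then re-enabling a table closes its regions and has the
  # master assign them again, which may land some of them on the new servers.
  hbase> disable 'articles'
  hbase> enable 'articles'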
