The starvation problem in current load balance algorithm
--------------------------------------------------------

                 Key: HBASE-3663
                 URL: https://issues.apache.org/jira/browse/HBASE-3663
             Project: HBase
          Issue Type: Bug
            Reporter: Liyin Tang
         Attachments: result_new_load_balance.txt, result_old_load_balance.txt

This is an interesting starvation case. There are 2 conditions to trigger this 
problem.
Condition1: r/s - r/(s+1) << 1 
Let r: the number of regions
Let s: the number of servers

Condition2: for each server, the load of each server is less or equal the ceil 
of avg load.

Here is the unit test to verify this problem: 
For example, there are 16 servers and 62 regions. The avg load is 
3.875. And setting the slot to 0 to keep the load of each server either 3 or 4. 
When a new server is coming,  no server needs to assign regions to this new 
server, since no one is larger the ceil of the avg.
(Setting slot to 0 is to easily trigger this situation, otherwise it needs much 
larger numbers)

Solutions is pretty straightforward. Just compare the floor of the avg instead 
of the ceil. This solution will evenly balance the load from the servers which 
is little more loaded than others. 

I also attached the comparison result  for the case mentioned above between the 
old balance algorithm and new balance algorithm. (I set the slot = 0 when 
testing)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to