[ https://issues.apache.org/jira/browse/HBASE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211831#comment-16211831 ]
Jerry He edited comment on HBASE-19021 at 10/19/17 10:09 PM:
-------------------------------------------------------------

More explanation.

In the branch-1 RegionStates.getAssignmentsByTable()
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L1115
there is a part that deals with servers w/o assignments and with draining mode. This is missing after AMv2.

But draining mode actually still works after a 'detour' in AMv2. The balancer's balanceCluster() can pick a plan that moves regions to the draining servers. The regions will be 'unassigned'. But in the 'assign' phase, when going thru the retainAssignment check, the plan is checked against the server list obtained from ServerManager.createDestinationServersList(). This list is a good list without the draining servers. So it is a detour, but the end result is ok.

Still, I restored the branch-1 behavior, which is to take the draining servers out of consideration from the beginning.

The balancer's retainAssignment, randomAssignment and roundRobinAssignment all take a server list as a parameter. We seem to always call ServerManager.createDestinationServersList() to pass the server list, so they are all good. Only the big balanceCluster() call has the issue.


was (Author: jinghe):
More explanation.

In the branch-1 RegionStates.getAssignmentsByTable()
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L1115
there is a part to deal with servers w/o assignments and draining mode. This is missing after AMv2.

But the draining mode is actually ok after a 'detour' in AMv2. The balancer's balanceCluster() can pick a plan to move regions to the draining servers. The regions will be 'unassigned'. But in the 'assign' phase, when going thru retainAssignment check, the plan is checked against the server list obtained from ServerManager.createDestinationServersList(). This list is a good list without the draining servers. So it is like a detour, but the end result is ok.

But I restored the branch-1 behavior, which is to take the draining servers out of consideration from the beginning.

The balancer's retainAssignment, randomAssignment and roundRobinAssignment all take a server list an parameters. We seem to be always calling ServerManager.createDestinationServersList() to pass the server list. They are all good. Only the big balanceCluster() call has the issue.

> Restore a few important missing logics for balancer in 2.0
> ----------------------------------------------------------
>
>                 Key: HBASE-19021
>                 URL: https://issues.apache.org/jira/browse/HBASE-19021
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Critical
>         Attachments: HBASE-19021-master.patch, HBASE-19021-master.patch
>
>
> After looking at the code, and some testing, I see the following things are
> missing for the balancer to work properly after AMv2.
> # hbase.master.loadbalance.bytable is not respected. It is always 'bytable'.
> The previous default is cluster wide, not by table.
> # Servers with no assignments are not added for balance consideration.
> # A crashed server is not removed from the in-memory server map in
> RegionStates, which affects balance.
> # The draining marker is not respected when balancing.
> Also try to re-enable {{TestRegionRebalancing}}, which has a
> {{testRebalanceOnRegionServerNumberChange}}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
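
For readers following along, the behavior described in the comment (drop draining servers and add servers with no assignments before the map reaches balanceCluster()) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the class DrainingFilterSketch and the method excludeDraining are hypothetical names, and plain strings stand in for HBase's server and region types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and types are NOT taken from HBase.
public class DrainingFilterSketch {

    /**
     * Builds the server-to-regions map a balancer would see:
     * draining servers are removed, and online servers that hold
     * no regions are added with an empty list so they can receive load.
     */
    static Map<String, List<String>> excludeDraining(
            Map<String, List<String>> assignments,
            List<String> onlineServers,
            Set<String> drainingServers) {
        Map<String, List<String>> result = new HashMap<>();
        // Copy current assignments, skipping any draining server.
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            if (!drainingServers.contains(e.getKey())) {
                result.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        // Add online, non-draining servers that currently hold no regions.
        for (String server : onlineServers) {
            if (!drainingServers.contains(server)) {
                result.putIfAbsent(server, new ArrayList<>());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignments = new HashMap<>();
        assignments.put("rs1", List.of("regionA", "regionB"));
        assignments.put("rs2", List.of("regionC"));
        List<String> online = List.of("rs1", "rs2", "rs3");
        Set<String> draining = Set.of("rs2");
        // rs2 is draining and is dropped; rs3 is empty and is added.
        System.out.println(excludeDraining(assignments, online, draining).keySet());
    }
}
```

With the map filtered this way up front, the balancer never proposes a draining server as a destination, so the later check against the destination server list is no longer the only safeguard.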