The below seems right. Do you have a patch, Jieshan? St.Ack
On Sun, Jun 12, 2011 at 2:41 AM, bijieshan <[email protected]> wrote:
> From the HMaster logs, I found something weird:
>
> 2011-05-24 11:12:11,152 INFO org.apache.hadoop.hbase.master.HMaster: balance
> hri=hello,122130,1305944329350.7d6c96428e2563c3d8676474d0a9f814.,
> src=158-1-101-202,20020,1306205409671, dest=158-1-101-222,20020,1306205940117
> 2011-05-24 11:12:31,536 INFO org.apache.hadoop.hbase.master.HMaster: balance
> hri=hello,122130,1305944329350.7d6c96428e2563c3d8676474d0a9f814.,
> src=158-1-101-202,20020,1306205409671, dest=158-1-101-222,20020,1306205940117
>
> We can see that the same region was balanced twice.
>
> To describe the problem, here is a simple example:
>
> 1. Suppose the region count on RegionServer A is 10.
>    Max: 5  Min: 4
> 2. So the number of regions to move is 5.
> 3. Before calculating which regions to move, the list is shuffled.
> 4. The 5 regions to move are picked from the back of the list.
> 5. The nextRegionForUnload value is set to 5.
> 6. So if neededRegions is not zero, one more region may be picked from
>    RegionServer A. This time the picked index is 5, which has already been
>    picked once!
>
>                            |<-----5-------|
> ------------*--*--*--*--*--*--*--*--*--*----
>                            |
>                 getNextRegionForUnload
>
> Here is the analysis from the code:
>
> 1. Walk down the most loaded servers, pruning each to the max. Regions are
>    picked from the back of the list (in reverse order):
>
>    Map<HServerInfo,BalanceInfo> serverBalanceInfo =
>      new TreeMap<HServerInfo,BalanceInfo>();
>    for(Map.Entry<HServerInfo, List<HRegionInfo>> server :
>        serversByLoad.descendingMap().entrySet()) {
>      HServerInfo serverInfo = server.getKey();
>      int regionCount = serverInfo.getLoad().getNumberOfRegions();
>      if(regionCount <= max) {
>        serverBalanceInfo.put(serverInfo, new BalanceInfo(0, 0));
>        break;
>      }
>      serversOverloaded++;
>      List<HRegionInfo> regions = randomize(server.getValue());
>      int numToOffload = Math.min(regionCount - max, regions.size());
>      int numTaken = 0;
>      for (int i = regions.size() - 1; i >= 0; i--) {
>        HRegionInfo hri = regions.get(i);
>        // Don't rebalance meta regions.
>        if (hri.isMetaRegion()) continue;
>        regionsToMove.add(new RegionPlan(hri, serverInfo, null));
>        numTaken++;
>        if (numTaken >= numToOffload) break;
>      }
>      /**********************************************************/
>      /*** here nextRegionForUnload is set to numToOffload    ***/
>      /**********************************************************/
>      serverBalanceInfo.put(serverInfo,
>        new BalanceInfo(numToOffload, (-1)*numTaken));
>    }
>
> 2. The second pass picks one more region from each of the most loaded
>    regionservers, by index:
>
>    if (neededRegions != 0) {
>      // Walk down most loaded, grabbing one from each until we get enough
>      for(Map.Entry<HServerInfo, List<HRegionInfo>> server :
>          serversByLoad.descendingMap().entrySet()) {
>        BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
>        int idx =
>          balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
>        if (idx >= server.getValue().size()) break;
>        HRegionInfo region = server.getValue().get(idx);
>        if (region.isMetaRegion()) continue; // Don't move meta regions.
>        regionsToMove.add(new RegionPlan(region, server.getKey(), null));
>        if(--neededRegions == 0) {
>          // No more regions needed, done shedding
>          break;
>        }
>      }
>    }
>
> If I have made any mistakes in this analysis, please point them out.
> Thanks!
>
> Jieshan Bean
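
For anyone following along, below is a minimal standalone sketch of the index
arithmetic Jieshan describes. It is plain Java, not HBase code: the class name
BalanceIndexSketch and the String region list are stand-ins, and the numbers
(10 regions, max = 5) come straight from the example above. It shows the first
pass taking indices 9..5 off the back of the list while recording
nextRegionForUnload = numToOffload = 5, so the second pass re-reads index 5.

    import java.util.ArrayList;
    import java.util.List;

    // Standalone sketch of the bookkeeping described above (not HBase code).
    public class BalanceIndexSketch {
      public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
          regions.add("region-" + i);
        }
        int max = 5;
        int numToOffload = regions.size() - max;           // 5

        // First pass: walk from the back of the list, as the balancer does.
        List<String> regionsToMove = new ArrayList<>();
        int numTaken = 0;
        for (int i = regions.size() - 1; i >= 0 && numTaken < numToOffload; i--) {
          regionsToMove.add(regions.get(i));               // picks indices 9,8,7,6,5
          numTaken++;
        }
        int nextRegionForUnload = numToOffload;            // 5, as in the quoted code

        // Second pass: grab "one more" region by index.
        String secondPassPick = regions.get(nextRegionForUnload);
        System.out.println("first pass picked:  " + regionsToMove);
        System.out.println("second pass picked: " + secondPassPick);
        // Prints true: the second pass duplicates a region already queued to move.
        System.out.println("duplicate?          "
            + regionsToMove.contains(secondPassPick));
      }
    }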

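One way to avoid the overlap, sketched below as an assumption rather than as
the actual patch, would be to take the offloaded regions from the front of the
shuffled list and record numTaken as the index where the second pass should
continue; the two passes then cannot pick the same region. Again plain Java
with made-up names, not HBase code.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical fix sketch: offload from the FRONT and remember the first
    // index the first pass did NOT take.
    public class BalanceIndexFixSketch {
      public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
          regions.add("region-" + i);
        }
        Collections.shuffle(regions);                      // stands in for randomize()
        int max = 5;
        int numToOffload = regions.size() - max;           // 5

        // First pass: walk from the front this time.
        List<String> regionsToMove = new ArrayList<>();
        int numTaken = 0;
        for (int i = 0; i < regions.size() && numTaken < numToOffload; i++) {
          regionsToMove.add(regions.get(i));               // picks indices 0..4
          numTaken++;
        }
        int nextRegionForUnload = numTaken;                // 5: first untaken index

        // Second pass: the index now points past everything already queued.
        String secondPassPick = regions.get(nextRegionForUnload);
        // Prints false: no region is picked twice.
        System.out.println("duplicate? " + regionsToMove.contains(secondPassPick));
      }
    }

Keeping both passes on the same end of the list (and recording the first
untouched index rather than the count taken off the other end) is what removes
the double pick; whether the real patch does it this way is for Jieshan to say.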