Can you file an issue, Jieshan? Do you have a suggested patch? Thank you for
digging in on this,
St.Ack
On Tue, May 31, 2011 at 8:06 PM, bijieshan <[email protected]> wrote:
> Sorry for the long break in the discussion of this problem.
> So far, I have found one possible cause.
>
> The root cause is that a split region can be brought online again.
> The following is my analysis:
>
> (The cluster has two HMasters, one active and one standby.)
>
> 1. When the active HMaster shuts down, the standby one becomes active
> and goes into the processFailover() path:
>
>   if (regionCount == 0) {
>     LOG.info("Master startup proceeding: cluster startup");
>     this.assignmentManager.cleanoutUnassigned();
>     this.assignmentManager.assignAllUserRegions();
>   } else {
>     LOG.info("Master startup proceeding: master failover");
>     this.assignmentManager.processFailover();
>   }
>
> 2. After that, the user regions are rebuilt:
>
>   Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers =
>       rebuildUserRegions();
>
> 3. Here is how rebuildUserRegions() works: all regions (including the
> split ones) are added to the offlineRegions list of offlineServers.
>
>   for (Result result : results) {
>     Pair<HRegionInfo, HServerInfo> region =
>         MetaReader.metaRowToRegionPairWithInfo(result);
>     if (region == null) continue;
>     HServerInfo regionLocation = region.getSecond();
>     HRegionInfo regionInfo = region.getFirst();
>     if (regionLocation == null) {
>       // Region not being served, add to region map with no assignment
>       // If this needs to be assigned out, it will also be in ZK as RIT
>       this.regions.put(regionInfo, null);
>     } else if (!serverManager.isServerOnline(
>         regionLocation.getServerName())) {
>       // Region is located on a server that isn't online
>       List<Pair<HRegionInfo, Result>> offlineRegions =
>           offlineServers.get(regionLocation);
>       if (offlineRegions == null) {
>         offlineRegions = new ArrayList<Pair<HRegionInfo, Result>>(1);
>         offlineServers.put(regionLocation, offlineRegions);
>       }
>       offlineRegions.add(new Pair<HRegionInfo, Result>(regionInfo, result));
>     } else {
>       // Region is being served and on an active server
>       regions.put(regionInfo, regionLocation);
>       addToServers(regionLocation, regionInfo);
>     }
>   }
>
> 4. It seems that all the offline regions are added to RIT and brought
> online again: ZKAssign creates a node for each offline region and never
> considers the split ones.
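The three-way classification in the rebuildUserRegions() loop quoted above can be condensed into a small model. This is a stand-alone sketch with made-up names (`classify`, `Bucket`), not HBase's real API; its point is that no branch consults the region's split/offline flags, so a split parent hosted on a dead server falls into the offline-server bucket like any other region:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Condensed model of the classification in rebuildUserRegions().
// Names are simplified stand-ins, not HBase's real API.
public class RebuildModel {
    enum Bucket { UNASSIGNED, OFFLINE_SERVER, ONLINE }

    static Bucket classify(String serverName, Set<String> onlineServers) {
        if (serverName == null) {
            return Bucket.UNASSIGNED;       // no location recorded in .META.
        } else if (!onlineServers.contains(serverName)) {
            return Bucket.OFFLINE_SERVER;   // split parents land here too
        } else {
            return Bucket.ONLINE;
        }
    }

    public static void main(String[] args) {
        Set<String> online = new HashSet<>(Arrays.asList("rs1", "rs2"));
        System.out.println(classify(null, online));   // UNASSIGNED
        System.out.println(classify("rs3", online));  // OFFLINE_SERVER
        System.out.println(classify("rs1", online));  // ONLINE
    }
}
```

A split parent whose server died is indistinguishable here from a normal region, which is why it is later reassigned.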
>
> AssignmentManager#processDeadServers:
>
>   private void processDeadServers(
>       Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers)
>   throws IOException, KeeperException {
>     for (Map.Entry<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServer :
>         deadServers.entrySet()) {
>       List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
>       for (Pair<HRegionInfo, Result> region : regions) {
>         HRegionInfo regionInfo = region.getFirst();
>         Result result = region.getSecond();
>         // If region was in transition (was in zk) force it offline for reassign
>         try {
>           ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
>               master.getServerName());
>         } catch (KeeperException.NoNodeException nne) {
>           // This is fine
>         }
>         // Process with existing RS shutdown code
>         ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
>             this.catalogTracker);
>       }
>     }
>   }
>
> AssignmentManager#processFailover:
>
>   // Process list of dead servers
>   processDeadServers(deadServers);
>   // Check existing regions in transition
>   List<String> nodes = ZKUtil.listChildrenAndWatchForNewChildren(watcher,
>       watcher.assignmentZNode);
>   if (nodes.isEmpty()) {
>     LOG.info("No regions in transition in ZK to process on failover");
>     return;
>   }
>   LOG.info("Failed-over master needs to process " + nodes.size() +
>       " regions in transition");
>   for (String encodedRegionName : nodes) {
>     processRegionInTransition(encodedRegionName, null);
>   }
>
> So I think we should check each region before adding it to RIT.
>
> Thanks!
>
> Jieshan Bean
>
> --------------------------------
>
> I didn't run hbck to check the system. The environment has been
> recovered now, so I need to reproduce the problem and then run hbck;
> maybe it will give some helpful information.
> Thanks!
>
> Jieshan Bean
>
> -------------
>
> Can you run hbck?
>
> J-D
>
> 2011/5/17 bijieshan <[email protected]>:
>> Yes, you're right. When counting .META., the result excludes the
>> -ROOT- region and the .META. region.
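Jieshan's suggestion (check each region before adding it to RIT) could be sketched roughly as below. This is a hypothetical simplification, not code from HBase: `RegionInfo` is a stand-in for HRegionInfo's split/offline flags, and `shouldReassign` is a made-up helper name:

```java
// Minimal stand-in for HRegionInfo's split/offline flags (a hypothetical
// simplification; the real class lives in org.apache.hadoop.hbase).
class RegionInfo {
    private final String name;
    private final boolean split;    // region has been split into daughters
    private final boolean offline;  // region is marked offline in .META.

    RegionInfo(String name, boolean split, boolean offline) {
        this.name = name;
        this.split = split;
        this.offline = offline;
    }

    boolean isSplit()   { return split; }
    boolean isOffline() { return offline; }
    String getName()    { return name; }
}

public class RitGuard {
    // A split parent stays in .META. (split=true, offline=true) until the
    // CatalogJanitor removes it; it must not be reassigned on failover.
    static boolean shouldReassign(RegionInfo ri) {
        return !(ri.isSplit() || ri.isOffline());
    }

    public static void main(String[] args) {
        RegionInfo live   = new RegionInfo("t1,,123.abc.", false, false);
        RegionInfo parent = new RegionInfo("t1,,100.def.", true, true);
        System.out.println(shouldReassign(live));    // true: safe to reassign
        System.out.println(shouldReassign(parent));  // false: skip split parent
    }
}
```

The guard would run before ZKAssign.createOrForceNodeOffline() in processDeadServers(), so a split parent on a dead server is never forced back into transition.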
>> Pardon me, I should not have mentioned that.
>> Maybe the difference of 2 regions is just a coincidence here. I can
>> show another scenario of this problem:
>>
>> From the WebUI, we can see:
>>
>> Address          Start Code     Load
>> C4C3.site:60030  1305620850621  requests=0, regions=10227, usedHeap=2161, maxHeap=8175
>> C4C4.site:60030  1305620851291  requests=0, regions=10234, usedHeap=2593, maxHeap=8175
>> C4C5.site:60030  1305620851505  requests=0, regions=10227, usedHeap=2191, maxHeap=8175
>> Total: servers: 3, requests=0, regions=30688
>>
>> From the HMaster logs:
>>
>> 2011-05-18 10:06:08,382 INFO org.apache.hadoop.hbase.master.LoadBalancer:
>> Skipping load balancing. servers=3 regions=30681 average=10227.0
>> mostloaded=10227 leastloaded=10227
>> 2011-05-18 10:06:13,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
>> hconnection opening connection to ZooKeeper with ensemble
>> (C4C3:2181,C4C2:2181,C4C1:2181)
>>
>> 30688 vs. 30681: a difference of 7 regions in this scenario.
>>
>> Thanks!
>>
>> Regards,
>> Jieshan Bean
>>
>> ----- -----
>> Re: Regions count is not consistent between the WebUI and LoadBalancer
>>
>> .META. does not include region information for itself or -ROOT-, so a
>> count of .META. should always be two regions less than the total number.
>>
>> -chris
>>
>> On Tue, May 17, 2011 at 4:32 AM, bijieshan <[email protected]> wrote:
>>
>>> This problem seems to happen with high probability; I have seen it
>>> several times.
>>>
>>> Here's a description of the problem:
>>>
>>> 1. From the WebUI, I can see the region count is 35087.
>>> 2. From the HBase shell, I counted the '.META.' table, and it showed
>>> 35085 regions:
>>>
>>> hbase(main):001:0> count '.META.',10000
>>> 11/05/14 14:12:22 INFO jute.BinaryInputArchive: Expecting maxBufferString
>>> as an integer but got as null. Hence setting to default value 1048575
>>> Current count: 10000, row:
>>> hello,080472,1305272720373.ec0eb8c1bb414967241155cf0985241e.
>>> Current count: 20000, row:
>>> hello,161272,1305272956858.e16008645368590dfa735abc61024db6.
>>> Current count: 30000, row:
>>> ufdr,061870,1305171874909.eddc77cbfa2f93feef49f4485d9b88a5.
>>> 35085 row(s) in 2.5250 seconds
>>>
>>> 3. From the HMaster logs, I saw:
>>>
>>> 2011-05-14 14:13:57,546 INFO org.apache.zookeeper.ClientCnxn: EventThread
>>> shut down
>>> 2011-05-14 14:14:01,303 INFO org.apache.hadoop.hbase.master.LoadBalancer:
>>> Skipping load balancing. servers=4 regions=35089 average=8772.25
>>> mostloaded=8773 leastloaded=8773
>>> 2011-05-14 14:14:08,706 DEBUG
>>> org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 35085 catalog row(s)
>>> and gc'd 0 unreferenced parent region(s)
>>>
>>> 35087 vs. 35085: so where does the difference come from? Are they
>>> counted from different sets?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Jieshan Bean
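Chris's rule earlier in the thread (a .META. row count is always two less than the full region total, since -ROOT- and .META. themselves are not listed there) gives a quick consistency check over the numbers quoted above. A small sketch with a made-up helper, using those numbers:

```java
// Quick consistency check based on the rule from the thread: the full
// region total should equal the .META. row count plus 2 (-ROOT- and
// .META. themselves). "consistent" is a made-up helper name.
public class CountCheck {
    static boolean consistent(long fullTotal, long metaRows) {
        return fullTotal == metaRows + 2;
    }

    public static void main(String[] args) {
        // Scenario 1: WebUI total 35087 vs. 35085 catalog rows.
        System.out.println(consistent(35087, 35085));  // matches the rule
        // Scenario 2: WebUI total 30688 vs. 30681 from the balancer log.
        // The balancer counts assignments rather than .META. rows, so the
        // rule does not directly apply; the 7-region gap is the anomaly.
        System.out.println(consistent(30688, 30681));
    }
}
```

So the first discrepancy (35087 vs. 35085) is explained by the rule alone, while the second (30688 vs. 30681) is not, which points at stale assignments such as resurrected split parents.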
