Matt:
From the following we can see that region bc62a8a72124a4ba3f6b9f302587903c cannot be found:

2012-11-02 00:00:02,909 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=HadoopNode162.hotpads.srv,60020,1351788248279, region=bc62a8a72124a4ba3f6b9f302587903c
2012-11-02 00:00:02,909 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region bc62a8a72124a4ba3f6b9f302587903c *not found on server HadoopNode162.hotpads*.srv,60020,1351788248279;failed processing
2012-11-02 00:00:02,909 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region bc62a8a72124a4ba3f6b9f302587903c from server HadoopNode162.hotpads.srv,60020,1351788248279 but it doesn't exist anymore, probably already processed its split

Have you run hbck to repair your cluster?

Thanks

On Sat, Nov 3, 2012 at 2:29 PM, Matt Corgan <[email protected]> wrote:

> Here's a sample of the master's logs from yesterday. It's not correlated
> exactly with the other pastebin log, but there's 3GB of this from
> yesterday: http://pastebin.com/wP2rNN1t
>
> I'm pushing the cluster a bit with importing data, so I'm testing the split
> code harder than normal. The regions are 500-1GB gzipped. I can look into
> it more, but I'm trying to figure out what to look for.
>
> Thanks Ted,
> Matt
>
>
> On Sat, Nov 3, 2012 at 2:03 PM, Ted Yu <[email protected]> wrote:
>
> > Matt:
> > This is the method which made the logging:
> >   private static int tickleNodeSplit(ZooKeeperWatcher zkw,
> >       HRegionInfo parent, HRegionInfo a, HRegionInfo b,
> >       ServerName serverName, final int znodeVersion)
> >   throws KeeperException, IOException {
> >     byte [] payload = Writables.getBytes(a, b);
> >     return ZKAssign.transitionNode(zkw, parent, serverName,
> >       EventType.RS_ZK_REGION_SPLIT, EventType.RS_ZK_REGION_SPLIT,
> >       znodeVersion, payload);
> >   }
> >
> > transitionZKNode() calls tickleNodeSplit() while waiting for the master
> > to process the split. Obviously something kept the master from doing so.
> >
> > How large is the region?
> >
> > Can you pastebin the master log for that period of time?
> >
> > Thanks
> >
> > On Sat, Nov 3, 2012 at 1:54 PM, Matt Corgan <[email protected]> wrote:
> >
> > > We upgraded from .94.0 to .94.2 last week and have started to encounter
> > > infinite loops of region-transition on splits. I'm not sure yet whether
> > > it's all splits or whether it's related to load. Solution so far has
> > > been to restart the regionserver process.
> > >
> > > log snippet:
> > > http://pastebin.com/LpienZ7B
> > >
> > > It's repeating these two lines:
> > > 2012-11-02 01:35:33,312 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x13ab46479832b76 Attempting to transition node
> > > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > > RS_ZK_REGION_SPLIT
> > > 2012-11-02 01:35:33,364 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x13ab46479832b76 Successfully transitioned node
> > > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > > RS_ZK_REGION_SPLIT
> > >
> > > with the occasional:
> > > 2012-11-02 01:35:34,476 DEBUG
> > > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on
> > > the master to process the split for cf3e9bc069e1888983c06dc8e053ffcf
> > >
> > > Should the region transition from RS_ZK_REGION_SPLIT to itself? It looks
> > > wrong, but I'm not familiar with the code at all.
> > >
> > > Thanks,
> > > Matt
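
With several GB of logs to go through, it may help to count how often each region shows the RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT self-transition quoted in this thread, so that stuck regions stand out. The following is only a rough sketch: SplitLoopScan and countSelfTransitions are made-up names, not part of HBase, and it assumes each ZKAssign message sits on a single line in the raw regionserver log (unlike the wrapped copies in the emails) and that the file is UTF-8 text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log triage; not part of HBase.
class SplitLoopScan {
    // Matches the "Attempting to transition node <region> from <state> to <state>"
    // DEBUG message quoted in the thread. Assumes the message is on one line.
    private static final Pattern TRANSITION = Pattern.compile(
        "Attempting to transition node (\\p{XDigit}+) from (\\w+) to (\\w+)");

    // Returns, per encoded region name, how many attempted transitions had
    // the same source and target state (for example SPLIT to SPLIT).
    static Map<String, Integer> countSelfTransitions(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TRANSITION.matcher(line);
            if (m.find() && m.group(2).equals(m.group(3))) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a regionserver log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countSelfTransitions(lines).forEach(
            (region, n) -> System.out.println(region + " " + n));
    }
}
```

A region whose count keeps growing between runs is probably one where the regionserver is still waiting on the master, which would match the "Still waiting on the master to process the split" message above.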
