Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2

Ted Yu Sat, 03 Nov 2012 14:56:01 -0700

Matt:
From the following we can see that region bc62a8a72124a4ba3f6b9f302587903c
cannot be found:


2012-11-02 00:00:02,909 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_SPLIT,
server=HadoopNode162.hotpads.srv,60020,1351788248279,
region=bc62a8a72124a4ba3f6b9f302587903c
2012-11-02 00:00:02,909 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Region
bc62a8a72124a4ba3f6b9f302587903c not found on server
HadoopNode162.hotpads.srv,60020,1351788248279;failed
processing
2012-11-02 00:00:02,909 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region
bc62a8a72124a4ba3f6b9f302587903c from server
HadoopNode162.hotpads.srv,60020,1351788248279 but it doesn't exist anymore,
probably already processed its split

Have you run hbck to repair your cluster?

Thanks

On Sat, Nov 3, 2012 at 2:29 PM, Matt Corgan <[email protected]> wrote:

> Here's a sample of the master's logs from yesterday.  It's not correlated
> exactly with the other pastebin log, but there's 3GB of this from
> yesterday: http://pastebin.com/wP2rNN1t
>
> I am pushing the cluster a bit by importing data, so I'm testing the split
> code harder than normal.  The regions are 500-1GB gzipped.  I can look into
> it more, but I'm trying to figure out what to look for.
>
> Thanks Ted,
> Matt
>
>
> On Sat, Nov 3, 2012 at 2:03 PM, Ted Yu <[email protected]> wrote:
>
> > Matt:
> > This is the method which made the logging:
> >   private static int tickleNodeSplit(ZooKeeperWatcher zkw,
> >       HRegionInfo parent, HRegionInfo a, HRegionInfo b,
> >       ServerName serverName, final int znodeVersion)
> >   throws KeeperException, IOException {
> >     byte [] payload = Writables.getBytes(a, b);
> >     return ZKAssign.transitionNode(zkw, parent, serverName,
> >       EventType.RS_ZK_REGION_SPLIT, EventType.RS_ZK_REGION_SPLIT,
> >       znodeVersion, payload);
> >   }
> >
> > transitionZKNode() calls tickleNodeSplit() while waiting for the master to
> > process the split. Evidently something kept the master from processing it.
> >
> > How large is the region?
> >
> > Can you pastebin the master log for that period of time?
> >
> > Thanks
> >
> > On Sat, Nov 3, 2012 at 1:54 PM, Matt Corgan <[email protected]> wrote:
> >
> > > We upgraded from .94.0 to .94.2 last week and have started to encounter
> > > infinite loops of region-transition on splits.  I'm not sure yet whether
> > > it affects all splits or whether it's related to load.  The workaround
> > > so far has been to restart the regionserver process.
> > >
> > > log snippet:
> > > http://pastebin.com/LpienZ7B
> > >
> > > It's repeating these two lines:
> > > 2012-11-02 01:35:33,312 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x13ab46479832b76 Attempting to transition node
> > > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > > RS_ZK_REGION_SPLIT
> > > 2012-11-02 01:35:33,364 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > regionserver:60020-0x13ab46479832b76 Successfully transitioned node
> > > cf3e9bc069e1888983c06dc8e053ffcf from RS_ZK_REGION_SPLIT to
> > > RS_ZK_REGION_SPLIT
> > >
> > > with the occasional:
> > > 2012-11-02 01:35:34,476 DEBUG
> > > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on
> > > the master to process the split for cf3e9bc069e1888983c06dc8e053ffcf
> > >
> > > Should the region transition from RS_ZK_REGION_SPLIT to itself?  It
> > > looks wrong, but I'm not familiar with the code at all.
> > >
> > > Thanks,
> > > Matt
> > >
> >
>
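
For readers following the thread: the behaviour discussed above, where the
region server keeps transitioning the znode from RS_ZK_REGION_SPLIT to
RS_ZK_REGION_SPLIT until the master has processed the split, can be sketched
in isolation. The code below is a minimal simulation and not HBase code.
FakeZNode, FakeMaster, tickleUntilProcessed and the maxTickles cap are all
hypothetical names invented for this illustration, and the convention that a
failed transition returns -1 once the znode is gone is an assumption of the
sketch rather than something stated in the thread.

```java
public class SplitTickleSketch {
    public static final String SPLIT = "RS_ZK_REGION_SPLIT";
    public static final int GAVE_UP = -1;

    /** Hypothetical stand-in for the split znode: a version counter and a deleted flag. */
    public static final class FakeZNode {
        public int version = 0;
        public boolean deleted = false;

        /**
         * Same-state transition ("tickle"). Bumps and returns the version when the
         * caller's expected version matches; returns -1 if the node is gone or the
         * arguments do not match (assumed convention for this sketch).
         */
        public int transition(String from, String to, int expectedVersion) {
            if (deleted || !SPLIT.equals(from) || !SPLIT.equals(to)
                    || expectedVersion != version) {
                return -1;
            }
            return ++version;
        }
    }

    /** Hypothetical master: gets a chance to act on the znode before each tickle. */
    public interface FakeMaster {
        void poll(FakeZNode node, int ticklesSeen);
    }

    /**
     * Tickles the znode until the master removes it. Returns the number of
     * successful tickles, or GAVE_UP if maxTickles is reached first. The cap
     * exists only so that the stuck case terminates in this sketch.
     */
    public static int tickleUntilProcessed(FakeZNode node, FakeMaster master, int maxTickles) {
        int version = node.version;
        for (int tickles = 0; tickles < maxTickles; tickles++) {
            master.poll(node, tickles);
            version = node.transition(SPLIT, SPLIT, version);
            if (version == -1) {
                return tickles; // znode gone: the master processed the split
            }
        }
        return GAVE_UP;
    }

    public static void main(String[] args) {
        // A master that processes the split after seeing three tickles.
        FakeMaster healthy = (node, seen) -> { if (seen >= 3) node.deleted = true; };
        // A master that never processes the split, as in the reported loop.
        FakeMaster stuck = (node, seen) -> { };

        System.out.println(tickleUntilProcessed(new FakeZNode(), healthy, 100)); // 3
        System.out.println(tickleUntilProcessed(new FakeZNode(), stuck, 100));   // -1
    }
}
```

In the sketch, a SPLIT-to-SPLIT transition is not an error in itself: it only
bumps the znode version while the region server waits. The loop ends when the
master removes the znode, so a master that never acts on the node (the
"stuck" case) leaves the region server tickling until something else stops it.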
