A new version of HBCK that detects and repairs this situation should be
available soon.

Hopefully we can have a patch up tomorrow.

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Tuesday, August 24, 2010 10:09 AM
> To: [email protected]
> Subject: Re: Regions offlined..
> 
> On Tue, Aug 24, 2010 at 7:40 AM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
> > I keep getting 1 or 2 out of 80000 regions offlined (and this is on a
> > version where the offline-region bug was fixed: see below for the
> > link). Can you guys let me know a likely cause?
> >
> 
> If same stacktrace as pasted in previous message, please provide more
> from the log file.  I'd like to see how the scenario came about.  We
> want to cut another 0.89 in the next day or so.  Would be good to get a fix
> in for your issue if it is not fixed already.
> 
> > I was restarting the db as a way to sidestep for now, but it takes a
> > long time to enable the db contents..
> 
>   <property>
>     <name>hbase.regions.percheckin</name>
>     <value>10</value>
>     <description>Maximum number of regions that can be assigned in a single go
>     to a region server.
>     </description>
>   </property>
> 
> Make the above setting 100 for your case.
> 
> 
> > Other ways I can think of are 1) deleting those entries from the META
> > table and reinserting them
> >       2) Is it possible to manually override the state in zk?
> 
> 
> Yes, you can manually edit zk. It's messy but it's no different than
> updating a row in a table.
> 
> Are the regions offlined or is there a hole in the table?
> 
> If you do:
> 
> echo "scan '.META.'" | ./bin/hbase shell --format-width=300 &> /tmp/meta.txt
> 
> .. can you find the rows that are giving you trouble, search for their
> location in meta.txt, and see whether the regions are offlined or missing?
> 
> (I can take a look if you want to send me meta.txt and the problem
> rows on back channel.)
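
The search described above could be sketched roughly as follows. This is only an illustration: it assumes meta.txt is the dump produced by the scan command above, and that an offlined region shows the text OFFLINE => true in its info:regioninfo value (please verify that against your own dump). The function names count_offline and find_region are made up for this example.

```shell
# Hypothetical helpers for inspecting a '.META.' scan dump such as /tmp/meta.txt.

# Count the lines that mark a region as offline.
# Assumption: offlined regions carry the literal text "OFFLINE => true".
count_offline() {
  grep -c -F 'OFFLINE => true' "$1"
}

# Print, with line numbers, every line of the dump that mentions the given
# row key or region name (matched as a fixed string, not as a regex).
find_region() {
  grep -n -F -- "$2" "$1"
}
```

For example, find_region /tmp/meta.txt 'DocDB,0000010644000000' would list the lines for the region from the stack trace below. If nothing comes back, the row is missing from the dump, which points to a hole in the table rather than an offlined region.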
> 
> St.Ack
> 
> >  Can you let me know what can be done to get around this problem for
> > now?
> >
> > Thank you
> > Vidhya
> >
> >
> > On 8/20/10 5:01 PM, "Vidhyashankar Venkataraman" <vidhy...@yahoo-inc.com> wrote:
> >
> > Changes.txt says that this particular issue was fixed..
> >
> > Could there be another reason why I see this problem?
> >
> > I know that restarting might resolve this issue but I just wanted to
> > check with you guys the potential cause for the problem..
> >
> > Thank you
> > Vidhya
> >
> > On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
> >
> > 0.89 are snapshots of trunk, so you may or may not have it in your
> > version. Check your CHANGES.txt file to be sure.
> >
> > J-D
> >
> > On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
> > <[email protected]> wrote:
> >> I am seeing a couple of regions offlined by the master because of an
> >> exception (attached below) at the RS to which the master tried to
> >> assign...
> >>
> >>  The following jira says the issue has been resolved, but the change
> >> is in 0.90. I am using 0.89 right now. Can you guys let me know
> >> what changes went into 0.89 and what did not?
> >>
> >> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
> >>
> >> Thank you
> >> Vidhya
> >>
> >>
> >> 2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >> 2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
> >> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> >>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> >>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
> >>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
> >>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
> >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
> >>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
> >>        at java.lang.Thread.run(Thread.java:619)
> >> 2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
> >> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
> >>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
> >>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
> >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
> >>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
> >>        at java.lang.Thread.run(Thread.java:619)
> >> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> >>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> >>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
> >>        ... 5 more
> >> 2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
> >> 2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
> >>
> >>
> >>
> >>
> >
> >
> >
