A new version of HBCK should be available soon which will detect and repair this situation.
Hopefully we can have a patch up tomorrow.

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Tuesday, August 24, 2010 10:09 AM
> To: [email protected]
> Subject: Re: Regions offlined..
>
> On Tue, Aug 24, 2010 at 7:40 AM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
> > I keep getting 1 or 2 out of 80000 regions offlined (and this is on a
> > version where the offline-region bug was fixed: see below for the
> > link). Can you guys let me know a likely cause?
> >
>
> If same stacktrace as pasted in previous message, please provide more
> from the log file. I'd like to see how the scenario came about. We
> want to cut another 0.89 in next day or so. Would be good to get fix
> in for your issue if it is not fixed already.
>
> > I was restarting the db as a way to sidestep for now, but it takes a
> > long time to enable the db contents..
>
> <property>
>   <name>hbase.regions.percheckin</name>
>   <value>10</value>
>   <description>Maximum number of regions that can be assigned in a single go
>   to a region server.
>   </description>
> </property>
>
> Make the above setting 100 for your case.
>
> > Other ways I can think of are 1) deleting those entries from the META
> > table and reinsert
> > 2) Is it possible to manually override the state in zk?
>
> Yes, you can manually edit zk. It's messy but it's no different than
> updating a row in a table.
>
> Are the regions offlined or is there a hole in the table?
>
> If you do:
>
> echo "scan '.META.'" | ./bin/hbase shell --format-width=300 &> /tmp/meta.txt
>
> .. can you find the rows that are giving you issue and search their
> location in meta.txt and see if offlined or missing regions?
>
> (I can take a look if you want to send me meta.txt and the problem
> rows on back channel?)
>
> St.Ack
>
> > Can you let me know what can be done to get around this problem for
> > now?
> >
> > Thank you
> > Vidhya
> >
> >
> > On 8/20/10 5:01 PM, "Vidhyashankar Venkataraman" <vidhy...@yahoo-inc.com> wrote:
> >
> > Changes.txt says that this particular issue was fixed..
> >
> > Could there be another reason why I see this problem?
> >
> > I know that restarting might resolve this issue but I just wanted to
> > check with you guys the potential cause for the problem..
> >
> > Thank you
> > Vidhya
> >
> > On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
> >
> > 0.89 are snapshots of trunk, so you may or may not have it in your
> > version. Check your CHANGES.txt file to be sure.
> >
> > J-D
> >
> > On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
> > <[email protected]> wrote:
> >> I am seeing a couple of regions offlined by the master because of an
> >> exception (attached below) at the RS to which the master tried to
> >> assign...
> >>
> >> The following jira says the issue has been resolved: But the change
> >> is in 0.90.. I am using 0.89 right now: Can you guys let me know of
> >> what changes went into 0.89 and what did not?
> >>
> >> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
> >>
> >> Thank you
> >> Vidhya
> >>
> >>
> >> 2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >> 2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
> >> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> >>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
> >>     at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
> >>     at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
> >>     at java.lang.Thread.run(Thread.java:619)
> >> 2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
> >> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
> >>     at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
> >>     at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
> >>     at java.lang.Thread.run(Thread.java:619)
> >> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> >>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
> >>     ... 5 more
> >> 2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
> >> 2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
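
A note on the BadVersion error in the quoted trace: ZooKeeper's setData takes an expected version, and the write is rejected with BadVersion when that version does not match the znode's current one (passing -1 skips the check). The sketch below is a toy in-memory model of that check in Python, not HBase or ZooKeeper code. VersionedStore, BadVersionError, demo and the node path are made-up names for illustration. It shows one way such a rejection can arise, with a second writer updating the node between the first writer's read and write. The thread does not establish that this is what happened on the cluster.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version differs from the node's."""


class VersionedStore:
    """Toy in-memory stand-in for a store of versioned nodes (hypothetical)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        # New nodes start at version 0.
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, 0)

    def get(self, path):
        # Returns the (data, version) pair currently stored.
        return self._nodes[path]

    def set_data(self, path, data, expected_version):
        # Conditional write: succeed only if the caller saw the latest
        # version. An expected_version of -1 means "match any version".
        _, current = self._nodes[path]
        if expected_version != -1 and expected_version != current:
            raise BadVersionError(
                f"BadVersion for {path}: expected {expected_version}, "
                f"found {current}"
            )
        self._nodes[path] = (data, current + 1)
        return current + 1


def demo():
    store = VersionedStore()
    path = "/example/UNASSIGNED/some-region"  # hypothetical path
    store.create(path, "initial")

    # Writer A reads the node and remembers the version it saw.
    _, version_seen_by_a = store.get(path)

    # Writer B updates the node first, bumping its version.
    store.set_data(path, "written by B", version_seen_by_a)

    # Writer A now writes with its stale version and is rejected.
    rejected = False
    try:
        store.set_data(path, "written by A", version_seen_by_a)
    except BadVersionError:
        rejected = True

    final_data, final_version = store.get(path)
    return rejected, final_data, final_version
```

In this model demo() returns (True, "written by B", 1): the stale write is refused and the node keeps the other writer's data. Which party changed the real znode, if any did, would have to come from more of the master and region server logs, as asked for earlier in the thread.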
