On Tue, Aug 24, 2010 at 7:40 AM, Vidhyashankar Venkataraman
<[email protected]> wrote:
> I keep getting 1 or 2 out of 80000 regions offlined (and this is on a version 
> where the offline-region bug was fixed: see below for the link). Can you guys 
> let me know a likely cause?
>

If it is the same stack trace as pasted in the previous message, please
provide more of the log file.  I'd like to see how the scenario came
about.  We want to cut another 0.89 in the next day or so.  It would be
good to get a fix in for your issue if it is not fixed already.

> I was restarting the db as a way to sidestep for now, but it takes a long 
> time to enable the db contents..

  <property>
    <name>hbase.regions.percheckin</name>
    <value>10</value>
    <description>Maximum number of regions that can be assigned in a single go
    to a region server.
    </description>
  </property>

Set the above to 100 in your hbase-site.xml for your case.


> Other ways I can think of are 1) deleting those entries from the META
> table and reinsert
>       2) Is it possible to manually override the state in zk?


Yes, you can manually edit zk. It's messy, but it's no different from
updating a row in a table.

Are the regions offlined or is there a hole in the table?

If you do:

echo "scan '.META.'" | ./bin/hbase shell --format-width=300  &> /tmp/meta.txt

.. can you find the rows that are giving you trouble, search for them
in meta.txt, and see whether the regions are offlined or missing?

(I can take a look if you want to send me meta.txt and the problem
rows on a back channel.)
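
As a rough sketch of that search, something like the following could
classify a region from the saved scan. The function name check_region is
made up for illustration. It assumes an offlined region prints
"OFFLINE => true" in its info:regioninfo value, and that the row key and
that value land on the same line of meta.txt; neither is guaranteed for
every version or width setting, so treat the result as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: look up a region's encoded name in a saved
# '.META.' scan and report "missing", "offline", or "present".
# Assumption: offlined regions show "OFFLINE => true" on the same
# line as the row key (this depends on the shell's output width).
check_region() {
  meta_file="$1"   # saved scan output, e.g. /tmp/meta.txt
  encoded="$2"     # encoded region name, e.g. 5da7abbffde229aaab56382c3812363d

  if ! grep -q -- "$encoded" "$meta_file"; then
    # No row mentions this region at all: possibly a hole in the table.
    echo "missing"
  elif grep -- "$encoded" "$meta_file" | grep -q "OFFLINE => true"; then
    # A row exists and is flagged offline.
    echo "offline"
  else
    # A row exists with no offline flag.
    echo "present"
  fi
}
```

For example, "check_region /tmp/meta.txt 5da7abbffde229aaab56382c3812363d"
would print one of the three words for the region in the stack trace
below.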

St.Ack

>  Can you let me know what can be done to get around this problem for now?
>
> Thank you
> Vidhya
>
>
> On 8/20/10 5:01 PM, "Vidhyashankar Venkataraman" <[email protected]> 
> wrote:
>
> Changes.txt says that this particular issue was fixed..
>
> Could there be another reason why I see this problem?
>
> I know that restarting might resolve this issue but I just wanted to check 
> with you guys the potential cause for the problem..
>
> Thank you
> Vidhya
>
> On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
>
> 0.89 are snapshots of trunk, so you may or may not have it in your
> version. Check your CHANGES.txt file to be sure.
>
> J-D
>
> On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
>> I am seeing a couple of regions offlined by the master because of an 
>> exception (attached below) at the RS to which the master tried to assign...
>>
>>  The following jira says the issue has been resolved: But the change is in 
>> 0.90.. I am using 0.89 right now: Can you guys let me know of what changes 
>> went into  0.89 and what did not?
>>
>> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
>>
>> Thank you
>> Vidhya
>>
>>
>> 2010-08-20 19:18:27,333 INFO 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, 
>> state: SyncConnected, type: NodeDataChanged, path: 
>> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>> 2010-08-20 19:18:27,335 WARN 
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: 
>> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed
>>  to write data to ZooKeeper
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
>> BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>>        at 
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>>        at 
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>>        at 
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>>        at 
>> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>>        at 
>> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>>        at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>>        at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>>        at java.lang.Thread.run(Thread.java:619)
>> 2010-08-20 19:18:27,335 ERROR 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening 
>> DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
>> java.io.IOException: 
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
>> BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>>        at 
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
>>        at 
>> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>>        at 
>> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>>        at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>>        at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>>        at java.lang.Thread.run(Thread.java:619)
>> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
>> KeeperErrorCode = BadVersion for 
>> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>>        at 
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>>        at 
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>>        at 
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>>        ... 5 more
>> 2010-08-20 19:18:27,336 ERROR 
>> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of 
>> region 5da7abbffde229aaab56382c3812363d
>> 2010-08-20 19:18:27,337 DEBUG 
>> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode 
>> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with 
>> [RS2ZK_REGION_CLOSED] expected version = 2
>>
>>
>>
>>
>
>
>
