Hi,

A developer on our team created a table today, something failed, and
we fell back into the dire scenario we were in earlier this week. When
I got on the scene, 2 of our 4 regionservers had crashed. When I brought
them back up, they wouldn't come online and the master was scrolling
messages like those in
https://issues.apache.org/jira/browse/HBASE-3406.

I'm running HBase 0.90.0-rc1 and CDH3b2 with append enabled.

I shut down the entire cluster plus ZooKeeper and restarted everything.
Now I'm getting two types of errors and the cluster won't come up:

- On one of the regionservers:
2011-01-25 15:12:00,287 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0

- And on the master, this scrolls every few seconds. The log file it
references is empty in HDFS:
2011-01-25 15:12:26,897 WARN org.apache.hadoop.hbase.util.FSUtils: Waited 275444ms for lease recovery on hdfs://mymaster.com:9000/hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592 for DFSClient_hb_m_mymaster.com:60000_1295996847777 on client 10.14.98.90, because this file is already being created by NN_Recovery on 10.10.220.15
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
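
In case it helps with reading the master warning, here is a minimal
Python sketch that pulls out how long the master has waited, which file
it is waiting on, and who currently holds the lease. The function name
and the regular expression are my own and only assume the message layout
shown in the warning above; they are not part of HBase or Hadoop.

```python
import re

# Master warning from above, rejoined onto a single line.
WARNING = (
    "2011-01-25 15:12:26,897 WARN org.apache.hadoop.hbase.util.FSUtils: "
    "Waited 275444ms for lease recovery on "
    "hdfs://mymaster.com:9000/hbase-app/hbase/.logs/"
    "hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/"
    "hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592:"
    "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: "
    "failed to create file "
    "/hbase-app/hbase/.logs/"
    "hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/"
    "hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592 "
    "for DFSClient_hb_m_mymaster.com:60000_1295996847777 on client "
    "10.14.98.90, because this file is already being created by "
    "NN_Recovery on 10.10.220.15"
)

# Hypothetical pattern, written only against the layout of the line above.
LEASE_PATTERN = re.compile(
    r"Waited (?P<ms>\d+)ms for lease recovery on (?P<path>\S+?):org\.apache"
    r".*already being created by (?P<holder>\S+) on (?P<host>\S+)"
)


def parse_lease_warning(line):
    """Return wait time, file path, and lease holder, or None if no match."""
    match = LEASE_PATTERN.search(line)
    if match is None:
        return None
    return {
        "wait_ms": int(match.group("ms")),
        "path": match.group("path"),
        "holder": match.group("holder"),
        "host": match.group("host"),
    }


if __name__ == "__main__":
    info = parse_lease_warning(WARNING)
    minutes = info["wait_ms"] / 60000
    print(f"waited {minutes:.1f} min on {info['path']}")
    print(f"lease held by {info['holder']} on {info['host']}")
```

For this warning the lease holder comes out as NN_Recovery on
10.10.220.15, after a wait of roughly four and a half minutes.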

Any suggestions for how to get -ROOT- back online? I can see it in HDFS.

Thanks,
Bill
