On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham <[email protected]> wrote:
> Hi,
>
> A developer on our team created a table today and something failed and
> we fell back into the dire scenario we were in earlier this week. When
> I got on the scene 2 of our 4 regions had crashed. When I brought them
> back up, they wouldn't come online and the master was scrolling
> messages like those in
> https://issues.apache.org/jira/browse/HBASE-3406.
>
> I'm running 0.90.0-rc1 and CDH3b2 with append enabled.

Can you move to the 0.90.0 release?
> I shut down the entire cluster + zookeeper and restarted it. Now I'm
> getting two types of errors and the cluster won't come up:
>
> - On one of the regionservers:
> 2011-01-25 15:12:00,287 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online: -ROOT-,,0

Can I see the master log around startup, please?

> - And on the master this scrolls every few seconds. The log file
> referenced is empty in HDFS.
>
> 2011-01-25 15:12:26,897 WARN org.apache.hadoop.hbase.util.FSUtils:
> Waited 275444ms for lease recovery on
> hdfs://mymaster.com:9000/hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592:
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
> failed to create file
> /hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592
> for DFSClient_hb_m_mymaster.com:60000_1295996847777 on client
> 10.14.98.90, because this file is already being created by NN_Recovery
> on 10.10.220.15
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1093)

As Ryan says, this would seem to indicate the owning RegionServer is
still up. Is that the case? Did the restart of the cluster for sure put
down all RSs?
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1181)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:422)
>         at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
> Any suggestions for how to get the -ROOT- back? I can see it in HDFS.

Root will come back once the master moves past log file splitting.

St.Ack
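[Editor's note: the FSUtils warning above actually names both servers worth checking. The WAL path component `<host>,<port>,<startcode>` under `.logs` identifies the RegionServer that wrote the file, and `NN_Recovery on <ip>` is the node the NameNode believes still holds the lease. A small sketch of pulling both out of the message (the strings below are copied from the log; the variable names are ours):]

```shell
# WAL path from the warning; the parent directory name encodes
# "<host>,<port>,<startcode>" of the RS that wrote it.
wal='/hbase-app/hbase/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592'
owner=$(basename "$(dirname "$wal")" | cut -d, -f1)

# Tail of the AlreadyBeingCreatedException message; the IP after
# "NN_Recovery on" is the current lease holder per the NameNode.
warn='because this file is already being created by NN_Recovery on 10.10.220.15'
holder=$(echo "$warn" | sed -n 's/.*NN_Recovery on \([0-9.]*\).*/\1/p')

echo "owning RS:    $owner"   # host whose RegionServer should be down
echo "lease holder: $holder"  # node to check for a lingering process
```

If a RegionServer JVM is still alive on either host, the lease never expires and the master keeps waiting, which matches the symptom above.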

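[Editor's note: a rough illustration of what "moving past log splitting" means on disk, using a local directory as a stand-in for HDFS. All paths here are hypothetical; the region encoded name is invented for the example. The master replays each WAL under `.logs/<rs>,...` into per-region `recovered.edits` files and removes the consumed WAL directory, after which regions, including -ROOT-, can be reassigned:]

```shell
# Local stand-in for the HBase root dir in HDFS (hypothetical layout).
HROOT=$(mktemp -d)
RSDIR="$HROOT/.logs/hadoop-wkr-r14-n1.mydomain.com,60020,1295900457489"
mkdir -p "$RSDIR"
touch "$RSDIR/hadoop-wkr-r14-n1.mydomain.com%3A60020.1295907659592"

# Stand-in for the split step: edits land under each region's
# recovered.edits directory, and the consumed WAL directory goes away.
mkdir -p "$HROOT/-ROOT-/70236052/recovered.edits"   # encoded name invented
rm -r "$RSDIR"

find "$HROOT" -type d
```

Once `.logs` is drained like this, the master assigns -ROOT-, the region opens and replays its `recovered.edits`, and the cluster can come up.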