Ah, very helpful. See how .META. is getting reassigned even though it already has a valid assignment? Some environments hit this for some reason, and it is fixed by https://issues.apache.org/jira/browse/HBASE-2599, which you will need to apply to your HBase.
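If you want a quick way to sanity-check the client path once you've patched, a rough sketch like the one below (written against the 0.20-era client API as best I remember it; "mytable" is just a placeholder table name) exercises the same two steps that are failing for you: the master check behind the shell's MasterNotRunningException, and the -ROOT-/.META. lookup your map tasks time out on.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class HBaseClientCheck {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml from the classpath, same config the shell uses.
    HBaseConfiguration conf = new HBaseConfiguration();

    // Throws MasterNotRunningException if the client can't reach the master,
    // which is what the shell's `list` is reporting.
    HBaseAdmin.checkHBaseAvailable(conf);
    System.out.println("master is reachable");

    // Constructing an HTable forces the -ROOT-/.META. lookup, the step that
    // times out with NoServerForRegionException in the failing map tasks.
    // "mytable" is just a placeholder; use one of your own tables.
    HTable table = new HTable(conf, "mytable");
    System.out.println("region lookup succeeded for " + new String(table.getTableName()));
  }
}

Both calls read the regular client configuration, so run it from the same machine and classpath as the failing job to rule out a config mismatch.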
J-D

On Fri, Aug 13, 2010 at 5:22 PM, Marchwiak, Patrick D. <[email protected]> wrote:
> I've attached the log.
>
> One more thing I'll add is that the stop-hbase.sh script hangs on the
> "stopping master..." line, so I had to manually kill the HMaster process
> before doing a restart.
>
> On 8/13/10 5:00 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
>
>> A clean log of a full master startup would be really useful, can't
>> tell much more by the current info you provided.
>>
>> J-D
>>
>> On Fri, Aug 13, 2010 at 4:50 PM, Marchwiak, Patrick D.
>> <[email protected]> wrote:
>>> I am having issues performing any operations (list/create/put) on my hbase
>>> instance once it starts up.
>>>
>>> The environment:
>>> Red Hat 5.5
>>> Hadoop 0.20.2
>>> HBase 0.20.4
>>> java 1.6.0_20
>>> 1 running master
>>> 23 running regionservers + 3 also running zookeeper
>>>
>>> When attempting to do a list from the hbase shell it returns this error:
>>> NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
>>>
>>> When attempting to perform inserts from a hadoop job I see the following
>>> error in my application:
>>>
>>> 2010-08-13 14:03:22.207 INFO [main] JobClient:1317 Task Id :
>>> attempt_201006091333_0031_m_000000_0, Status : FAILED
>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:930)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
>>>   ...
>>>
>>> Now, contrary to what the shell is reporting, the HMaster process is
>>> definitely running (along with HRegionServer and HQuorumPeer on the
>>> appropriate other nodes in the cluster). I do not see any errors in the
>>> master log, though interestingly I noticed a log message mentioning only 7
>>> region servers - in fact there are more than twice that many in the cluster.
>>>
>>> 2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: 7
>>> region servers, 0 dead, average load 3.142857142857143
>>>
>>> The last clue I have is some exceptions in the zookeeper logs:
>>>
>>> 2010-08-13 13:34:16,041 WARN org.apache.zookeeper.server.PrepRequestProcessor:
>>> Got exception when processing sessionid:0x12a6d2847e40000 type:create cxid:0x28
>>> zxid:0xfffffffffffffffe txntype:unknown n/a
>>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> Connected to /128.115.210.161:35883 lastZxid 0
>>> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> Creating new session 0x12a6d2847e40001
>>> 2010-08-13 14:05:08,800 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> Finished init of 0x12a6d2847e40001 valid:true
>>> 2010-08-13 14:05:08,802 WARN org.apache.zookeeper.server.PrepRequestProcessor:
>>> Got exception when processing sessionid:0x12a6d2847e40001 type:create cxid:0x1
>>> zxid:0xfffffffffffffffe txntype:unknown n/a
>>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>> 2010-08-13 14:05:09,762 WARN org.apache.zookeeper.server.NIOServerCnxn:
>>> Exception causing close of session 0x12a6d2847e40001 due to java.io.IOException: Read error
>>> 2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> closing session:0x12a6d2847e40001 NIOServerCnxn:
>>> java.nio.channels.SocketChannel[connected local=/128.115.210.149:2181 remote=/128.115.210.161:35883]
>>>
>>> HBase was running on this cluster a few months ago so I doubt it is a
>>> blatant misconfiguration at fault. I've tried restarting everything hbase or
>>> hadoop related as well as wiping out the hbase data directory on hdfs to
>>> start fresh with no result. Any hints or suggestions as to what the problem
>>> might be are greatly appreciated. Thanks!
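One more thing on the 7-vs-23 region server discrepancy and the root region timeouts above: it can help to look at what ZooKeeper actually has registered. Below is a rough sketch using the plain ZooKeeper Java client; it assumes the 0.20-era layout under the default /hbase parent znode (/hbase/rs for the region server ephemeral nodes, /hbase/root-region-server for the -ROOT- location), and the quorum string is a placeholder for your hbase.zookeeper.quorum hosts.

import java.util.List;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class HBaseZkCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder quorum string; use the same hosts as hbase.zookeeper.quorum.
    ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
        new Watcher() {
          public void process(WatchedEvent event) {
            // No-op watcher; we only do one-shot reads.
          }
        });

    // Ephemeral nodes created by each live region server (0.20-era layout,
    // default /hbase parent znode). This count should match the number of
    // region servers you started, not the 7 the master is reporting.
    List<String> regionServers = zk.getChildren("/hbase/rs", false);
    System.out.println("region servers registered in ZK: " + regionServers.size());

    // The -ROOT- location is published here once it is assigned; if it is
    // missing, clients time out exactly like the NoServerForRegionException above.
    boolean rootPublished = zk.exists("/hbase/root-region-server", false) != null;
    System.out.println("-ROOT- location published: " + rootPublished);

    zk.close();
  }
}

If far fewer ephemeral nodes show up than region servers you started, the missing ones likely never checked in with this quorum, which would also explain the master's low count.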
