Ah, very helpful. See how .META. is getting reassigned even though it already has a valid assignment? Some environments hit this for some reason, and it is fixed by https://issues.apache.org/jira/browse/HBASE-2599, which you will need to apply to your HBase.
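If you want a quick way to sanity-check the client path once you've patched, a rough sketch like the one below (written against the 0.20-era client API as best I remember it; "mytable" is just a placeholder table name) exercises the same two steps that are failing for you: the master check behind the shell's MasterNotRunningException, and the -ROOT-/.META. lookup your map tasks time out on.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class HBaseClientCheck {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml from the classpath, same config the shell uses.
    HBaseConfiguration conf = new HBaseConfiguration();

    // Throws MasterNotRunningException if the client can't reach the master,
    // which is what the shell's `list` is reporting.
    HBaseAdmin.checkHBaseAvailable(conf);
    System.out.println("master is reachable");

    // Constructing an HTable forces the -ROOT-/.META. lookup, the step that
    // times out with NoServerForRegionException in the failing map tasks.
    // "mytable" is just a placeholder; use one of your own tables.
    HTable table = new HTable(conf, "mytable");
    System.out.println("region lookup succeeded for " + new String(table.getTableName()));
  }
}

Both calls read the regular client configuration, so run it from the same machine and classpath as the failing job to rule out a config mismatch.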
J-D

On Fri, Aug 13, 2010 at 5:22 PM, Marchwiak, Patrick D. <[email protected]> wrote:
> I've attached the log.
>
> One more thing I'll add is that the stop-hbase.sh script hangs on the
> "stopping master..." line, so I had to manually kill the HMaster process
> before doing a restart.
>
> On 8/13/10 5:00 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
>
>> A clean log of a full master startup would be really useful, can't
>> tell much more by the current info you provided.
>>
>> J-D
>>
>> On Fri, Aug 13, 2010 at 4:50 PM, Marchwiak, Patrick D.
>> <[email protected]> wrote:
>>> I am having issues performing any operations (list/create/put) on my hbase
>>> instance once it starts up.
>>>
>>> The environment:
>>> Red Hat 5.5
>>> Hadoop 0.20.2
>>> HBase 0.20.4
>>> java 1.6.0_20
>>> 1 running master
>>> 23 running regionservers + 3 also running zookeeper
>>>
>>> When attempting to do a list from the hbase shell it returns this error:
>>> NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
>>>
>>> When attempting to perform inserts from a hadoop job I see the following
>>> error in my application:
>>>
>>> 2010-08-13 14:03:22.207 INFO [main] JobClient:1317 Task Id :
>>> attempt_201006091333_0031_m_000000_0, Status : FAILED
>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:930)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
>>>   ...
>>>
>>> Now, contrary to what the shell is reporting, the HMaster process is
>>> definitely running (along with HRegionServer and HQuorumPeer on the
>>> appropriate other nodes in the cluster). I do not see any errors in the
>>> master log, though interestingly I noticed a log message mentioning only 7
>>> region servers - in fact there are more than twice that many in the cluster.
>>>
>>> 2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: 7
>>> region servers, 0 dead, average load 3.142857142857143
>>>
>>> The last clue I have is some exceptions in the zookeeper logs:
>>>
>>> 2010-08-13 13:34:16,041 WARN org.apache.zookeeper.server.PrepRequestProcessor:
>>> Got exception when processing sessionid:0x12a6d2847e40000 type:create cxid:0x28
>>> zxid:0xfffffffffffffffe txntype:unknown n/a
>>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> Connected to /128.115.210.161:35883 lastZxid 0
>>> 2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> Creating new session 0x12a6d2847e40001
>>> 2010-08-13 14:05:08,800 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> Finished init of 0x12a6d2847e40001 valid:true
>>> 2010-08-13 14:05:08,802 WARN org.apache.zookeeper.server.PrepRequestProcessor:
>>> Got exception when processing sessionid:0x12a6d2847e40001 type:create cxid:0x1
>>> zxid:0xfffffffffffffffe txntype:unknown n/a
>>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>>   at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>> 2010-08-13 14:05:09,762 WARN org.apache.zookeeper.server.NIOServerCnxn:
>>> Exception causing close of session 0x12a6d2847e40001 due to java.io.IOException: Read error
>>> 2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> closing session:0x12a6d2847e40001 NIOServerCnxn:
>>> java.nio.channels.SocketChannel[connected local=/128.115.210.149:2181 remote=/128.115.210.161:35883]
>>>
>>> HBase was running on this cluster a few months ago so I doubt it is a
>>> blatant misconfiguration at fault. I've tried restarting everything hbase or
>>> hadoop related as well as wiping out the hbase data directory on hdfs to
>>> start fresh with no result. Any hints or suggestions as to what the problem
>>> might be are greatly appreciated. Thanks!
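One more thing on the 7-vs-23 region server discrepancy and the root region timeouts above: it can help to look at what ZooKeeper actually has registered. Below is a rough sketch using the plain ZooKeeper Java client; it assumes the 0.20-era layout under the default /hbase parent znode (/hbase/rs for the region server ephemeral nodes, /hbase/root-region-server for the -ROOT- location), and the quorum string is a placeholder for your hbase.zookeeper.quorum hosts.

import java.util.List;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class HBaseZkCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder quorum string; use the same hosts as hbase.zookeeper.quorum.
    ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
        new Watcher() {
          public void process(WatchedEvent event) {
            // No-op watcher; we only do one-shot reads.
          }
        });

    // Ephemeral nodes created by each live region server (0.20-era layout,
    // default /hbase parent znode). This count should match the number of
    // region servers you started, not the 7 the master is reporting.
    List<String> regionServers = zk.getChildren("/hbase/rs", false);
    System.out.println("region servers registered in ZK: " + regionServers.size());

    // The -ROOT- location is published here once it is assigned; if it is
    // missing, clients time out exactly like the NoServerForRegionException above.
    boolean rootPublished = zk.exists("/hbase/root-region-server", false) != null;
    System.out.println("-ROOT- location published: " + rootPublished);

    zk.close();
  }
}

If far fewer ephemeral nodes show up than region servers you started, the missing ones likely never checked in with this quorum, which would also explain the master's low count.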
