One of my zookeepers was unhappy and did not report the /hbase directory.
I shut it down, and things started to work much better.
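
For anyone hitting the same thing, here is a rough sketch of how each
zookeeper could be checked on its own, so that one bad member is not hidden
behind the healthy ones. It assumes the ZooKeeperMain client accepts a
command (here "ls /hbase") after the -server option and exits afterwards;
verify that on your version. The function name check_each_zk and the
HBASE_CMD override are made up for illustration.

```shell
#!/bin/sh
# Sketch: run "ls /hbase" against each ZooKeeper server individually.
# HBASE_CMD is a hypothetical override; the default is the hbase launcher
# script, run from the HBase install directory.
check_each_zk() {
    quorum=$1   # comma-separated HOST:PORT list
    for server in $(printf '%s' "$quorum" | tr ',' ' '); do
        echo "== $server =="
        # Connect to this one server only, so the client cannot fall back
        # to a healthy member of the ensemble.
        "${HBASE_CMD:-./bin/hbase}" org.apache.zookeeper.ZooKeeperMain \
            -server "$server" ls /hbase || echo "check failed for $server"
    done
}
```

Calling it with the ensemble the RS reports, for example
check_each_zk "HOST1:PORT,HOST2:PORT,HOST3:PORT", should list the same
children (including master) under every server. One that errors out or
shows no /hbase is the one to look at.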

-Jack

On Fri, Oct 22, 2010 at 10:56 AM, Stack <[email protected]> wrote:
> Hmm... does it emit that message once or continuously?  In the log we
> emit the ensemble we're trying to contact.  Does it look correct?  The
> next time the machine has this issue, try running the zk cmdline
> client and see if you can see a znode at /hbase/master:
>
> $ ./bin/hbase org.apache.zookeeper.ZooKeeperMain -server HOST:PORT
>
> Where HOST:PORT are what the RS is reporting for zk ensemble.
>
> Once you have the zk cmdline client up, do something like
>
> ls /hbase
>
>
> ....
>
>
> St.Ack
>
> On Fri, Oct 22, 2010 at 10:42 AM, Jack Levin <[email protected]> wrote:
>> Same ZK all the time; a restart of the regionserver clears the issue.  I
>> even see them talking to ZK via tcpdump. Is there a way to enable
>> debug log output on ZK to see what might be going on?
>>
>> -Jack
>>
>> On Fri, Oct 22, 2010 at 10:28 AM, Stack <[email protected]> wrote:
>>> Are they pointed to the same zk ensemble as the other 22 servers? That
>>> is, are they running with the same config?  The below complaint is
>>> that the regionserver is not seeing master register, perhaps because
>>> they are homed at the wrong location in zk or because they are going
>>> to a different zk?
>>> St.Ack
>>>
>>> On Fri, Oct 22, 2010 at 8:34 AM, Jack Levin <[email protected]> wrote:
>>>> I have 30 region servers. After a cold restart (master, zookeepers, and
>>>> all regionservers), 22 regionservers start, but the other 8 have the
>>>> following errors.
>>>> Any idea how to debug this?  Is zookeeper giving the RS the wrong msg?
>>>> Can I log it via tcpdump maybe?
>>>>
>>>> 2010-10-22 08:32:42,035 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read master address from ZooKeeper. Retrying. Error was:
>>>> java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
>>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
>>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1289)
>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1320)
>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:519)
>>>>        at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
>>>>        ... 5 more
>>>>
>>>
>>
>
