Hmm... does it emit that message once or continuously? In the log we emit the ensemble we're trying to contact. Does it look correct? The next time the machine has this issue, try running the zk cmdline client and see if you can see a znode at /hbase/master:
$ ./bin/hbase org.apache.zookeeper.ZooKeeperMain -server HOST:PORT

Where HOST:PORT is what the RS is reporting for the zk ensemble. Once you have the zk cmdline client up, do something like ls /hbase ....

St.Ack

On Fri, Oct 22, 2010 at 10:42 AM, Jack Levin <[email protected]> wrote:
> Same ZK all the time, restart of regionserver clears the issue. I
> even see them talking to ZK via tcpdump, is there a way to enable
> debug log output on ZK to see what might be going on?
>
> -Jack
>
> On Fri, Oct 22, 2010 at 10:28 AM, Stack <[email protected]> wrote:
>> Are they pointed to the same zk ensemble as the other 22 servers? That
>> is, are they running with the same config? The below complaint is
>> that the regionserver is not seeing the master register, perhaps because
>> they are homed at the wrong location in zk or because they are going
>> to a different zk?
>> St.Ack
>>
>> On Fri, Oct 22, 2010 at 8:34 AM, Jack Levin <[email protected]> wrote:
>>> I have 30 region servers, after a cold restart (master, zookeepers, and
>>> all regionservers), 22 regionservers start, but the other 8 have the
>>> following errors,
>>> any idea how to debug this? Is zookeeper giving the RS the wrong msg?
>>> Can I log it via tcpdump maybe?
>>>
>>> 2010-10-22 08:32:42,035 WARN
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read
>>> master address from ZooKeeper. Retrying. Error was:
>>> java.io.IOException:
>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
>>> = NoNode for /hbase/master
>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1289)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1320)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:519)
>>>     at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>> KeeperErrorCode = NoNode for /hbase/master
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
>>>     ... 5 more
>>>
>>
>
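
One way to answer the "once or continuously" question from the top of this thread is to count the warning in a regionserver log. The following is only a rough Python sketch, not part of HBase: the function name summarize_warnings is hypothetical, and it assumes each log entry starts with a timestamp in the format shown in the quoted warning and that the warning sits on a single line in the real log file (the wrapping in the quote is presumably from the mail client).

```python
import re
from datetime import datetime

# Warning text copied from the regionserver log quoted in this thread.
WARNING = "Unable to read master address from ZooKeeper"

# Assumed timestamp prefix, e.g. "2010-10-22 08:32:42,035 ".
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")


def summarize_warnings(lines):
    """Return (count, first_seen, last_seen) for the master-address warning.

    `lines` is any iterable of log lines, for example an open file object.
    Lines that contain the warning but lack the assumed timestamp prefix
    are skipped.
    """
    stamps = []
    for line in lines:
        if WARNING not in line:
            continue
        match = STAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    if not stamps:
        return 0, None, None
    return len(stamps), min(stamps), max(stamps)
```

Calling summarize_warnings(open(path)) on the log of one of the 8 affected regionservers, where path is wherever that log lives on your install, gives a count and the first and last time the warning appeared. A count of 1 suggests a one-off at startup. A large count spread over a long window suggests the RS keeps retrying and never sees /hbase/master.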
