One of my zookeepers was unhappy and did not report the /hbase directory. I shut it down, and things started to work much better.
-Jack

On Fri, Oct 22, 2010 at 10:56 AM, Stack <[email protected]> wrote:
> Hmm... does it emit that message once or continuously? In log we emit
> the ensemble we're trying to contact. Does it look correct? When the
> machine is having this issue next time, try running the zk cmdline
> client and see if you can see a znode at /hbase/master:
>
> $ ./bin/hbase org.apache.zookeeper.ZooKeeperMain -server HOST:PORT
>
> Where HOST:PORT are what the RS is reporting for zk ensemble.
>
> Once you have the zk cmdline client up, do something like
>
> ls /hbase
>
> ....
>
> St.Ack
>
> On Fri, Oct 22, 2010 at 10:42 AM, Jack Levin <[email protected]> wrote:
>> Same ZK all the time, restart of regionserver clears the issue. I
>> even see them talking to ZK via tcpdump, is there a way to enable
>> debug log output on ZK to see what might be going on?
>>
>> -Jack
>>
>> On Fri, Oct 22, 2010 at 10:28 AM, Stack <[email protected]> wrote:
>>> Are they pointed to the same zk ensemble as the other 22 servers? That
>>> is, are they running with the same config? The below complaint is
>>> that the regionserver is not seeing master register, perhaps because
>>> they are homed at the wrong location in zk or because they are going
>>> to a different zk?
>>> St.Ack
>>>
>>> On Fri, Oct 22, 2010 at 8:34 AM, Jack Levin <[email protected]> wrote:
>>>> I have 30 region servers, after cold restart (master, zookeepers, and
>>>> all regionservers), 22 regionservers start, but the other 8 have
>>>> following errors,
>>>> any idea how to debug this? Is zookeeper giving the RS wrong msg?
>>>> Can I log it via tcpdump maybe?
>>>>
>>>> 2010-10-22 08:32:42,035 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read master address from ZooKeeper. Retrying. Error was:
>>>> java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1289)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1320)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:519)
>>>>     at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
>>>>     ... 5 more
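
Since the cause in this thread turned out to be a single ensemble member that did not serve /hbase, it can help to look at each ZooKeeper server on its own rather than at the ensemble as a whole. The following is only a rough sketch of one way to do that, not something taken from the thread: it sends ZooKeeper's "stat" four-letter command to every server over a plain socket and pulls out the Mode, Zxid and Node count lines so they can be compared side by side. The function names and the host names in the trailing comment are made up for illustration, and it assumes the servers answer four-letter commands on their client port (newer ZooKeeper releases may need them whitelisted). A server that errors out, or whose node count or zxid is far from the others, is the one to point the "ls /hbase" check from Stack's mail at.

```python
import socket


def four_letter(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Pick the 'Key: value' lines of interest out of a 'stat' reply."""
    wanted = {"Mode", "Zxid", "Node count"}
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            result[key.strip()] = value.strip()
    return result


def check_ensemble(servers):
    """Map each 'host:port' to its parsed stat fields, or to an error entry."""
    report = {}
    for server in servers:
        host, _, port = server.rpartition(":")
        try:
            report[server] = parse_stat(four_letter(host, int(port), "stat"))
        except OSError as exc:  # refused, timed out, unresolvable host, ...
            report[server] = {"error": str(exc)}
    return report


# Example with hypothetical hosts; substitute the ensemble the RS log reports:
#   for server, info in check_ensemble(["zk1:2181", "zk2:2181"]).items():
#       print(server, info)
```

This only narrows down which server looks out of step; confirming that /hbase/master is actually missing there still needs the zk cmdline client run with -server pointed at that one host.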
