Are they pointed at the same zk ensemble as the other 22 servers? That is, are they running with the same config? The complaint below is that the regionserver is not seeing the master register, perhaps because they are homed at the wrong location in zk or because they are going to a different zk ensemble.

St.Ack
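
P.S. One way to answer the "same config?" question is to pull hbase-site.xml from a working regionserver and from each failing one, then compare the ZooKeeper-related settings. Below is a minimal sketch of that comparison, not a definitive tool. The property names are the standard HBase ones; the helper names (`zk_settings`, `diff_against`), the host name `rs23`, and the sample values are made up for illustration. A key reported as `None` only means it is absent from that file, in which case the server falls back to its default, so check what the default resolves to before calling it a mismatch.

```python
import xml.etree.ElementTree as ET

# Settings that decide which ZooKeeper ensemble and parent znode a server uses.
ZK_KEYS = (
    "hbase.zookeeper.quorum",
    "hbase.zookeeper.property.clientPort",
    "zookeeper.znode.parent",
)


def zk_settings(hbase_site_xml):
    """Return the ZooKeeper-related properties found in one hbase-site.xml string."""
    root = ET.fromstring(hbase_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in ZK_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found


def diff_against(reference, configs):
    """Map each host to the settings where it differs from the reference.

    `reference` is the settings dict of a known-good server; `configs` maps a
    (hypothetical) host name to that host's settings dict. Each difference is
    an (expected, actual) pair; a key missing on one side shows up as None.
    """
    report = {}
    for host, settings in configs.items():
        diffs = {}
        for key in ZK_KEYS:
            expected, actual = reference.get(key), settings.get(key)
            if expected != actual:
                diffs[key] = (expected, actual)
        if diffs:
            report[host] = diffs
    return report


if __name__ == "__main__":
    # Illustrative inputs only; in practice read the files copied from each host.
    good = zk_settings("""<configuration>
      <property><name>hbase.zookeeper.quorum</name><value>zk1,zk2,zk3</value></property>
    </configuration>""")
    bad = zk_settings("""<configuration>
      <property><name>hbase.zookeeper.quorum</name><value>localhost</value></property>
    </configuration>""")
    print(diff_against(good, {"rs23": bad}))
```

This only compares the files on disk. It does not confirm what is actually registered in ZooKeeper, so a clean diff still leaves the question of whether /hbase/master exists on the ensemble the failing servers connect to.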

On Fri, Oct 22, 2010 at 8:34 AM, Jack Levin <[email protected]> wrote:
> I have 30 region servers, after cold restart (master, zookeepers, and
> all regionservers), 22 regionservers start, but the other 8 have
> following errors,
> any idea how to debug this? Is zookeeper giving the RS wrong msg?
> Can I log it via tcpdump maybe?
>
> 2010-10-22 08:32:42,035 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read master address from ZooKeeper. Retrying. Error was:
> java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1289)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1320)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:519)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
>         ... 5 more
