I expect this is also at the root of the trouble with the REST server.
You said your ZooKeeper ensemble peer was unhappy? Can we see the logs? Did you
report this to the ZK guys?
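To see which ensemble peer is the unhappy one, each peer can be probed individually with ZooKeeper's `ruok` four-letter command, which a healthy server answers with `imok`. Below is a minimal Python sketch, assuming a comma-separated `host[:port]` list like the HOST:PORT ensemble the regionserver reports in its log, with 2181 as the default client port. The helper names (`parse_quorum`, `four_letter_word`, `probe_ensemble`) are hypothetical. Note that `imok` only means the process is running in a non-error state; it does not prove the peer is in sync or can see /hbase (`stat` reports more, such as its Mode). Newer ZooKeeper releases also disable most four-letter commands unless they are listed in `4lw.commands.whitelist`.

```python
import socket


def parse_quorum(quorum, default_port=2181):
    """Split a hypothetical 'host[:port],host[:port]' string into (host, port) pairs."""
    peers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.rpartition(":")
        if sep:
            peers.append((host, int(port)))
        else:
            peers.append((entry, default_port))  # no port given
    return peers


def four_letter_word(host, port, cmd="ruok", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw reply text.

    The server is expected to write its reply and then close the connection,
    so we read until EOF (or until the timeout raises an OSError).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def probe_ensemble(quorum):
    """Return {'host:port': reply or error text} for every peer in the quorum."""
    results = {}
    for host, port in parse_quorum(quorum):
        key = f"{host}:{port}"
        try:
            reply = four_letter_word(host, port, "ruok").strip()
            results[key] = reply or "(empty reply)"
        except OSError as exc:  # refused, unreachable, or timed out
            results[key] = f"error: {exc}"
    return results
```

For example, `probe_ensemble("zk1:2181,zk2:2181,zk3")` (hypothetical hostnames) returns one entry per peer; any peer that does not answer `imok` is the first candidate to look at, or to shut down as Jack did.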
Best regards,
- Andy
--- On Fri, 10/22/10, Jack Levin <[email protected]> wrote:
> From: Jack Levin <[email protected]>
> Subject: Re: cold restart/region servers issue
> To: [email protected]
> Date: Friday, October 22, 2010, 1:31 PM
> One of my ZooKeepers was unhappy and did not report the /hbase directory.
> I shut it down, and things started to work much better.
>
> -Jack
>
> On Fri, Oct 22, 2010 at 10:56 AM, Stack <[email protected]> wrote:
> > Hmm... does it emit that message once or continuously? In the log we emit
> > the ensemble we're trying to contact. Does it look correct? The next time
> > the machine has this issue, try running the zk cmdline client and see if
> > you can see a znode at /hbase/master:
> >
> > $ ./bin/hbase org.apache.zookeeper.ZooKeeperMain -server HOST:PORT
> >
> > where HOST:PORT is what the RS is reporting for the zk ensemble.
> >
> > Once you have the zk cmdline client up, do something like
> >
> > ls /hbase
> >
> >
> > ....
> >
> >
> > St.Ack
> >
> > On Fri, Oct 22, 2010 at 10:42 AM, Jack Levin <[email protected]> wrote:
> >> Same ZK all the time; restarting the regionserver clears the issue. I
> >> can even see them talking to ZK via tcpdump. Is there a way to enable
> >> debug log output on ZK to see what might be going on?
> >>
> >> -Jack
> >>
> >> On Fri, Oct 22, 2010 at 10:28 AM, Stack <[email protected]> wrote:
> >>> Are they pointed at the same zk ensemble as the other 22 servers? That
> >>> is, are they running with the same config? The complaint below is that
> >>> the regionserver is not seeing the master register, perhaps because they
> >>> are homed at the wrong location in zk or because they are going to a
> >>> different zk.
> >>> St.Ack
> >>>
> >>> On Fri, Oct 22, 2010 at 8:34 AM, Jack Levin <[email protected]> wrote:
> >>>> I have 30 regionservers. After a cold restart (master, zookeepers, and
> >>>> all regionservers), 22 regionservers start, but the other 8 show the
> >>>> following errors. Any idea how to debug this? Is ZooKeeper giving the
> >>>> RS the wrong message? Could I capture it via tcpdump?
> >>>>
> >>>> 2010-10-22 08:32:42,035 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read master address from ZooKeeper. Retrying. Error was:
> >>>> java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
> >>>> 	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
> >>>> 	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
> >>>> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1289)
> >>>> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1320)
> >>>> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:519)
> >>>> 	at java.lang.Thread.run(Thread.java:619)
> >>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
> >>>> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >>>> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>>> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
> >>>> 	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
> >>>> 	... 5 more
> >>>>
> >>>
> >>
> >
>