Re: Not running balancer because processing dead regionserver(s)

Yi Liang Fri, 25 Feb 2011 00:02:37 -0800

Thank you, Stack!

On Wed, Feb 23, 2011 at 6:25 AM, Stack <[email protected]> wrote:


> On Mon, Feb 21, 2011 at 10:04 PM, Yi Liang <[email protected]> wrote:
> > Yes, the server zcl crashed at that time.
> >
> > But after I restarted it later, it's still in the dead server list.
> >
>
> We failed to process its death:
>
> 2011-02-18 10:08:14,873 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of zcl.local:60020
> 2011-02-18 10:08:14,874 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
> java.lang.IllegalArgumentException: Could not resolve the DNS name of zcl.local:60020
>        at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
>        at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
>        at org.apache.hadoop.hbase.catalog.MetaReader.metaRowToRegionPairWithInfo(MetaReader.java:407)
>        at org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:594)
>        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:124)
>        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
>
> It looks like the above exception caused us to jump out of the
> processing of the server shutdown.  The above is related to the
> "No route to host" error.
>
> I filed HBASE-3556.  It'll be 'fixed' by HBASE-1501, but we should
> never just give up processing.  Need to look into that.
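To make the failure mode above concrete, here is a minimal hypothetical sketch (not HBase's actual ServerShutdownHandler) of a handler that does not give up: each .META. row is handled in its own try/catch, so one unresolvable address is logged and skipped while the remaining rows are still processed, and the server still leaves the dead list afterwards. The class name, the map of region to address, and the `Predicate` standing in for DNS resolution are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch, not HBase's ServerShutdownHandler: each .META. row is
// handled in its own try/catch, so one unresolvable address is logged and
// skipped instead of aborting processing of the whole dead server.
class ShutdownHandlerSketch {
    // Stand-in for the master's in-memory dead-server list.
    final Set<String> deadServers = new LinkedHashSet<>();
    // Stand-ins for "region handed off for reassignment" vs "row skipped".
    final List<String> reassigned = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    // Assumed resolver hook; real code would perform a DNS lookup here.
    private final Predicate<String> canResolve;

    ShutdownHandlerSketch(Predicate<String> canResolve) {
        this.canResolve = canResolve;
    }

    // Mimics the check that threw IllegalArgumentException in the log above.
    private void checkResolvable(String hostAndPort) {
        if (!canResolve.test(hostAndPort)) {
            throw new IllegalArgumentException(
                "Could not resolve the DNS name of " + hostAndPort);
        }
    }

    void process(String deadServer, Map<String, String> regionToAddress) {
        deadServers.add(deadServer);
        for (Map.Entry<String, String> row : regionToAddress.entrySet()) {
            try {
                checkResolvable(row.getValue());
                reassigned.add(row.getKey());
            } catch (IllegalArgumentException e) {
                // Log and keep going rather than giving up on the server.
                System.err.println("Skipping " + row.getKey() + ": " + e.getMessage());
                skipped.add(row.getKey());
            }
        }
        // The loop always completes, so the server leaves the dead list and
        // no longer blocks the balancer.
        deadServers.remove(deadServer);
    }

    public static void main(String[] args) {
        // Pretend zcl.local cannot be resolved, as in the log above.
        ShutdownHandlerSketch handler =
            new ShutdownHandlerSketch(addr -> !addr.startsWith("zcl.local"));
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("regionA", "zcl.local:60020");
        rows.put("regionB", "qics.local:60020");
        handler.process("zcl.local,60020,1297919367472", rows);
        System.out.println("reassigned=" + handler.reassigned
            + " skipped=" + handler.skipped + " dead=" + handler.deadServers);
    }
}
```

The design choice being illustrated is only the scope of the try/catch: wrapping each row rather than the whole event means a bad row costs one region, not the whole shutdown handling.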
>
> While a server is in the dead servers list, we won't run the
> balancer.  The dead servers list is an in-memory list, so you'd need
> to kill the master and bring it back up again to clear the dead
> server state.
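The rule described above can be sketched roughly as follows. This is a hypothetical model, not HMaster's real code: the balancer refuses to run while any server is still in the in-memory dead list, and because the list is held only in memory, a freshly started master begins with it empty. Method names such as `markDead` and `finishedProcessing` are assumptions; only the log message text is taken from this thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the gating rule, not HBase's HMaster: the balancer
// is skipped while any dead server is still being processed.
class BalancerGateSketch {
    // In-memory only: a new master instance starts with this set empty.
    private final Set<String> deadServersInProgress = new LinkedHashSet<>();

    void markDead(String serverName) {
        deadServersInProgress.add(serverName);
    }

    void finishedProcessing(String serverName) {
        deadServersInProgress.remove(serverName);
    }

    // Returns true if a balancer run was allowed.
    boolean balance() {
        if (!deadServersInProgress.isEmpty()) {
            System.out.println("Not running balancer because processing dead regionserver(s): "
                + deadServersInProgress);
            return false;
        }
        // Region moves would be computed and executed here.
        return true;
    }

    public static void main(String[] args) {
        BalancerGateSketch master = new BalancerGateSketch();
        // The shutdown handler died, so finishedProcessing() is never called.
        master.markDead("zcl.local,60020,1297919367472");
        System.out.println(master.balance()); // false

        // Restarting the master discards the in-memory state.
        BalancerGateSketch restarted = new BalancerGateSketch();
        System.out.println(restarted.balance()); // true
    }
}
```

In this model, a handler that aborts before calling `finishedProcessing` leaves the balancer blocked until the master process is replaced.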
>
> St.Ack
>
>
> > 2011-02-18 10:39:26,895 INFO org.apache.hadoop.hbase.master.ServerManager:
> > Registering server=zcl.local,60020,1297996817352, regionCount=0,
> > userLoad=false
> > 2011-02-18 10:39:35,062 DEBUG org.apache.hadoop.hbase.master.HMaster: Not
> > running balancer because processing dead regionserver(s):
> > [Docete.local,60020,1297919410096, liym.local,60020,1297919445796,
> > zcl.local,60020,1297919367472]
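These two log lines help explain why restarting zcl did not clear it from the list: server names have the form host,port,startcode, and the re-registered zcl.local (startcode 1297996817352) is a different entry from the dead zcl.local (startcode 1297919367472). The startcode records when that region server process started, so every restart produces a new name. The helper below is a hypothetical sketch for comparing such names; it is not HBase's own API.

```java
// Hypothetical helper, not HBase's own API: parses the host,port,startcode
// server names that appear in the master log lines above.
class ServerNameSketch {
    final String host;
    final int port;
    final long startCode;

    ServerNameSketch(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    static ServerNameSketch parse(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected host,port,startcode: " + serverName);
        }
        return new ServerNameSketch(parts[0].trim(),
            Integer.parseInt(parts[1].trim()), Long.parseLong(parts[2].trim()));
    }

    // Same machine and port, whichever process instance it was.
    boolean sameHostAndPort(ServerNameSketch other) {
        return host.equals(other.host) && port == other.port;
    }

    // Same process instance: the startcode must match as well.
    boolean sameInstance(ServerNameSketch other) {
        return sameHostAndPort(other) && startCode == other.startCode;
    }

    public static void main(String[] args) {
        ServerNameSketch registered = parse("zcl.local,60020,1297996817352");
        ServerNameSketch dead = parse("zcl.local,60020,1297919367472");
        System.out.println(registered.sameHostAndPort(dead)); // true
        System.out.println(registered.sameInstance(dead));    // false
    }
}
```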
> >
> > On Tue, Feb 22, 2011 at 1:48 AM, Ted Yu <[email protected]> wrote:
> >
> >> Looks like there was a connectivity issue:
> >>
> >> java.net.NoRouteToHostException: No route to host
> >>
> >> On Sun, Feb 20, 2011 at 10:09 PM, Yi Liang <[email protected]> wrote:
> >>
> >> > The related log is at: http://pastebin.com/0a1CjDUD
> >> >
> >> > It's OK now after restarting HBase, but I'm still curious why it happened.
> >> >
> >> > Thanks,
> >> > Yi
> >> > On Sat, Feb 19, 2011 at 3:58 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> > > The master should finish processing those dead servers at some point,
> >> > > and it seems that's not happening? Unfortunately, without the log nobody
> >> > > can tell why. If you can post the complete log in pastebin or put it
> >> > > on a web server, then we could take a look.
> >> > >
> >> > > J-D
> >> > >
> >> > > On Fri, Feb 18, 2011 at 12:39 AM, Yi Liang <[email protected]> wrote:
> >> > > > Hi all,
> >> > > >
> >> > > > We have an HBase cluster with 10 region servers running HBase 0.90.0 + CDH3.
> >> > > > We're now importing a large amount of data into HBase.
> >> > > >
> >> > > > During the import, 2 servers crashed. After restarting them, they're no
> >> > > > longer assigned any regions, while regions on the other servers keep
> >> > > > splitting as more data is inserted.
> >> > > >
> >> > > > From the master log, we see periodic messages like:
> >> > > >
> >> > > > 2011-02-18 16:09:35,067 DEBUG org.apache.hadoop.hbase.master.HMaster:
> >> > > > Not running balancer because processing dead regionserver(s):
> >> > > > [zcl.local,60020,1297996817352, qics.local,60020,1297919358488,
> >> > > > Docete.local,60020,1297919410096, liym.local,60020,1297919445796,
> >> > > > zcl.local,60020,1297919367472]
> >> > > >
> >> > > > zcl.local and qics.local are the machines we restarted; the other 2
> >> > > > machines have kept running without a restart and are actually still
> >> > > > serving regions.
> >> > > >
> >> > > > From the shell status:
> >> > > > 10 servers, 5 dead, 10.1000 average Load
> >> > > >
> >> > > > Why are there dead servers? And how can we clear them so that we can
> >> > > > start the balancer?
> >> > > >
> >> > > > Thanks,
> >> > > > Yi
> >> > > >
> >> > >
> >> >
> >>
> >
>
