Yes, the server zcl crashed at that time. But after I restarted it later, it's still in the dead server list.
2011-02-18 10:39:26,895 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=zcl.local,60020,1297996817352, regionCount=0, userLoad=false
2011-02-18 10:39:35,062 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [Docete.local,60020,1297919410096, liym.local,60020,1297919445796, zcl.local,60020,1297919367472]

On Tue, Feb 22, 2011 at 1:48 AM, Ted Yu <[email protected]> wrote:

> Looks like there was a connectivity issue:
>
> java.net.NoRouteToHostException: No route to host
>
> On Sun, Feb 20, 2011 at 10:09 PM, Yi Liang <[email protected]> wrote:
>
> > The related log is at: http://pastebin.com/0a1CjDUD
> >
> > It's OK now after restarting HBase, but I'm still curious why it happened.
> >
> > Thanks,
> > Yi
> >
> > On Sat, Feb 19, 2011 at 3:58 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> > > The master should finish processing those dead servers at some point,
> > > and it seems that's not happening? Unfortunately, without the log nobody
> > > can tell why. If you can post the complete log in pastebin or put it
> > > on a web server, then we could take a look.
> > >
> > > J-D
> > >
> > > On Fri, Feb 18, 2011 at 12:39 AM, Yi Liang <[email protected]> wrote:
> > > > Hi all,
> > > >
> > > > We have an HBase cluster with 10 region servers running HBase 0.90.0 + CDH3.
> > > > We're now importing big data into HBase.
> > > >
> > > > During the process, 2 servers crashed, but after restarting them, they're
> > > > no longer assigned any regions, while regions on the other servers keep
> > > > splitting as more data is inserted.
> > > >
> > > > In the master log, we see periodic messages like:
> > > >
> > > > 2011-02-18 16:09:35,067 DEBUG org.apache.hadoop.hbase.master.HMaster: Not
> > > > running balancer because processing dead regionserver(s):
> > > > [zcl.local,60020,1297996817352, qics.local,60020,1297919358488,
> > > > Docete.local,60020,1297919410096, liym.local,60020,1297919445796,
> > > > zcl.local,60020,1297919367472]
> > > >
> > > > zcl.local and qics.local are the machines we restarted; the other 2 machines
> > > > have kept running without a restart and are actually still serving regions.
> > > >
> > > > From the shell status:
> > > > 10 servers, 5 dead, 10.1000 average Load
> > > >
> > > > Why are there dead servers? And how can we clear them so that the
> > > > balancer can run?
> > > >
> > > > Thanks,
> > > > Yi
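The dead-server list in the 16:09:35 log line is easier to read once each entry is split apart. The sketch below assumes, as the entries suggest, that each one is `host,port,startcode` and that the startcode is the region server's start time in epoch milliseconds; `parse_server_names` and `group_by_host` are hypothetical helper names, not HBase APIs. Under that assumption, zcl.local shows up twice because two different start times (two separate incarnations of the process) are both still listed as dead.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Dead-server list copied verbatim from the 16:09:35 master log line above.
DEAD = ("zcl.local,60020,1297996817352, qics.local,60020,1297919358488, "
        "Docete.local,60020,1297919410096, liym.local,60020,1297919445796, "
        "zcl.local,60020,1297919367472")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_server_names(text):
    """Split 'host,port,startcode, host,port,startcode, ...' into tuples.

    Assumption: every entry has exactly three comma-separated fields.
    """
    fields = [f.strip() for f in text.split(",")]
    return [(fields[i], int(fields[i + 1]), int(fields[i + 2]))
            for i in range(0, len(fields), 3)]


def group_by_host(servers):
    """Map each host to the UTC start times of its dead-server entries."""
    by_host = defaultdict(list)
    for host, _port, startcode in servers:
        # Assumption: startcode is the server's start time in epoch ms.
        # Integer milliseconds avoid float rounding in the conversion.
        by_host[host].append(EPOCH + timedelta(milliseconds=startcode))
    return dict(by_host)


if __name__ == "__main__":
    for host, starts in sorted(group_by_host(parse_server_names(DEAD)).items()):
        print(host, [s.isoformat() for s in starts])
```

Running it against the quoted log line lists four hosts, with zcl.local carrying two start times (one on Feb 17, one on Feb 18 UTC).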
