On Wed, Aug 10, 2011 at 7:05 PM, Gaojinchao <[email protected]> wrote:
> In my cluster(version 0.90.3) , The root table couldn't be opened when one 
> region server crashed because of gc.
>
> The logs show:
>
> // Master assigned the root table to 82
> 2011-07-28 21:34:34,710 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> -ROOT-,,0.70236052 on 158-1-101-82,20020,1311885942386
>
> //The host of 82 crashed, master finished the split log and reassigned the 
> root and meta. But the region server didn't exit. So the root verified is 
> passed.
>  I think we shouldn't verify the root / meta in shutdownhandler processing
>


82 did not exit?

Why did the master replay its logs if it did not exit?


> 2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown 
> handler to be executed, root=true, meta=true


Isn't this the master handling 82 like it's been shut down?


> 2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386 
> as dead server

So, it looks like 82 tried to come back in (after GC, I suppose) but we
told it to go away.

Why did we not notice that -ROOT- was on 82 and, as part of the
shutdown handling of 82, reassign it?  This is what you are
saying in your subsequent message (I like how you noticed the log
message that says 82 has root and meta).  I'm not sure why it did not
reassign root.  Either it's skipping something in the shutdown handler,
or the verify location for root has a bug in it: we are not considering
the fact that the server being processed held -ROOT-, so if verification
returns that same server as holding -ROOT-, we should ignore the result
and reassign.

Good stuff Gao,
St.Ack
