> Why did the master replay its logs if it did not exit?

The ZK session expired because of GC, but the region server was not shut down.
> (I like how you noticed the log message that says 82 has root and meta)

Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown handler to be executed, root=true, meta=true

The log message says that 82 had root and meta: "root=true" shows that the dead region server held the root table.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: August 16, 2011 12:12
To: [email protected]
Subject: Re: Root table couldn't be opened

On Wed, Aug 10, 2011 at 7:05 PM, Gaojinchao <[email protected]> wrote:
> In my cluster (version 0.90.3), the root table couldn't be opened when one
> region server crashed because of GC.
>
> The logs show:
>
> // Master assigned the root table to 82
> 2011-07-28 21:34:34,710 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
> -ROOT-,,0.70236052 on 158-1-101-82,20020,1311885942386
>
> // The host of 82 crashed; the master finished the log split and reassigned
> root and meta. But the region server didn't exit, so the root verification
> passed.
> I think we shouldn't verify the root / meta in shutdownhandler processing

82 did not exit? Why did the master replay its logs if it did not exit?

> 2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown
> handler to be executed, root=true, meta=true

Isn't this the master handling 82 like it's been shut down?

> 2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386
> as dead server

So, it looks like 82 tried to come in (after GC I suppose) but we told it to go away.

Why did we not notice that -ROOT- was on 82 and, as part of the shutdown handling of 82, reassign it? This is what you are saying in your subsequent message (I like how you noticed the log message that says 82 has root and meta).

I'm not sure why it did not reassign root. Either it's skipping something in the shutdown handler, or the verify location for root has a bug in it where we are not considering the fact that the current server held -ROOT-; if verification returns the current server as holding -ROOT-, then we should ignore it.

Good stuff Gao,
St.Ack
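
The second hypothesis above (verification succeeds because the "dead" server is still running and still answers for -ROOT-) can be sketched as follows. This is only a toy model of the reasoning, not HBase code: `needs_root_reassignment`, `verify_root_location`, and the second server name are hypothetical names invented for illustration, and the real shutdown handler in 0.90.3 may be structured quite differently.

```python
from typing import Callable, Optional


def needs_root_reassignment(
    dead_server: str,
    verify_root_location: Callable[[], Optional[str]],
) -> bool:
    """Decide whether -ROOT- must be reassigned while handling dead_server.

    verify_root_location is a hypothetical callback that returns the name of
    the server currently answering for -ROOT-, or None if nobody answers.
    """
    holder = verify_root_location()

    # Nobody answers for -ROOT-: it clearly has to be reassigned.
    if holder is None:
        return True

    # The server being expired still answers (for example, it woke up after a
    # long GC pause). Its answer should be ignored: the master is already
    # treating it as dead and will reject its reports.
    if holder == dead_server:
        return True

    # Some other live server holds -ROOT-; nothing to do.
    return False


if __name__ == "__main__":
    dead = "158-1-101-82,20020,1311885942386"
    # The scenario from the logs: 82 is being processed as dead but still responds.
    print(needs_root_reassignment(dead, lambda: dead))
```

The point of the sketch is the middle branch: a check that only asks "does some server answer for -ROOT-?" would skip reassignment in exactly the situation described in the logs.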
