Thanks for your reply. It seems HBASE-4168 can't fix my case: there is no error in the master logs, and the master can reach the region server and get the region information.
The master logs say the dead server carried both root and meta, but the master only assigned meta and skipped root:

// it said that it is a region server with root and meta.
> 2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown
> handler to be executed, root=true, meta=true
> 2011-07-28 22:25:10,085 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting
> completed in 316329 ms for
> hdfs://158.1.101.82:9000/hbase/.logs/158-1-101-82,20020,1311885942386
> 2011-07-28 22:26:54,790 DEBUG org.apache.hadoop.hbase.master.HMaster: Not
> running balancer because processing dead regionserver(s):
> [158-1-101-82,20020,1311885942386]
> 2011-07-28 22:27:11,176 WARN org.apache.hadoop.hbase.master.CatalogJanitor:
> Failed scan of catalog table
> java.net.SocketTimeoutException: Call to 158-1-101-82/158.1.101.82:20020
> failed on socket timeout exception: java.net.SocketTimeoutException: 60000
> millis timeout while waiting for channel to be ready for read.
> ch : java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428
> remote=158-1-101-82/158.1.101.82:20020]
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy6.getRegionInfo(Unknown Source)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:273)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:333)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:139)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read.
> ch : java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428
> remote=158-1-101-82/158.1.101.82:20020]
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at java.io.DataInputStream.readInt(DataInputStream.java:370)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386
> as dead server

// The following logs show that meta was assigned, but there are no such logs for root. Note also that the exception above came from the CatalogJanitor, not from the shutdown handler.
> 2011-07-28 22:28:37,591 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:20000-0x23171e103d10018 Creating (or updating) unassigned node for
> 1028785192 with OFFLINE state
> 2011-07-28 22:28:37,704 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan
> was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so
> generated a random one; hri=.META.,,1.1028785192, src=,
> dest=158-1-101-202,20020,1311878322145; 2 (online=2, exclude=null) available
> servers
> 2011-07-28 22:28:37,704 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> .META.,,1.1028785192 to 158-1-101-202,20020,1311878322145
> 2011-07-28 22:28:37,733 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=M_ZK_REGION_OFFLINE, server=158-1-101-82:20000,
> region=1028785192/.META.
> 2011-07-28 22:28:37,766 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENING, server=158-1-101-202,20020,1311878322145,
> region=1028785192/.META.

-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: August 11, 2011 10:16
To: [email protected]
Cc: [email protected]
Subject: Re: Root table couldn't be opened

I think this is hbase-4168 whose fix got committed today. We run 0.90.4 patched with hbase-4168. Please try the patch.

On Aug 10, 2011, at 7:05 PM, Gaojinchao <[email protected]> wrote:

> In my cluster (version 0.90.3), the root table couldn't be opened when one
> region server crashed because of GC.
>
> The logs show:
>
> // Master assigned the root table to 82
> 2011-07-28 21:34:34,710 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
> -ROOT-,,0.70236052 on 158-1-101-82,20020,1311885942386
>
> // The host of 82 crashed; the master finished the log split and reassigned
> root and meta. But the region server process didn't exit, so the root
> verification passed.
> I think we shouldn't verify the root / meta in shutdown handler processing.
>
> 2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown
> handler to be executed, root=true, meta=true
> 2011-07-28 22:25:10,085 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting
> completed in 316329 ms for
> hdfs://158.1.101.82:9000/hbase/.logs/158-1-101-82,20020,1311885942386
> 2011-07-28 22:26:54,790 DEBUG org.apache.hadoop.hbase.master.HMaster: Not
> running balancer because processing dead regionserver(s):
> [158-1-101-82,20020,1311885942386]
> 2011-07-28 22:27:11,176 WARN org.apache.hadoop.hbase.master.CatalogJanitor:
> Failed scan of catalog table
> java.net.SocketTimeoutException: Call to 158-1-101-82/158.1.101.82:20020
> failed on socket timeout exception: java.net.SocketTimeoutException: 60000
> millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428
> remote=158-1-101-82/158.1.101.82:20020]
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy6.getRegionInfo(Unknown Source)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:273)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:333)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:139)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428
> remote=158-1-101-82/158.1.101.82:20020]
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at java.io.DataInputStream.readInt(DataInputStream.java:370)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386
> as dead server
> 2011-07-28 22:28:37,591 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:20000-0x23171e103d10018 Creating (or updating) unassigned node for
> 1028785192 with OFFLINE state
> 2011-07-28 22:28:37,704 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan
> was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so
> generated a random one; hri=.META.,,1.1028785192, src=,
> dest=158-1-101-202,20020,1311878322145; 2 (online=2, exclude=null) available
> servers
> 2011-07-28 22:28:37,704 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> .META.,,1.1028785192 to 158-1-101-202,20020,1311878322145
> 2011-07-28 22:28:37,733 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=M_ZK_REGION_OFFLINE, server=158-1-101-82:20000,
> region=1028785192/.META.
> 2011-07-28 22:28:37,766 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENING, server=158-1-101-202,20020,1311878322145,
> region=1028785192/.META.
>
>
> Region server logs:
> 2011-07-28 22:19:17,389 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested
> for ufdr7,,1311890810267.c1b556627c511076bafbb1b802589cb6. because
> regionserver20020.cacheFlusher; priority=4, compaction queue size=1
>
> // blocked for a long time.
> 2011-07-28 22:28:24,829 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 552455ms for sessionid
> 0x13171e103d7003e, closing socket connection and attempting reconnect
> 2011-07-28 22:28:24,829 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 552455ms for sessionid
> 0x23171e103d10020, closing socket connection and attempting reconnect
> 2011-07-28 22:28:25,186 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server /158.1.101.222:2181
> 2011-07-28 22:28:25,838 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server /158.1.101.82:2181
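P.S. The failure mode described in this thread can be sketched in a few lines. This is a hypothetical simulation, not HBase code; the class and method names (PausedRegionServer, shutdownHandlerReassignsRoot) are made up for illustration. The point is that a GC-paused region server loses its ZooKeeper session, so the master queues it as dead, but the JVM is still alive and still answers getRegionInfo; a shutdown handler that verifies the old root location before reassigning will see "success" and skip root.

```java
// Hypothetical sketch (not HBase code): why root was never reassigned.
// The region server's ZK session expired (master treats it as dead),
// but the process is only GC-paused and still answers RPCs.
public class RootVerifySketch {

    // Stand-in for the GC-paused region server.
    static class PausedRegionServer {
        boolean zkSessionExpired = true; // master saw the session expire
        boolean processAlive = true;     // JVM never actually died

        // Simulates the getRegionInfo RPC used by location verification:
        // it succeeds because the process still responds.
        boolean getRegionInfo() {
            return processAlive;
        }
    }

    // Simulates a shutdown handler that reassigns root only if
    // verification of the old location FAILS.
    static boolean shutdownHandlerReassignsRoot(PausedRegionServer rs) {
        boolean rootLocationLooksValid = rs.getRegionInfo();
        return !rootLocationLooksValid; // "valid" => skip reassignment
    }

    public static void main(String[] args) {
        PausedRegionServer rs = new PausedRegionServer();
        // Prints "root reassigned: false" -- root stays pinned to the
        // paused server, matching the master logs above (meta assigned,
        // no assignment logs for root).
        System.out.println("root reassigned: "
                + shutdownHandlerReassignsRoot(rs));
    }
}
```

This is why the suggestion above (skip root/meta verification in shutdown handler processing) would help: once the server is on the dead list, its answers should not be trusted. Note the region server logs show a 552455 ms silence, far beyond the ZooKeeper session timeout (`zookeeper.session.timeout` in hbase-site.xml), which is what triggered the dead-server path in the first place.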
