Ah, so 100-13 was the .META. holder but got a socket timeout trying to talk to 100-9, which was the previous .META. holder.
It means that when it goes to do a put at 09:15:44,232 it must be contacting that same region server, since the location wasn't updated (the first call failed on socket timeout). Can you check what's going on with 100-9 during that time and whether it's really shutting down?

J-D

On Tue, Jun 7, 2011 at 9:23 AM, bijieshan <[email protected]> wrote:
> Thanks J-D.
> Sorry for the long delay in replying!
> I checked the logs of the .META. regionserver, and it is indeed the problem
> you described.
> But I found another problem.
>
> The .META. region changed its address, but for a long time afterwards
> CatalogTracker still cached the old address.
> So when another regionserver (not the .META. regionserver) splits a region,
> it sends an IPC request to put, and this executes on the old regionserver.
>
> From HMaster, we can see that at 09:02:34 the region was opened on 100-13:
>
> 2011-05-25 08:37:03,908 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region .META.,,1.1028785192 on 157-5-100-9,20020,1306257984044
> 2011-05-25 09:02:34,334 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region .META.,,1.1028785192 on 157-5-100-13,20020,1306266506022
> 2011-05-25 09:15:57,649 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=157-5-100-9:20020; java.io.EOFException
> 2011-05-25 09:15:57,649 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
>
> From RegionServer 100-13 at 09:15:44, we can see that the .META. address
> cached in CatalogTracker was still 100-9:
>
> 2011-05-25 09:15:44,232 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running of failed split of ufdr,0065286138106876#4228000,1306260358978.37875b35a870957da534ad29fd2944d5.; java.io.IOException: Server not running, aborting
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1653)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> 2011-05-25 09:15:44,232 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception running postOpenDeployTasks; region=11dc72d94c7a5a3d19b0c0c3c49624a5
> java.io.IOException: Call to 157-5-100-9/157.5.100.9:20020 failed on local exception: java.nio.channels.ClosedByInterruptException
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy8.getRegionInfo(Unknown Source)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:272)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
>     at org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:142)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1354)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:215)
> Caused by: java.nio.channels.ClosedByInterruptException
>     at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
>     at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:518)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
>     ... 9 more
>
> Is this a bug? I think the cached info should be invalidated and reset
> immediately after the .META. address changes, but it isn't.
>
> Thanks!
>
> Jieshan Bean
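
For anyone following the thread, the behaviour under discussion (a cached location that is only reset once a verification call against it fails) can be sketched roughly as below. This is a toy illustration of the general pattern, not the actual CatalogTracker code: the class StaleLocationCache and its methods peek, getVerified and invalidate are hypothetical names made up for this example, and the "servers" are plain strings.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch of a lazily verified location cache; not an HBase API. */
class StaleLocationCache {
    private final Supplier<String> authoritativeLookup; // assumed source of truth for the location
    private final Predicate<String> isServing;          // assumed probe: does this server host the region?
    private String cached;                              // null means "location unknown"

    StaleLocationCache(Supplier<String> authoritativeLookup, Predicate<String> isServing) {
        this.authoritativeLookup = authoritativeLookup;
        this.isServing = isServing;
    }

    /** Returns the cached value without verifying it, so it may be stale. */
    String peek() {
        return cached;
    }

    /** Verifies the cached location; on failure, resets it and looks it up again. */
    String getVerified() {
        if (cached != null && !isServing.test(cached)) {
            cached = null; // failed verification: drop the stale entry
        }
        if (cached == null) {
            cached = authoritativeLookup.get();
        }
        return cached;
    }

    /** Eager reset, e.g. driven by a notification that the region has moved. */
    void invalidate() {
        cached = null;
    }

    public static void main(String[] args) {
        String[] current = {"100-9"}; // where the region really lives
        StaleLocationCache cache = new StaleLocationCache(
                () -> current[0],
                server -> server.equals(current[0]));

        System.out.println(cache.getVerified()); // populates the cache
        current[0] = "100-13";                   // region moves; nobody tells the cache
        System.out.println(cache.peek());        // still the old server until verified
        System.out.println(cache.getVerified()); // verification fails, cache is reset
    }
}
```

In this sketch the window between the move and the next getVerified call is where a caller can still be pointed at the old server. Calling invalidate as soon as the move is known corresponds to the immediate reset asked about above.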
