Hi All,

I was just wondering if anyone has had a chance to look at this issue, or has experienced something similar. I'm stuck on this and would appreciate any hint.

Many thanks,
  Martin
On Dec 27, 2010, at 11:27 AM, Martin Arnandze wrote:

> Hi,
> I'm using HBase 0.20.5 and Hadoop 0.20.1. Some region servers are crashing,
> saying that a file cannot be found and that a lease has expired (log detail
> below). I tried to search this mailing list for this exact problem, but was
> not successful. These are the symptoms:
>
> - Typically I see higher swap usage on these servers than on the rest (the
>   ones that didn't crash).
> - I don't see any xciever-count-exceeded messages in the DataNode logs.
> - Please find log snippets as well as configuration settings below.
> - We don't write to WALs (a sketch of what we mean by that follows the
>   settings below).
>
> I would greatly appreciate any hint in this regard.
>
> Many thanks,
> Martin
>
> ======================
>
> These are some of my settings, which I think might be relevant:
>
> hdfs-site:
>
> <property>
>   <name>dfs.datanode.max.xcievers</name>
>   <value>10000</value>
> </property>
>
> hbase-site:
>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>50</value>
> </property>
>
> <property>
>   <name>hbase.regionserver.global.memstore.upperLimit</name>
>   <value>0.3</value>
> </property>
>
> <property>
>   <name>hfile.block.cache.size</name>
>   <value>0.3</value>
> </property>
>
> ulimit -n
> 100000
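>
> To clarify the "we don't write to WALs" point from the symptoms above: our
> writers disable the WAL on each Put. A minimal sketch of what that looks
> like against the HBase 0.20 client API (the row key, qualifier, and value
> are made-up placeholders; "d" is the actual column family from the logs
> below):
>
>     import java.io.IOException;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class NoWalPutSketch {
>         public static void main(String[] args) throws IOException {
>             HTable table = new HTable(new HBaseConfiguration(), "usertrails_all");
>             Put put = new Put(Bytes.toBytes("example-row"));  // placeholder row key
>             // Column family "d" is real; qualifier and value are placeholders.
>             put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));
>             put.setWriteToWAL(false);  // the edit skips the HLog entirely
>             table.put(put);
>             table.flushCommits();      // push any client-side buffered edits
>         }
>     }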
>
> LOGS:
>
> Region Server:
>
> 2010-12-27 08:35:31,932 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
> 2010-12-27 08:35:32,426 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://ls-nn05.netw.domain01.net:50001/hbase/usertrails_all/1777872268/d/2971304353461036329, entries=22319, sequenceid=1717292326, memsize=16.1m, filesize=13.5m to usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,426 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Caches flushed, doing commit now (which includes update scanners)
> 2010-12-27 08:35:32,426 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~16.1m for region usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406 in 494ms, sequence id=1717292326, compaction requested=false
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region, no outstanding scanners on usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on region usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed d
> 2010-12-27 08:35:32,427 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed usertrails_all,1b005e71-cd19-4d82-87a3-3fa269f4f1d1\x019223370750718307807\x015692970,1292964088406
> 2010-12-27 08:35:32,427 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing hlog writer in hdfs://ls-nn05.netw.domain01.net:50001/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337
> 2010-12-27 08:35:32,497 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1328)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1319)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1247)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:434)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:739)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy1.addBlock(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2906)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2788)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2078)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2264)
>
> 2010-12-27 08:35:32,497 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> 2010-12-27 08:35:32,497 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959" - Aborting...
> 2010-12-27 08:35:32,499 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Close and delete failed
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1328)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1319)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1247)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:434)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:626)
>     at java.lang.Thread.run(Thread.java:619)
> 2010-12-27 08:35:32,500 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: ls-hdfs24.netw.domain01.net,60020,1292875777337
> 2010-12-27 08:35:32,508 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: ls-hdfs24.netw.domain01.net,60020,1292875777337
> 2010-12-27 08:35:32,509 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x2d05625f9b0008
> 2010-12-27 08:35:32,509 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x2d05625f9b0008
> 2010-12-27 08:35:32,512 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x2d05625f9b0008
> 2010-12-27 08:35:32,512 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2d05625f9b0008 closed
> 2010-12-27 08:35:32,512 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper
> 2010-12-27 08:35:32,512 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2010-12-27 08:35:32,614 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/10.32.20.70:60020 exiting
> 2010-12-27 08:35:49,448 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4947199215968218273 lease expired
> 2010-12-27 08:35:49,538 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 3282292946771297386 lease expired
> 2010-12-27 08:35:49,582 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7882165461457343907 lease expired
> 2010-12-27 08:35:49,582 INFO org.apache.hadoop.hbase.Leases: regionserver/10.32.20.70:60020.leaseChecker closing leases
> 2010-12-27 08:35:49,582 INFO org.apache.hadoop.hbase.Leases: regionserver/10.32.20.70:60020.leaseChecker closed leases
> 2010-12-27 08:35:49,607 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> 2010-12-27 08:35:49,607 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>
> Namenode:
>
> 2010-12-27 08:35:32,427 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 50001, call addBlock(/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959, DFSClient_52782511) from 10.32.20.70:44967: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959 File does not exist. Holder DFSClient_52782511 does not have any open files.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1328)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1319)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1247)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:434)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> Datanode:
>
> Not too unusual. Some entries like the following:
>
> 2010-12-27 08:35:52,519 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.32.20.70:50010, storageID=DS-2137188581-173.192.229.228-50010-1273698157328, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketException: Connection reset
>     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     at java.io.DataInputStream.read(DataInputStream.java:132)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>     at java.lang.Thread.run(Thread.java:619)
> 2010-12-27 09:35:52,728 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-4708329089501359494_30620223 received exception java.net.SocketException: Connection reset
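>
> For what it's worth, one quick check when a region server dies this way is
> whether the hlog file is actually gone from HDFS. A minimal sketch using
> the standard Hadoop FileSystem API (the NameNode URI and hlog path are the
> ones from the logs above):
>
>     import java.net.URI;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class HlogExistsCheck {
>         public static void main(String[] args) throws Exception {
>             FileSystem fs = FileSystem.get(
>                     URI.create("hdfs://ls-nn05.netw.domain01.net:50001"),
>                     new Configuration());
>             Path hlog = new Path("/hbase/.logs/ls-hdfs24.netw.domain01.net,60020,1292875777337/hlog.dat.1293458980959");
>             // Printing "false" would be consistent with the LeaseExpiredException
>             // above: the file was removed on the NameNode while this client still
>             // thought it had it open.
>             System.out.println("hlog exists: " + fs.exists(hlog));
>         }
>     }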
