Grep the namenode log for the missing file and see if you can figure out from the mentions there what happened to it. Had the master taken it from you because it was processing a server crash?
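If a script is handier than plain grep, something like the following might do. It is only a sketch: the function names are made up for illustration, the namenode log path is a placeholder you would have to fill in for your install, and the file name is simply the last path component from the LeaseExpiredException quoted below.

```python
from typing import Iterable, List

# Last path component of the HLog named in the LeaseExpiredException.
WAL_NAME = "c3s6.site%3A60020.1307014000772"


def lines_mentioning(log_lines: Iterable[str], needle: str) -> List[str]:
    """Return every line that contains `needle`, in the original order."""
    return [line.rstrip("\n") for line in log_lines if needle in line]


def grep_namenode_log(log_path: str, needle: str = WAL_NAME) -> List[str]:
    """Scan one log file for `needle`.

    `log_path` is an assumption: point it at wherever your namenode
    actually writes its log.
    """
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, errors="replace") as fh:
        return lines_mentioning(fh, needle)
```

Calling grep_namenode_log("/path/to/namenode.log") and reading the hits in time order should show what the namenode recorded for the file around the time of the abort.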
St.Ack

2011/6/8 Gaojinchao <[email protected]>:
> Two regionservers(My cluster is 7 regionsever / datanode) crashed, saying
> that an file didn't not exist,
> and that a lease has expired (log detail below). Tried to find in this
> mailing list. It seems different:
>
> Hbase version: 0.90.3
> HDFS version: cloudera 0.20.2+320
>
> OS: swappiness :0 and ulimit :600000
> HFDS: dfs.datanode.max.xcievers: 2047
>
> I didn't see any Xciever count exceeded message.
>
> The cluster run normally before I modified some parameters.
> Parameters:
> Heap size 8G -> 10G
> Hfile block 64k -> 640k( Our cluster uses gz )
>
> Should I make the timeout to 0 or bigger ?
>
> 2011-06-02 19:27:11,666 WARN
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
> ufdr,11050,1306570494360.1caa8cf34787ccf12495bf7828e0e11c. has too many store
> files; delaying flush up to 90000ms
> 2011-06-02 19:27:11,996 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction started; Attempting to free 153.29 MB of total=1.27
> GB
> 2011-06-02 19:27:12,000 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction completed; freed=153.77 MB, total=1.12 GB,
> single=714.14 MB, multi=576.17 MB, memory=0 KB
> 2011-06-02 19:27:13,940 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction started; Attempting to free 153.37 MB of total=1.27
> GB
> 2011-06-02 19:27:13,943 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction completed; freed=153.71 MB, total=1.12 GB,
> single=712.98 MB, multi=576.79 MB, memory=0 KB
> 2011-06-02 19:27:15,937 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction started; Attempting to free 153.52 MB of total=1.27
> GB
> 2011-06-02 19:27:15,940 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction completed; freed=153.9 MB, total=1.12 GB,
> single=714.37 MB, multi=576.17 MB, memory=0 KB
> 2011-06-02 19:27:18,870 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction started; Attempting to free 153.48 MB of total=1.27
> GB
> 2011-06-02 19:27:18,873 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction completed; freed=153.76 MB, total=1.12 GB,
> single=716.21 MB, multi=574.92 MB, memory=0 KB
> 2011-06-02 19:27:20,087 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Flush requested on ufdr,11006,1306570494359.8d605fcdef79e342a8626062bf046a14.
> 2011-06-02 19:27:20,087 WARN
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
> ufdr,11006,1306570494359.8d605fcdef79e342a8626062bf046a14. has too many store
> files; delaying flush up to 90000ms
> 2011-06-02 19:27:20,619 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction started; Attempting to free 153.58 MB of total=1.27
> GB
> 2011-06-02 19:27:20,621 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction completed; freed=153.84 MB, total=1.12 GB,
> single=711.31 MB, multi=578.67 MB, memory=0 KB
> 2011-06-02 19:27:22,152 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction started; Attempting to free 153.6 MB of total=1.27 GB
> 2011-06-02 19:27:22,155 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Block cache LRU eviction completed; freed=153.86 MB, total=1.12 GB,
> single=714.87 MB, multi=575.12 MB, memory=0 KB
> 2011-06-02 19:27:23,021 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2011-06-02 19:27:23,089 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=c3s6.site,60020,1306570384166, load=(requests=192741, regions=434,
> usedHeap=4924, maxHeap=10213): IOE in log roller
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /hbase/.logs/c3s6.site,60020,1306570384166/c3s6.site%3A60020.1307014000772
> File does not exist. [Lease. Holder:
> DFSClient_hb_rs_c3s6.site,60020,1306570384166_1306570388616, pendingcreates:
> 2]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1424)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1412)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:491)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at
> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
> at
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
> at
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
> at
> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:104)
> 2011-06-02 19:27:23,090 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requests=63742, regions=434, stores=434, storefiles=1226,
> storefileIndexSize=135, memstoreSize=1077, compactionQueueSize=241,
> flushQueueSize=4, usedHeap=4937, maxHeap=10213, blockCacheSize=1232047432,
> blockCacheFree=374318648, blockCacheCount=1867, blockCacheHitCount=95411305,
> blockCacheMissCount=12524075, blockCacheEvictedCount=6657895,
> blockCacheHitRatio=88, blockCacheHitCachingRatio=93
> 2011-06-02 19:27:23,090 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: IOE in log roller
> 2011-06-02 19:27:2
>
