That usually happens when the region server is considered dead and the
master moves its logs away. There should be clues in the master log,
and probably more relevant information in the region server log just
before or after what you pasted.
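
As a rough way to look for those clues, the sketch below (Python) pulls
out the lines of a log that mention the aborted server within a few
minutes of the abort. It assumes the log4j timestamp format visible in
the pasted log; the function names and the five-minute window are
hypothetical choices for illustration, not anything HBase provides.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    """Parse a leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp, or return None.

    Stack-trace and continuation lines do not start with a timestamp,
    so they yield None and are skipped by the caller.
    """
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None


def mentions_near(log_lines, server, around, window=timedelta(minutes=5)):
    """Return timestamped lines that mention `server` within `window` of `around`.

    `window` is an assumed default; widen it if the master noticed the
    server's expiry well before the region server aborted.
    """
    hits = []
    for line in log_lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - around) <= window and server in line:
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open the master log and call, for example,
mentions_near(f, "c3s6.site,60020,1306570384166",
datetime(2011, 6, 2, 19, 27, 23)), taking the server name and abort time
from the FATAL line in the region server log.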

J-D

2011/6/8 Gaojinchao <[email protected]>:
> Two region servers crashed (my cluster has 7 region servers / datanodes),
> saying that a file did not exist and that a lease had expired (log details
> below). I searched this mailing list, but the cases I found seem different:
>
> HBase version: 0.90.3
> HDFS version: cloudera 0.20.2+320
>
> OS: swappiness: 0 and ulimit: 600000
> HDFS: dfs.datanode.max.xcievers: 2047
>
> I didn't see any Xciever count exceeded message.
>
> The cluster ran normally before I modified some parameters.
> Parameters:
> Heap size: 8G -> 10G
> HFile block size: 64k -> 640k (our cluster uses gz)
>
> Should I set the timeout to 0, or make it larger?
>
> 2011-06-02 19:27:11,666 WARN 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
> ufdr,11050,1306570494360.1caa8cf34787ccf12495bf7828e0e11c. has too many store 
> files; delaying flush up to 90000ms
> 2011-06-02 19:27:11,996 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction started; Attempting to free 153.29 MB of total=1.27 
> GB
> 2011-06-02 19:27:12,000 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction completed; freed=153.77 MB, total=1.12 GB, 
> single=714.14 MB, multi=576.17 MB, memory=0 KB
> 2011-06-02 19:27:13,940 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction started; Attempting to free 153.37 MB of total=1.27 
> GB
> 2011-06-02 19:27:13,943 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction completed; freed=153.71 MB, total=1.12 GB, 
> single=712.98 MB, multi=576.79 MB, memory=0 KB
> 2011-06-02 19:27:15,937 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction started; Attempting to free 153.52 MB of total=1.27 
> GB
> 2011-06-02 19:27:15,940 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction completed; freed=153.9 MB, total=1.12 GB, 
> single=714.37 MB, multi=576.17 MB, memory=0 KB
> 2011-06-02 19:27:18,870 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction started; Attempting to free 153.48 MB of total=1.27 
> GB
> 2011-06-02 19:27:18,873 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction completed; freed=153.76 MB, total=1.12 GB, 
> single=716.21 MB, multi=574.92 MB, memory=0 KB
> 2011-06-02 19:27:20,087 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Flush requested on ufdr,11006,1306570494359.8d605fcdef79e342a8626062bf046a14.
> 2011-06-02 19:27:20,087 WARN 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
> ufdr,11006,1306570494359.8d605fcdef79e342a8626062bf046a14. has too many store 
> files; delaying flush up to 90000ms
> 2011-06-02 19:27:20,619 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction started; Attempting to free 153.58 MB of total=1.27 
> GB
> 2011-06-02 19:27:20,621 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction completed; freed=153.84 MB, total=1.12 GB, 
> single=711.31 MB, multi=578.67 MB, memory=0 KB
> 2011-06-02 19:27:22,152 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction started; Attempting to free 153.6 MB of total=1.27 GB
> 2011-06-02 19:27:22,155 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
> Block cache LRU eviction completed; freed=153.86 MB, total=1.12 GB, 
> single=714.87 MB, multi=575.12 MB, memory=0 KB
> 2011-06-02 19:27:23,021 INFO 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs 
> -- HDFS-200
> 2011-06-02 19:27:23,089 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> serverName=c3s6.site,60020,1306570384166, load=(requests=192741, regions=434, 
> usedHeap=4924, maxHeap=10213): IOE in log roller
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: 
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
> /hbase/.logs/c3s6.site,60020,1306570384166/c3s6.site%3A60020.1307014000772 
> File does not exist. [Lease.  Holder: 
> DFSClient_hb_rs_c3s6.site,60020,1306570384166_1306570388616, pendingcreates: 
> 2]
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1424)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1412)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:491)
>       at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at 
> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
>       at 
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>       at 
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
>       at 
> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:104)
> 2011-06-02 19:27:23,090 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
> requests=63742, regions=434, stores=434, storefiles=1226, 
> storefileIndexSize=135, memstoreSize=1077, compactionQueueSize=241, 
> flushQueueSize=4, usedHeap=4937, maxHeap=10213, blockCacheSize=1232047432, 
> blockCacheFree=374318648, blockCacheCount=1867, blockCacheHitCount=95411305, 
> blockCacheMissCount=12524075, blockCacheEvictedCount=6657895, 
> blockCacheHitRatio=88, blockCacheHitCachingRatio=93
> 2011-06-02 19:27:23,090 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: IOE in log roller
> 2011-06-02 19:27:2
>
