How much heap did you give the region server? Can you show us the log snippet from shortly before 12:12:08?
Which HBase version were you using?

On Aug 13, 2013, at 6:33 PM, 李佳 wrote:

> Hi Devs/Users,
> Recently I have been using the HBase API to insert a large data set into HBase. It is about 77 GB, and
> my cluster has one HBase master and two HBase region servers.
> After the program has run for a while, the region server shuts down automatically.
> I restart the region servers, but the same thing happens again.
> From the region server log:
>
>
> 2013-08-13 12:12:08,983 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer
> abort: loaded coprocessors are:
> [org.apache.hadoop.hbase.coprocessor.AggregateImplementation,
> com.zsmar.hbase.query.rowkey.RowKey3Endpoint]
> 2013-08-13 12:12:08,988 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requestsPerSecond=49507, numberOfOnlineRegions=2260,
> numberOfStores=2260, numberOfStorefiles=2368, storefileIndexSizeMB=12,
> rootIndexSizeKB=12786, totalStaticIndexSizeKB=466828,
> totalStaticBloomSizeKB=54671, memstoreSizeMB=711,
> mbInMemoryWithoutWAL=267, numberOfPutsWithoutWAL=2409813,
> readRequestsCount=65095, writeRequestsCount=299008,
> compactionQueueSize=92, flushQueueSize=0, usedHeapMB=1182,
> maxHeapMB=1991, blockCacheSizeMB=54.35, blockCacheFreeMB=443.57,
> blockCacheCount=5779, blockCacheHitCount=1455353,
> blockCacheMissCount=240305, blockCacheEvictedCount=49,
> blockCacheHitRatio=85%, blockCacheHitCachingRatio=99%,
> hdfsBlocksLocalityIndex=75, slowHLogAppendCount=0,
> fsReadLatencyHistogramMean=0, fsReadLatencyHistogramCount=0,
> fsReadLatencyHistogramMedian=0, fsReadLatencyHistogram75th=0,
> fsReadLatencyHistogram95th=0, fsReadLatencyHistogram99th=0,
> fsReadLatencyHistogram999th=0, fsPreadLatencyHistogramMean=0,
> fsPreadLatencyHistogramCount=0, fsPreadLatencyHistogramMedian=0,
> fsPreadLatencyHistogram75th=0, fsPreadLatencyHistogram95th=0,
> fsPreadLatencyHistogram99th=0, fsPreadLatencyHistogram999th=0,
> fsWriteLatencyHistogramMean=0, fsWriteLatencyHistogramCount=0,
> fsWriteLatencyHistogramMedian=0, fsWriteLatencyHistogram75th=0,
> fsWriteLatencyHistogram95th=0, fsWriteLatencyHistogram99th=0,
> fsWriteLatencyHistogram999th=0
> 2013-08-13 12:12:08,991 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Replay of
> HLog required. Forcing server shutdown
> 2013-08-13 12:12:08,991 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Excluding
> unflushable region
> lbc_zte_1_nbr_index,436238C32A97DAB59D72E810C313CF4F100230,1376366743650.cf99f5047c85e155069c3970cdaf03c6.
> - trying to find a different region to flush.
> 2013-08-13 12:12:08,991 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> lbc_zte_1_imei_index,3333332A,1376364729049.4469e6b0500bf3f5ed0ac1247d249537.
> due to global heap pressure
> 2013-08-13 12:12:08,991 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping server on 60020
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: REPL
> IPC Server handler 2 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 8 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 1 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server listener on 60020
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 6 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 2 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: REPL
> IPC Server handler 0 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 9 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: REPL
> IPC Server handler 1 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 6 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 9 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt
> to stop the worker thread
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 8 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 3 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 7 on 60020: exiting
> 2013-08-13 12:12:08,992 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 2 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 5 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 0 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 5 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 3 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server Responder
> 2013-08-13 12:12:08,994 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server Responder
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 0 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
> infoServer
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 7 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 1 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 4 on 60020: exiting
> 2013-08-13 12:12:08,993 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker
> interrupted while waiting for task, exiting:
> java.lang.InterruptedException
> 2013-08-13 12:12:08,998 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker
> phd03.hadoop.audaque.com,60020,1376364509049 exiting
> 2013-08-13 12:12:09,023 INFO org.mortbay.log: Stopped
> [email protected]:60030
> 2013-08-13 12:12:09,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server Responder, call delete([B@69fba6f8
> ,
> {"ts":9223372036854775807,"totalColumns":2,"families":{"info":[{"timestamp":1376367128966,"qualifier":"splitA","vlen":0},{"timestamp":1376367128966,"qualifier":"splitB","vlen":0}]},"row":"lbc_zte_1,063AE3C37783FD39EE2142BEE43576C2200901,1376366676823.9b4e8b4ce35bf541fc1e9b5b77a22b62."}),
> rpc version=1, client version=29, methodsFingerPrint=-56040613 from
> 172.16.1.91:46113: output error
> 2013-08-13 12:12:09,032 WARN org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 4 on 60020 caught a ClosedChannelException, this
> means that the server was processing a request but the client went
> away. The error message was: null
> 2013-08-13 12:12:09,032 INFO org.apache.hadoop.ipc.HBaseServer: PRI
> IPC Server handler 4 on 60020: exiting
> 2013-08-13 12:12:09,038 WARN
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Failed all from region=.META.,,1.1028785192, hostname=
> phd03.hadoop.audaque.com, port=60020
> java.util.concurrent.ExecutionException: java.io.IOException: Call to
> phd03.hadoop.audaque.com/172.16.1.93:60020
> failed on local exception: java.io.EOFException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1544)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1396)
> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:918)
> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:774)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:749)
> at org.apache.hadoop.hbase.catalog.MetaEditor.put(MetaEditor.java:99)
> at
> org.apache.hadoop.hbase.catalog.MetaEditor.putToMetaTable(MetaEditor.java:66)
> at
> org.apache.hadoop.hbase.catalog.MetaEditor.offlineParentInMeta(MetaEditor.java:188)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:327)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:457)
> at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Call to
> phd03.hadoop.audaque.com/172.16.1.93:60020
> failed on local exception: java.io.EOFException
> at
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1056)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
> at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
> at com.sun.proxy.$Proxy20.multi(Unknown Source)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1373)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1371)
> at
> org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1380)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1368)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> ... 3 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:672)
> at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)
> 2013-08-13 12:12:09,131 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Bloom filter type for
> hdfs://phd01:8020/apps/hbase/data/lbc_zte_1_imei_index/4469e6b0500bf3f5ed0ac1247d249537/.tmp/e7bb489662344b26bc6de1e72c122eec:
> ROW, CompoundBloomFilterWriter
> 2013-08-13 12:12:09,131 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom
> filter type for
> hdfs://phd01:8020/apps/hbase/data/lbc_zte_1_imei_index/4469e6b0500bf3f5ed0ac1247d249537/.tmp/e7bb489662344b26bc6de1e72c122eec:
> CompoundBloomFilterWriter
> 2013-08-13 12:12:09,138 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /apps/hbase/data/lbc_zte_1_imei_index/4469e6b0500bf3f5ed0ac1247d249537/.tmp/e7bb489662344b26bc6de1e72c122eec
> could only be replicated to 0 nodes instead of minReplication (=1).
> There are 3 datanode(s) running and no node(s) are excluded in this
> operation.
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2369)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:469)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:300)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45843)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1694)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1690)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1688)
> at org.apache.hadoop.ipc.Client.call(Client.java:1164)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:288)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
> 2013-08-13 12:12:09,140 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> server
> phd03.hadoop.audaque.com
> ,60020,1376364509049: Replay of HLog required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> lbc_zte_1_imei_index,3333332A,1376364729049.4469e6b0500bf3f5ed0ac1247d249537.
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1472)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1351)
> at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1292)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:202)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
> File
> /apps/hbase/data/lbc_zte_1_imei_index/4469e6b0500bf3f5ed0ac1247d249537/.tmp/e7bb489662344b26bc6de1e72c122eec
> could only be replicated to 0 nodes instead of minReplication (=1).
> There are 3 datanode(s) running and no node(s) are excluded in this
> operation.
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2369)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:469)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:300)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45843)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1694)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1690)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1688)
> at org.apache.hadoop.ipc.Client.call(Client.java:1164)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:288)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
> 2013-08-13 12:12:09,140 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer
> abort: loaded coprocessors are:
> [org.apache.hadoop.hbase.coprocessor.AggregateImplementation,
> com.zsmar.hbase.query.rowkey.RowKey3Endpoint]
> 2013-08-13 12:12:09,140 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requestsPerSecond=49507, numberOfOnlineRegions=2260,
> numberOfStores=2260, numberOfStorefiles=2368, storefileIndexSizeMB=12,
> rootIndexSizeKB=12786, totalStaticIndexSizeKB=466828,
> totalStaticBloomSizeKB=54671, memstoreSizeMB=711,
> mbInMemoryWithoutWAL=267, numberOfPutsWithoutWAL=2409813,
> readRequestsCount=65095, writeRequestsCount=299008,
> compactionQueueSize=92, flushQueueSize=0, usedHeapMB=1173,
> maxHeapMB=1991, blockCacheSizeMB=54.35, blockCacheFreeMB=443.57,
> blockCacheCount=5779, blockCacheHitCount=1455353,
> blockCacheMissCount=240305, blockCacheEvictedCount=49,
> blockCacheHitRatio=85%, blockCacheHitCachingRatio=99%,
> hdfsBlocksLocalityIndex=75, slowHLogAppendCount=0,
> fsReadLatencyHistogramMean=0, fsReadLatencyHistogramCount=0,
> fsReadLatencyHistogramMedian=0, fsReadLatencyHistogram75th=0,
> fsReadLatencyHistogram95th=0, fsReadLatencyHistogram99th=0,
> fsReadLatencyHistogram999th=0, fsPreadLatencyHistogramMean=0,
> fsPreadLatencyHistogramCount=0, fsPreadLatencyHistogramMedian=0,
> fsPreadLatencyHistogram75th=0, fsPreadLatencyHistogram95th=0,
> fsPreadLatencyHistogram99th=0, fsPreadLatencyHistogram999th=0,
> fsWriteLatencyHistogramMean=0, fsWriteLatencyHistogramCount=0,
> fsWriteLatencyHistogramMedian=0, fsWriteLatencyHistogram75th=0,
> fsWriteLatencyHistogram95th=0, fsWriteLatencyHistogram99th=0,
> fsWriteLatencyHistogram999th=0
> 2013-08-13 12:12:09,145 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Replay of
> HLog required. Forcing server shutdown
> 2013-08-13 12:12:09,146 INFO
> org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>
>
> Does anyone know why this happens? Any advice would be appreciated.
