Could any experts take a look at this? Similar posts:
1. http://mail-archives.apache.org/mod_mbox/hbase-user/201406.mbox/%3C001001cf863e$c81a60e0$584f22a0$@com%3E
2. https://issues.apache.org/jira/browse/HBASE-11306 "Client connection starvation issues under high load on Amazon EC2"
3. https://issues.apache.org/jira/browse/HBASE-11277
Regards,
--
Yu Ming

2014-07-10 16:19 GMT+08:00 李玉明 <[email protected]>:
> Hi,
>
> The HBase version is 0.96. I ran into a problem where HBase region servers refuse connections. Could anyone please help? Thank you in advance.
>
> Summary of the problem:
>
> 1. Several region servers refuse service. Their Requests Per Second drop to 0.
>
> 2. The application client can't connect to the region servers. Even a plain TCP connection with the simple Linux nc command is refused. For example:
> nc 10.207.27.41 8420
>
> 3. Even after restarting the HBase cluster, the service does not recover.
>
> 4. Log snippet from the application client:
>
> 2014-07-10 16:03:51[htable-pool20-t13:2931892] - [INFO] #3541, table=monitor-data, attempt=702/1 failed 117 ops, last exception: org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: nz-cloudera1.xxx.com/10.207.27.41:8420 on nz-cloudera1.xxx.com,8420,1404973262139, tracking started Thu Jul 10 15:16:19 CST 2014, retrying after 4034 ms, replay 117 ops.
> 2014-07-10 16:03:51[htable-pool27-t45:2931892] - [INFO] #5443, table=monitor-data, attempt=694/1 failed 3 ops, last exception: org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: nz-cloudera67.xxx.com/10.208.244.41:8420 on nz-cloudera67.xxx.com,8420,1404973261815, tracking started Thu Jul 10 15:17:02 CST 2014, retrying after 4028 ms, replay 3 ops.
>
> 5. Log snippet from a region server: the periodicFlusher and compactions run again and again.
>
> 2014-07-10 15:26:36,508 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 6 file(s) in t of monitor-data,ig\x01RdB\xCD\x1CS\xAA\xF2\x00,1403772336393.73dba0c6346574f324d79e976db64def. into 493864bb386042099e1ef6be1b9770b2(size=4.4 G), total size for store is 4.4 G. This selection was in queue for 0sec, and took 5mins, 7sec to execute.
> 2014-07-10 15:26:36,508 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction: Request = regionName=monitor-data,ig\x01RdB\xCD\x1CS\xAA\xF2\x00,1403772336393.73dba0c6346574f324d79e976db64def., storeName=t, fileCount=6, fileSize=4.4 G, priority=44, time=1397522352382230; duration=5mins, 7sec
> 2014-07-10 15:26:36,509 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on t in region monitor-data,\x12\xDB\x13?dB\xCD S\xAA\xF2\x00,1403772319221.8a56fad579ebdefc2cb6622d715fe7a6.
> 2014-07-10 15:26:36,509 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 5 file(s) in t of monitor-data,\x12\xDB\x13?dB\xCD S\xAA\xF2\x00,1403772319221.8a56fad579ebdefc2cb6622d715fe7a6. into tmpdir=hdfs://nz-cloudera-namenode.xxx.com:8020/hbase/data/default/monitor-data/8a56fad579ebdefc2cb6622d715fe7a6/.tmp, totalSize=4.9 G
> 2014-07-10 15:32:17,359 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed major compaction of 5 file(s) in t of monitor-data,\x12\xDB\x13?dB\xCD S\xAA\xF2\x00,1403772319221.8a56fad579ebdefc2cb6622d715fe7a6. into 3377dcd054244e42825227dc93d94bcb(size=4.9 G), total size for store is 4.9 G. This selection was in queue for 0sec, and took 5mins, 40sec to execute.
> 2014-07-10 15:32:17,359 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction: Request = regionName=monitor-data,\x12\xDB\x13?dB\xCD S\xAA\xF2\x00,1403772319221.8a56fad579ebdefc2cb6622d715fe7a6., storeName=t, fileCount=5, fileSize=4.9 G, priority=45, time=1397829741046386; duration=5mins, 40sec
> 2014-07-10 15:43:34,272 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver8420.periodicFlusher requesting flush for region monitor-meta,\x18:\x9B\xFB,1403771550932.f581188d3dd9d6d136888821a374de55. after a delay of 14966
> 2014-07-10 15:43:44,272 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver8420.periodicFlusher requesting flush for region monitor-meta,\x18:\x9B\xFB,1403771550932.f581188d3dd9d6d136888821a374de55. after a delay of 16416
> 2014-07-10 15:43:49,240 WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: Couldn't find oldest seqNum for the region we are about to flush: [f581188d3dd9d6d136888821a374de55]
> 2014-07-10 15:43:49,629 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=63172, memsize=37.0 M, hasBloomFilter=true, into tmp file hdfs://nz-cloudera-namenode.xxx.com:8020/hbase/data/default/monitor-meta/f581188d3dd9d6d136888821a374de55/.tmp/5d07d2aa3dba4e85ab48326846a90ba9
> 2014-07-10 15:43:49,639 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nz-cloudera-namenode.xxx.com:8020/hbase/data/default/monitor-meta/f581188d3dd9d6d136888821a374de55/t/5d07d2aa3dba4e85ab48326846a90ba9, entries=158209, sequenceid=63172, filesize=3.4 M
> 2014-07-10 15:43:49,640 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~46.0 M/48218512, currentsize=0/0 for region monitor-meta,\x18:\x9B\xFB,1403771550932.f581188d3dd9d6d136888821a374de55. in 399ms, sequenceid=63172, compaction requested=false
> 2014-07-10 15:45:14,273 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver8420.periodicFlusher requesting flush for region monitor-meta,,?\xBC\xA9,1403771552560.0c9e4bd771a895e2ebe2c146809c82ce. after a delay of 21673
> 2014-07-10 15:45:24,273 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver8420.periodicFlusher requesting flush for region monitor-meta,,?\xBC\xA9,1403771552560.0c9e4bd771a895e2ebe2c146809c82ce. after a delay of 10915
> 2014-07-10 15:45:34,273 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver8420.periodicFlusher requesting flush for region monitor-meta,,?\xBC\xA9,1403771552560.0c9e4bd771a895e2ebe2c146809c82ce. after a delay of 17862
> 2014-07-10 15:45:35,948 WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: Couldn't find oldest seqNum for the region we are about to flush: [0c9e4bd771a895e2ebe2c146809c82ce]
> 2014-07-10 15:45:36,142 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=63173, memsize=18.2 M, hasBloomFilter=true, into tmp file hdfs://nz-cloudera-namenode.xxx.com:8020/hbase/data/default/monitor-meta/0c9e4bd771a895e2ebe2c146809c82ce/.tmp/0f3bba8b09ed41fbb21395fc9ae94e04
> 2014-07-10 15:45:36,152 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nz-cloudera-namenode.xxx.com:8020/hbase/data/default/monitor-meta/0c9e4bd771a895e2ebe2c146809c82ce/t/0f3bba8b09ed41fbb21395fc9ae94e04, entries=78724, sequenceid=63173, filesize=1.7 M
> 2014-07-10 15:45:36,152 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~22.5 M/23614384, currentsize=0/0 for region monitor-meta,,?\xBC\xA9,1403771552560.0c9e4bd771a895e2ebe2c146809c82ce. in 204ms, sequenceid=63173, compaction requested=false
>
> --
> Vito Li
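
In case it helps anyone repeat the check from point 2 from a JVM instead of with nc, here is a minimal Java sketch of the same test. It is only an assumption that plain TCP reachability is the interesting part: it opens a socket to host:port and closes it again, and it does not speak the HBase RPC protocol at all. The class name PortProbe and its Result values are hypothetical, not part of HBase. It separates "refused" (the peer actively rejected the connection) from "timed out" (no answer within the timeout), which nc alone does not always make obvious.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper: a plain TCP connect probe, roughly what `nc host port` checks. */
public class PortProbe {

    /** Possible outcomes of a single connection attempt. */
    public enum Result { OPEN, REFUSED, TIMED_OUT, OTHER_ERROR }

    /**
     * Opens a TCP connection to host:port and closes it immediately.
     * No HBase RPC handshake is attempted; this only tests whether the port accepts connections.
     */
    public static Result probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Result.OPEN;
        } catch (ConnectException e) {
            // Typically "Connection refused": the remote side answered with a reset.
            return Result.REFUSED;
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMs: packets may be dropped somewhere along the way.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // Anything else, e.g. an unresolvable host name or no route to host.
            return Result.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        // Defaults are the host and port from the nc example; override with: java PortProbe <host> <port>
        String host = args.length > 0 ? args[0] : "10.207.27.41";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8420;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 3000));
    }
}
```

Usage: `javac PortProbe.java && java PortProbe 10.207.27.41 8420`. A REFUSED result from the client host would match what nc reported, while TIMED_OUT would instead suggest packets being silently dropped rather than actively rejected.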
