2015-11-29 00:48:30,521 DEBUG 2015-11-29 00:49:46,912 WARN Seems there is no log entry for about 76 seconds so I think there is a Full GC. Have you guy turned on gc log? Please check the gc log file(maybe xxx.out) to see if there is a long-paused GC.
And what is your GC options along with G1 garbage collector? I used to meet a problem called "Humongous Objects" which causes a bad impact on the performance of G1 GC. So if you guys are not expert of G1 GC, I suggest you switch back to CMS in the production environment first. Thanks. 2015-11-30 18:42 GMT+08:00 吴国泉 <[email protected]>: > hi: > I also met this problem, cong,pu is my partner, > hbase regionserver is configured with 92G RAM, > sometimes the regionserver would crash,i checked the log, but there > was no GC happened ,that is very strange, > here is part of the regionserver log: > > > > 2015-11-29 00:48:30,521 DEBUG > [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: > Compacting > hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8, > keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891 > 2015-11-29 00:49:46,912 WARN [regionserver60020.periodicFlusher] > util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to a > long garbage collecting pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > 2015-11-29 00:49:46,912 INFO [SplitLogWorker-l-hbase27.dba.cn2.qunar.com > ,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)] > zookeeper.ClientCnxn: Client session timed out, have not heard from server > in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and > attempting reconnect > 2015-11-29 00:49:46,912 WARN [regionserver60020.compactionChecker] > util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a > long garbage collecting pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > 2015-11-29 00:49:46,912 INFO [regionserver60020-SendThread( > l-zk1.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed > out, have not heard from server in 74953ms for sessionid 0x34f16f641cc2e77, > closing socket connection and attempting reconnect > 2015-11-29 00:49:46,913 WARN [JvmPauseMonitor] util.JvmPauseMonitor: > Detected pause in JVM or host machine (eg GC): pause of approximately > 70587ms > No GCs detected > 2015-11-29 00:49:46,913 WARN [regionserver60020] util.Sleeper: We slept > 72658ms instead of 3000ms, this is likely due to a long garbage collecting > pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > 2015-11-29 00:49:46,913 INFO [regionserver60020-SendThread( > l-zk4.dba.cn8.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed > out, have not heard from server in 81433ms for sessionid 0x54f16f4aa3363fa, > closing socket connection and attempting reconnect > 2015-11-29 00:49:46,913 INFO [regionserver60020-SendThread( > l-zk2.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed > out, have not heard from server in 79798ms for sessionid 0x54f16f4aa3363fe, > closing socket connection and attempting reconnect > 2015-11-29 00:49:46,922 WARN [ResponseProcessor for block > BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549] > hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms > (threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR > downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 10.90.18.60:50010, > 10.90.18.63:50010] > 2015-11-29 00:49:46,921 WARN [ResponseProcessor for block > BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] > hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block > BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 > java.io.EOFException: Premature EOF: no length prefix available > at > org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810) > > > as we can see , the log shows that :"We slept 74029ms instead of > 10000ms, this is likely due to a long garbage collecting pause ",but > following :" it No GCs deteted ", > this confused me. Can you give me some advice? > I use java7, G1 garbage collector, > > > > On 2015年11月29日 08:44, 聪聪 wrote: > >> hi,all: >> This morning regionserver shutdown.By analyzing the log, I guess it's >> about GC.Then,How to optimize the GC in HBase. >> I use the HBase version is hbase-0.98.6-cdh5.2.0.I look for your help in >> this. >> >> >> JVM configuration is as follows: >> export HBASE_REGIONSERVER_OPTS="-Xms92g -Xmx92g -XX:PermSize=256m >> -XX:MaxPermSize=256m -XX:+UseG1GC -server -XX:+DisableExplicitGC >> -XX:+UseFastAccessorMethods -XX:SoftRefLRUPolicyMSPerMB=0 >> -XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=40 >> -XX:ConcGCThreads=18 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB >> -XX:ParallelGCThreads=18 -XX:+PrintClassHistogram -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC >> -Xloggc:/home/q/hbase/hbase-0.98.6-cdh5.2.0/logs/gc-$(hostname)-hbase.log" >> >> >> Regionserver on the log is as follows: >> 2015-11-29 00:48:30,521 DEBUG >> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: >> Compacting >> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/2becc4de83bc4102802a4f69adfba433, >> keycount=1244806, bloomtype=ROW, size=462.2 M, encoding=NONE, seqNum=22878 >> 2015-11-29 00:48:30,521 DEBUG >> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: >> Compacting >> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/52669d6cf3a34dc89af6faf23ea03fa0, >> keycount=613700, bloomtype=ROW, size=236.4 M, encoding=NONE, seqNum=22885 >> 2015-11-29 00:48:30,521 DEBUG >> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: >> Compacting >> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/1ff32b168134412ea3f6942126623ef6, >> keycount=373741, bloomtype=ROW, size=143.9 M, encoding=NONE, seqNum=22889 >> 2015-11-29 00:48:30,521 DEBUG >> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor: >> Compacting >> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8, >> keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891 >> 2015-11-29 00:49:46,912 WARN [regionserver60020.periodicFlusher] >> util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to a >> long garbage collecting pause and it's usually bad, see >> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired >> 2015-11-29 00:49:46,912 INFO [SplitLogWorker-l-hbase27.dba.cn2.qunar.com >> ,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)] >> zookeeper.ClientCnxn: Client session timed out, have not heard from server >> in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and >> attempting reconnect >> 2015-11-29 00:49:46,912 WARN [regionserver60020.compactionChecker] >> util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a >> long garbage collecting pause and it's usually bad, see >> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired >> 2015-11-29 00:49:46,912 INFO [regionserver60020-SendThread( >> l-zk1.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session >> timed out, have not heard from server in 74953ms for sessionid >> 0x34f16f641cc2e77, closing socket connection and attempting reconnect >> 2015-11-29 00:49:46,913 WARN [JvmPauseMonitor] util.JvmPauseMonitor: >> Detected pause in JVM or host machine (eg GC): pause of approximately >> 70587ms >> No GCs detected >> 2015-11-29 00:49:46,913 WARN [regionserver60020] util.Sleeper: We slept >> 72658ms instead of 3000ms, this is likely due to a long garbage collecting >> pause and it's usually bad, see >> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired >> 2015-11-29 00:49:46,913 INFO [regionserver60020-SendThread( >> l-zk4.dba.cn8.qunar.com:2181)] zookeeper.ClientCnxn: Client session >> timed out, have not heard from server in 81433ms for sessionid >> 0x54f16f4aa3363fa, closing socket connection and attempting reconnect >> 2015-11-29 00:49:46,913 INFO [regionserver60020-SendThread( >> l-zk2.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session >> timed out, have not heard from server in 79798ms for sessionid >> 0x54f16f4aa3363fe, closing socket connection and attempting reconnect >> 2015-11-29 00:49:46,922 WARN [ResponseProcessor for block >> BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549] >> hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms >> (threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR >> downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 10.90.18.60:50010, >> 10.90.18.63:50010] >> 2015-11-29 00:49:46,921 WARN [ResponseProcessor for block >> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] >> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block >> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 >> java.io.EOFException: Premature EOF: no length prefix available >> at >> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176) >> at >> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810) >> 2015-11-29 00:49:46,914 DEBUG [LruStats #0] hfile.LruBlockCache: >> Total=20.98 MB, free=18.38 GB, max=18.40 GB, blocks=19756849152, >> accesses=564477317, hits=166122280, hitRatio=29.43%, , >> cachingAccesses=166131865, cachingHits=166015128, cachingHitsRatio=99.93%, >> evictions=149453, evicted=116562, evictedPerRun=0.7799240946769714 >> 2015-11-29 00:49:46,929 WARN [DataStreamer for file >> /hbase/tc_qtrace/WALs/l-hbase27.dba.cn2.qunar.com,60020,1447234715888/ >> l-hbase27.dba.cn2.qunar.com%2C60020%2C1447234715888.1448726130837.meta >> block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] >> hdfs.DFSClient: Error Recovery for block >> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 in pipeline >> 10.90.18.47:50010, 10.90.18.61:50010, 10.90.18.53:50010: bad datanode >> 10.90.18.47:50010 >> >> >> 安全提示:本邮件非QUNAR内部邮件,请注意保护个人及公司信息安全,如有索取帐号密码等可疑情况请向 secteam发送邮件 >> > > -- > *吴国泉 wgq.wu* > ---------------------------------------------------- > *Position*:技术部DBA-HBase > *Email*:[email protected] > *Mob*:13036134151 > *Location*:北京市海淀区丹棱街3号中国电子大厦 1708 > *Post Code*:100080 >
