Re: How to optimize the GC in HBase

张铎 Mon, 30 Nov 2015 03:06:08 -0800

2015-11-29 00:48:30,521 DEBUG
2015-11-29 00:49:46,912 WARN

Seems there is no log entry for about 76 seconds so I think there is a Full
GC. Have you guy turned on gc log? Please check the gc log file(maybe
xxx.out) to see if there is a long-paused GC.


And what is your GC options along with G1 garbage collector? I used to meet
a problem called "Humongous Objects" which causes a bad impact on the
performance of G1 GC. So if you guys are not expert of G1 GC, I suggest you
switch back to CMS in the production environment first.

Thanks.

2015-11-30 18:42 GMT+08:00 吴国泉 <[email protected]>:

> hi:
>     I also met this problem,    cong,pu is my partner,
>     hbase regionserver is configured with 92G RAM,
>     sometimes the regionserver would crash,i checked the log, but there
> was no GC happened ,that is very strange,
>     here is part of the regionserver log:
>
>
>
> 2015-11-29 00:48:30,521 DEBUG
> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
> Compacting
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8,
> keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
> 2015-11-29 00:49:46,912 WARN  [regionserver60020.periodicFlusher]
> util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2015-11-29 00:49:46,912 INFO [SplitLogWorker-l-hbase27.dba.cn2.qunar.com
> ,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server
> in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and
> attempting reconnect
> 2015-11-29 00:49:46,912 WARN  [regionserver60020.compactionChecker]
> util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2015-11-29 00:49:46,912 INFO [regionserver60020-SendThread(
> l-zk1.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed
> out, have not heard from server in 74953ms for sessionid 0x34f16f641cc2e77,
> closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,913 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately
> 70587ms
> No GCs detected
> 2015-11-29 00:49:46,913 WARN  [regionserver60020] util.Sleeper: We slept
> 72658ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2015-11-29 00:49:46,913 INFO [regionserver60020-SendThread(
> l-zk4.dba.cn8.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed
> out, have not heard from server in 81433ms for sessionid 0x54f16f4aa3363fa,
> closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,913 INFO [regionserver60020-SendThread(
> l-zk2.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session timed
> out, have not heard from server in 79798ms for sessionid 0x54f16f4aa3363fe,
> closing socket connection and attempting reconnect
> 2015-11-29 00:49:46,922 WARN  [ResponseProcessor for block
> BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549]
> hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms
> (threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR
> downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 10.90.18.60:50010,
> 10.90.18.63:50010]
> 2015-11-29 00:49:46,921 WARN  [ResponseProcessor for block
> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823]
> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
> java.io.EOFException: Premature EOF: no length prefix available
>         at
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)
>
>
>      as we can see , the log shows that :"We slept 74029ms instead of
> 10000ms, this is likely due to a long garbage collecting pause ",but
> following :" it No GCs deteted ",
>      this confused me. Can you give me some advice?
>      I use   java7, G1 garbage collector,
>
>
>
> On 2015年11月29日 08:44, 聪聪 wrote:
>
>> hi，all：
>> This morning regionserver shutdown.By analyzing the log, I guess it's
>> about GC.Then,How to optimize the GC in HBase.
>> I use the HBase version is hbase-0.98.6-cdh5.2.0.I look for your help in
>> this.
>>
>>
>> JVM configuration is as follows：
>> export HBASE_REGIONSERVER_OPTS="-Xms92g -Xmx92g -XX:PermSize=256m
>> -XX:MaxPermSize=256m -XX:+UseG1GC -server -XX:+DisableExplicitGC
>> -XX:+UseFastAccessorMethods -XX:SoftRefLRUPolicyMSPerMB=0
>> -XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=40
>> -XX:ConcGCThreads=18 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB
>> -XX:ParallelGCThreads=18 -XX:+PrintClassHistogram -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> -Xloggc:/home/q/hbase/hbase-0.98.6-cdh5.2.0/logs/gc-$(hostname)-hbase.log"
>>
>>
>> Regionserver on the log is as follows:
>> 2015-11-29 00:48:30,521 DEBUG
>> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
>> Compacting
>> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/2becc4de83bc4102802a4f69adfba433,
>> keycount=1244806, bloomtype=ROW, size=462.2 M, encoding=NONE, seqNum=22878
>> 2015-11-29 00:48:30,521 DEBUG
>> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
>> Compacting
>> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/52669d6cf3a34dc89af6faf23ea03fa0,
>> keycount=613700, bloomtype=ROW, size=236.4 M, encoding=NONE, seqNum=22885
>> 2015-11-29 00:48:30,521 DEBUG
>> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
>> Compacting
>> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/1ff32b168134412ea3f6942126623ef6,
>> keycount=373741, bloomtype=ROW, size=143.9 M, encoding=NONE, seqNum=22889
>> 2015-11-29 00:48:30,521 DEBUG
>> [regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
>> Compacting
>> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8,
>> keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
>> 2015-11-29 00:49:46,912 WARN  [regionserver60020.periodicFlusher]
>> util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to a
>> long garbage collecting pause and it's usually bad, see
>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>> 2015-11-29 00:49:46,912 INFO  [SplitLogWorker-l-hbase27.dba.cn2.qunar.com
>> ,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
>> zookeeper.ClientCnxn: Client session timed out, have not heard from server
>> in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and
>> attempting reconnect
>> 2015-11-29 00:49:46,912 WARN  [regionserver60020.compactionChecker]
>> util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a
>> long garbage collecting pause and it's usually bad, see
>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>> 2015-11-29 00:49:46,912 INFO  [regionserver60020-SendThread(
>> l-zk1.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session
>> timed out, have not heard from server in 74953ms for sessionid
>> 0x34f16f641cc2e77, closing socket connection and attempting reconnect
>> 2015-11-29 00:49:46,913 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
>> Detected pause in JVM or host machine (eg GC): pause of approximately
>> 70587ms
>> No GCs detected
>> 2015-11-29 00:49:46,913 WARN  [regionserver60020] util.Sleeper: We slept
>> 72658ms instead of 3000ms, this is likely due to a long garbage collecting
>> pause and it's usually bad, see
>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>> 2015-11-29 00:49:46,913 INFO  [regionserver60020-SendThread(
>> l-zk4.dba.cn8.qunar.com:2181)] zookeeper.ClientCnxn: Client session
>> timed out, have not heard from server in 81433ms for sessionid
>> 0x54f16f4aa3363fa, closing socket connection and attempting reconnect
>> 2015-11-29 00:49:46,913 INFO  [regionserver60020-SendThread(
>> l-zk2.dba.cn6.qunar.com:2181)] zookeeper.ClientCnxn: Client session
>> timed out, have not heard from server in 79798ms for sessionid
>> 0x54f16f4aa3363fe, closing socket connection and attempting reconnect
>> 2015-11-29 00:49:46,922 WARN  [ResponseProcessor for block
>> BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549]
>> hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms
>> (threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR
>> downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010, 10.90.18.60:50010,
>> 10.90.18.63:50010]
>> 2015-11-29 00:49:46,921 WARN  [ResponseProcessor for block
>> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823]
>> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
>> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
>> java.io.EOFException: Premature EOF: no length prefix available
>>          at
>> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
>>          at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
>>          at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)
>> 2015-11-29 00:49:46,914 DEBUG [LruStats #0] hfile.LruBlockCache:
>> Total=20.98 MB, free=18.38 GB, max=18.40 GB, blocks=19756849152,
>> accesses=564477317, hits=166122280, hitRatio=29.43%, ,
>> cachingAccesses=166131865, cachingHits=166015128, cachingHitsRatio=99.93%,
>> evictions=149453, evicted=116562, evictedPerRun=0.7799240946769714
>> 2015-11-29 00:49:46,929 WARN  [DataStreamer for file
>> /hbase/tc_qtrace/WALs/l-hbase27.dba.cn2.qunar.com,60020,1447234715888/
>> l-hbase27.dba.cn2.qunar.com%2C60020%2C1447234715888.1448726130837.meta
>> block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823]
>> hdfs.DFSClient: Error Recovery for block
>> BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 in pipeline
>> 10.90.18.47:50010, 10.90.18.61:50010, 10.90.18.53:50010: bad datanode
>> 10.90.18.47:50010
>>
>>
>> 安全提示：本邮件非QUNAR内部邮件，请注意保护个人及公司信息安全，如有索取帐号密码等可疑情况请向 secteam发送邮件
>>
>
> --
> *吴国泉 wgq.wu*
> ----------------------------------------------------
> *Position*:技术部DBA-HBase
> *Email*:[email protected]
> *Mob*:13036134151
> *Location*:北京市海淀区丹棱街3号中国电子大厦 1708
> *Post Code*:100080
>

Re: How to optimize the GC in HBase

Reply via email to