Hi,

I have run into this problem as well (cong,pu is my colleague).
Our HBase regionserver is configured with 92G of RAM, and it sometimes
crashes. I checked the log, but no GC was recorded at the time, which is
very strange.
Here is part of the regionserver log:
2015-11-29 00:48:30,521 DEBUG
[regionserver60020-smallCompactions-1447234955800]
compactions.Compactor: Compacting
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8,
keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
2015-11-29 00:49:46,912 WARN [regionserver60020.periodicFlusher]
util.Sleeper: We slept 71814ms instead of 10000ms, this is likely due to
a long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,912 INFO
[SplitLogWorker-l-hbase27.dba.cn2.qunar.com,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from
server in 77297ms for sessionid 0x150ad3cf038d6e6, closing socket
connection and attempting reconnect
2015-11-29 00:49:46,912 WARN [regionserver60020.compactionChecker]
util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to
a long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,912 INFO
[regionserver60020-SendThread(l-zk1.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from
server in 74953ms for sessionid 0x34f16f641cc2e77, closing socket
connection and attempting reconnect
2015-11-29 00:49:46,913 WARN [JvmPauseMonitor] util.JvmPauseMonitor:
Detected pause in JVM or host machine (eg GC): pause of approximately
70587ms
No GCs detected
2015-11-29 00:49:46,913 WARN [regionserver60020] util.Sleeper: We slept
72658ms instead of 3000ms, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,913 INFO
[regionserver60020-SendThread(l-zk4.dba.cn8.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from
server in 81433ms for sessionid 0x54f16f4aa3363fa, closing socket
connection and attempting reconnect
2015-11-29 00:49:46,913 INFO
[regionserver60020-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from
server in 79798ms for sessionid 0x54f16f4aa3363fe, closing socket
connection and attempting reconnect
2015-11-29 00:49:46,922 WARN [ResponseProcessor for block
BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549]
hdfs.DFSClient: Slow ReadProcessor read fields took 72827ms
(threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR
downstreamAckTimeNanos: 0, targets: [10.90.18.47:50010,
10.90.18.60:50010, 10.90.18.63:50010]
2015-11-29 00:49:46,921 WARN [ResponseProcessor for block
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823]
hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
java.io.EOFException: Premature EOF: no length prefix available
at
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)
As you can see, the log says "We slept 74029ms instead of 10000ms, this
is likely due to a long garbage collecting pause", but right after that
it reports "No GCs detected".
This confuses me. Can you give me some advice?
We are running Java 7 with the G1 garbage collector.
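For what it's worth, that warning comes from a pause monitor that works
roughly like the sketch below: it sleeps a short interval, measures how
much longer the sleep actually took, and checks whether the JVM's GC
counters moved in the meantime. This is only an illustrative sketch (the
class name, interval and threshold are made up; it is not the HBase
code), but it shows how you can get a long pause reported together with
"No GCs detected": the process was stalled by something outside the JVM,
for example swapping or the host itself.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative pause detector in the spirit of JvmPauseMonitor.
// Names and thresholds are made up for the example.
public class PauseSketch {
    private static final long SLEEP_MS = 500;
    private static final long WARN_MS = 10_000;

    public static void main(String[] args) throws InterruptedException {
        List<GarbageCollectorMXBean> gcs =
                ManagementFactory.getGarbageCollectorMXBeans();
        while (true) {
            long gcBefore = totalGcCount(gcs);
            long start = System.nanoTime();
            Thread.sleep(SLEEP_MS);
            long extraMs = (System.nanoTime() - start) / 1_000_000 - SLEEP_MS;
            long gcAfter = totalGcCount(gcs);
            if (extraMs > WARN_MS) {
                if (gcAfter > gcBefore) {
                    System.out.println("Pause ~" + extraMs + "ms, "
                            + (gcAfter - gcBefore) + " GC(s) ran during it");
                } else {
                    // The "No GCs detected" case from the regionserver log:
                    // the JVM was stalled, but not by the garbage collector.
                    System.out.println("Pause ~" + extraMs
                            + "ms, no GCs detected -- suspect the host "
                            + "(swapping, THP defrag, frozen VM, ...)");
                }
            }
        }
    }

    private static long totalGcCount(List<GarbageCollectorMXBean> gcs) {
        long total = 0;
        for (GarbageCollectorMXBean gc : gcs) {
            total += gc.getCollectionCount();
        }
        return total;
    }
}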
On 2015-11-29 08:44, cong,pu wrote:
Hi all,
This morning a regionserver shut down. From the log, I guess it is GC
related. How can I optimize GC for HBase?
The HBase version is hbase-0.98.6-cdh5.2.0. I am looking for your help on this.
The JVM configuration is as follows:
export HBASE_REGIONSERVER_OPTS="-Xms92g -Xmx92g -XX:PermSize=256m
-XX:MaxPermSize=256m -XX:+UseG1GC -server -XX:+DisableExplicitGC
-XX:+UseFastAccessorMethods -XX:SoftRefLRUPolicyMSPerMB=0 -XX:G1ReservePercent=15
-XX:InitiatingHeapOccupancyPercent=40 -XX:ConcGCThreads=18 -XX:+ParallelRefProcEnabled
-XX:-ResizePLAB -XX:ParallelGCThreads=18 -XX:+PrintClassHistogram -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
-Xloggc:/home/q/hbase/hbase-0.98.6-cdh5.2.0/logs/gc-$(hostname)-hbase.log"
The regionserver log is as follows:
2015-11-29 00:48:30,521 DEBUG
[regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/2becc4de83bc4102802a4f69adfba433,
keycount=1244806, bloomtype=ROW, size=462.2 M, encoding=NONE, seqNum=22878
2015-11-29 00:48:30,521 DEBUG
[regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/52669d6cf3a34dc89af6faf23ea03fa0,
keycount=613700, bloomtype=ROW, size=236.4 M, encoding=NONE, seqNum=22885
2015-11-29 00:48:30,521 DEBUG
[regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/1ff32b168134412ea3f6942126623ef6,
keycount=373741, bloomtype=ROW, size=143.9 M, encoding=NONE, seqNum=22889
2015-11-29 00:48:30,521 DEBUG
[regionserver60020-smallCompactions-1447234955800] compactions.Compactor:
Compacting
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/3449dc007b05ec535d5a7e3cf3cafc81/t/b993875f0f384597900d2a2d3e69fde8,
keycount=136626, bloomtype=ROW, size=49.6 M, encoding=NONE, seqNum=22891
2015-11-29 00:49:46,912 WARN [regionserver60020.periodicFlusher] util.Sleeper:
We slept 71814ms instead of 10000ms, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,912 INFO
[SplitLogWorker-l-hbase27.dba.cn2.qunar.com,60020,1447234715888-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in
77297ms for sessionid 0x150ad3cf038d6e6, closing socket connection and
attempting reconnect
2015-11-29 00:49:46,912 WARN [regionserver60020.compactionChecker]
util.Sleeper: We slept 74029ms instead of 10000ms, this is likely due to a long
garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,912 INFO
[regionserver60020-SendThread(l-zk1.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in
74953ms for sessionid 0x34f16f641cc2e77, closing socket connection and
attempting reconnect
2015-11-29 00:49:46,913 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected
pause in JVM or host machine (eg GC): pause of approximately 70587ms
No GCs detected
2015-11-29 00:49:46,913 WARN [regionserver60020] util.Sleeper: We slept
72658ms instead of 3000ms, this is likely due to a long garbage collecting
pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2015-11-29 00:49:46,913 INFO
[regionserver60020-SendThread(l-zk4.dba.cn8.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in
81433ms for sessionid 0x54f16f4aa3363fa, closing socket connection and
attempting reconnect
2015-11-29 00:49:46,913 INFO
[regionserver60020-SendThread(l-zk2.dba.cn6.qunar.com:2181)]
zookeeper.ClientCnxn: Client session timed out, have not heard from server in
79798ms for sessionid 0x54f16f4aa3363fe, closing socket connection and
attempting reconnect
2015-11-29 00:49:46,922 WARN [ResponseProcessor for block
BP-329468547-10.90.18.19-1444480193809:blk_1078012346_4271549] hdfs.DFSClient:
Slow ReadProcessor read fields took 72827ms (threshold=30000ms); ack: seqno: -2
status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets:
[10.90.18.47:50010, 10.90.18.60:50010, 10.90.18.63:50010]
2015-11-29 00:49:46,921 WARN [ResponseProcessor for block
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823] hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception for block
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823
java.io.EOFException: Premature EOF: no length prefix available
at
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2103)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:810)
2015-11-29 00:49:46,914 DEBUG [LruStats #0] hfile.LruBlockCache: Total=20.98
MB, free=18.38 GB, max=18.40 GB, blocks=19756849152, accesses=564477317,
hits=166122280, hitRatio=29.43%, , cachingAccesses=166131865,
cachingHits=166015128, cachingHitsRatio=99.93%, evictions=149453,
evicted=116562, evictedPerRun=0.7799240946769714
2015-11-29 00:49:46,929 WARN [DataStreamer for file
/hbase/tc_qtrace/WALs/l-hbase27.dba.cn2.qunar.com,60020,1447234715888/l-hbase27.dba.cn2.qunar.com%2C60020%2C1447234715888.1448726130837.meta
block BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823]
hdfs.DFSClient: Error Recovery for block
BP-329468547-10.90.18.19-1444480193809:blk_1078010620_4269823 in pipeline
10.90.18.47:50010, 10.90.18.61:50010, 10.90.18.53:50010: bad datanode
10.90.18.47:50010
--
wgq.wu
----------------------------------------------------
Position: DBA-HBase
Email: [email protected]
Mob: 13036134151
Post Code: 100080