Hi Pablo, That is a terribly long pause for a minor GC. I don't see any swapping in your vmstat log, but I would suggest the following: 1) add these JVM options: -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods
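In case it helps, here is a rough sketch of how 1) and the -Xmn change in 2) might look in hbase-env.sh. This is only an illustration: the -Xmx value and the GC log path are placeholders, and the CMS/ParNew collector flags are assumed because your heap dump below already shows a concurrent mark-sweep generation.

  # hbase-env.sh (sketch only; -Xmx and the -Xloggc path are placeholders, adjust to your setup)
  export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xmn2g \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -XX:+DisableExplicitGC -XX:+UseCompressedOops \
    -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 \
    -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=3 \
    -XX:+UseFastAccessorMethods \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -Xloggc:/var/log/hbase/gc-regionserver.log"

With -Xmn2g and SurvivorRatio=2, each survivor space comes out to roughly 512MB and eden to roughly 1GB, which gives ParNew much more headroom than the 6.5MB survivor spaces shown in your heap summary below.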
2) -Xmn is two small, your total Mem is 74GB, just make -Xmn2g 3) what are you doing during long GC happened? read or write? if reading, what the block cache size? On Mon, Mar 11, 2013 at 6:41 AM, Stack <[email protected]> wrote: > You could increase your zookeeper session timeout to 5 minutes while you > are figuring why these long pauses. > http://hbase.apache.org/book.html#zookeeper.session.timeout > > Above, there is an outage for almost 5 minutes: > > >> We slept 225100ms instead of 3000ms, this is likely due to a long > > You have ganglia or tsdb running? When you see the big pause above, can > you see anything going on on the machine? (swap, iowait, concurrent fat > mapreduce job?) > > St.Ack > > > > On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa <[email protected]> wrote: > > > Hi Sreepathi, > > they say in the book (or the site), we could try it to see if it is > really > > a timeout error > > or there is something more. But it is not recomended for production > > environments. > > > > I could give it a try if five minutes will ensure to us that the problem > > is the GC or > > elsewhere!! Anyway, I think it is hard to beleive a GC is taking 2:30 > > minutes. > > > > Abs, > > Pablo > > > > > > On 03/10/2013 04:06 PM, Sreepathi wrote: > > > >> Hi Stack/Ted/Pablo, > >> > >> Should we increase the hbase.rpc.timeout property to 5 minutes ( 300000 > ms > >> ) ? > >> > >> Regards, > >> - Sreepathi > >> > >> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa <[email protected]> wrote: > >> > >> That combo should be fine. > >>>> > >>> Great!! > >>> > >>> > >>> If JVM is full GC'ing, the application is stopped. > >>>> The below does not look like a full GC but that is a long pause in > >>>> system > >>>> time, enough to kill your zk session. > >>>> > >>> Exactly. This pause is really making the zk expire the RS which > shutsdown > >>> (logs > >>> in the end of the email). > >>> But the question is: what is causing this pause??!! > >>> > >>> You swapping? > >>>> > >>> I don't think so (stats below). > >>> > >>> Hardware is good? > >>>> > >>> Yes, it is a 16 processor machine with 74GB of RAM and plenty disk > space. > >>> Below are some metrics I have heard about. Hope it helps. > >>> > >>> > >>> ** I am having some problems with the datanodes[1] which are having > >>> trouble to > >>> write. 
I really think the issues are related, but cannot solve any of > >>> them > >>> :( > >>> > >>> Thanks again, > >>> Pablo > >>> > >>> [1] http://mail-archives.apache.****org/mod_mbox/hadoop-hdfs-user/**** > >>> 201303.mbox/%3CCAJzooYfS-F1KS+******jGOPUt15PwFjcCSzigE0APeM9FXaCr**** > >>> [email protected]%3E<http:**//mail-archives.apache.org/** > >>> mod_mbox/hadoop-hdfs-user/**201303.mbox/%3CCAJzooYfS-F1KS+** > >>> jGOPUt15PwFjcCSzigE0APeM9FXaCr**[email protected]%3E< > http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3ccajzooyfs-f1ks+jgoput15pwfjccszige0apem9fxacrqfv...@mail.gmail.com%3E > > > >>> > > >>> > >>> top - 15:38:04 up 297 days, 21:03, 2 users, load average: 4.34, 2.55, > >>> 1.28 > >>> Tasks: 528 total, 1 running, 527 sleeping, 0 stopped, 0 zombie > >>> Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, > >>> 0.0%st > >>> Mem: 74187256k total, 29493992k used, 44693264k free, 5836576k > buffers > >>> Swap: 51609592k total, 128312k used, 51481280k free, 1353400k cached > >>> > >>> ]$ vmstat -w > >>> procs -------------------memory-----****------------- ---swap-- > >>> -----io---- > >>> --system-- -----cpu------- > >>> r b swpd free buff cache si so bi bo > >>> in > >>> cs us sy id wa st > >>> 2 0 128312 32416928 5838288 5043560 0 0 202 53 > >>> 0 > >>> 0 2 1 96 1 0 > >>> > >>> ]$ sar > >>> 02:20:01 PM all 26.18 0.00 2.90 0.63 0.00 > >>> 70.29 > >>> 02:30:01 PM all 1.66 0.00 1.25 1.05 0.00 > >>> 96.04 > >>> 02:40:01 PM all 10.01 0.00 2.14 0.75 0.00 > >>> 87.11 > >>> 02:50:01 PM all 0.76 0.00 0.80 1.03 0.00 > >>> 97.40 > >>> 03:00:01 PM all 0.23 0.00 0.30 0.71 0.00 > >>> 98.76 > >>> 03:10:01 PM all 0.22 0.00 0.30 0.66 0.00 > >>> 98.82 > >>> 03:20:01 PM all 0.22 0.00 0.31 0.76 0.00 > >>> 98.71 > >>> 03:30:01 PM all 0.24 0.00 0.31 0.64 0.00 > >>> 98.81 > >>> 03:40:01 PM all 1.13 0.00 2.97 1.18 0.00 > >>> 94.73 > >>> Average: all 3.86 0.00 1.38 0.88 0.00 > >>> 93.87 > >>> > >>> ]$ iostat > >>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002) 03/10/2013 _x86_64_ > >>> (16 CPU) > >>> > >>> avg-cpu: %user %nice %system %iowait %steal %idle > >>> 1.86 0.00 0.96 0.78 0.00 96.41 > >>> > >>> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn > >>> sda 1.23 20.26 23.53 521533196 605566924 > >>> sdb 6.51 921.55 241.90 23717850730 6225863488 > >>> sdc 6.22 921.83 236.41 23725181162 6084471192 > >>> sdd 6.25 925.13 237.26 23810004970 6106357880 > >>> sde 6.19 913.90 235.60 23521108818 6063722504 > >>> sdh 6.26 933.08 237.77 24014594546 6119511376 > >>> sdg 6.18 914.36 235.31 23532747378 6056257016 > >>> sdf 6.24 923.66 235.33 23772251810 6056604008 > >>> > >>> Some more logging which reinforce that the RS crash is happening > because > >>> of > >>> timeout. However this time the GC log is not accusing a big time. 
> >>> > >>> #####RS LOG##### > >>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.****ClientCnxn: > Client > >>> session timed out, have not heard from server in 257739ms for sessionid > >>> 0x13d3c4bcba6014a, closing socket connection and attempting reconnect > >>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.****ClientCnxn: > Client > >>> session timed out, have not heard from server in 226785ms for sessionid > >>> 0x13d3c4bcba60149, closing socket connection and attempting reconnect > >>> 2013-03-10 15:37:46,712 DEBUG org.apache.hadoop.hbase.io.**** > >>> hfile.LruBlockCache: > >>> Stats: total=61.91 MB, free=1.94 GB, max=2 GB, blocks=1254, > >>> accesses=60087, > >>> hits=58811, hitRatio=97.87%, , cachingAccesses=60069, > cachingHits=58811, > >>> cachingHitsRatio=97.90%, , evictions=0, evicted=0, evictedPerRun=NaN > >>> 2013-03-10 15:37:46,712 WARN org.apache.hadoop.hbase.util.****Sleeper: > >>> We > >>> slept 225100ms instead of 3000ms, this is likely due to a long garbage > >>> collecting pause and it's usually bad, see > >>> http://hbase.apache.org/book.**** <http://hbase.apache.org/book.**> > >>> html#trouble.rs.runtime.****zkexpired<http://hbase.apache.** > >>> org/book.html#trouble.rs.**runtime.zkexpired< > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired> > >>> > > >>> 2013-03-10 15:37:46,714 WARN org.apache.hadoop.hdfs.****DFSClient: > >>> DFSOutputStream ResponseProcessor exception for block > >>> BP-43236042-172.17.2.10-****1362490844340:blk_-** > >>> 6834190810033122569_25150229 > >>> > >>> java.io.EOFException: Premature EOF: no length prefix available > >>> at org.apache.hadoop.hdfs.****protocol.HdfsProtoUtil.** > >>> vintPrefixed(HdfsProtoUtil.****java:171) > >>> at org.apache.hadoop.hdfs.****protocol.datatransfer.** > >>> PipelineAck.readFields(****PipelineAck.java:114) > >>> at > org.apache.hadoop.hdfs.****DFSOutputStream$DataStreamer$**** > >>> ResponseProcessor.run(****DFSOutputStream.java:670) > >>> 2013-03-10 15:37:46,716 ERROR org.apache.hadoop.hbase.** > >>> regionserver.HRegionServer: > >>> org.apache.hadoop.hbase.ipc.****CallerDisconnectedException: Aborting > >>> call > >>> get([B@7caf69ed, {"timeRange":[0,****9223372036854775807],"** > >>> totalColumns":1,"cacheBlocks":****true,"families":{"details":[**"** > >>> ALL"]},"maxVersions":1,"row":"****\\x00\\x00\\x00\\x00\\x00\\*** > >>> *x12\\x93@"}), > >>> rpc version=1, client version=29, methodsFingerPrint=1891768260 from > >>> 172.17.1.71:51294 after 0 ms, since caller disconnected > >>> > >>> > >>> #####GC LOG##### > >>> 2716.635: [GC 2716.635: [ParNew: 57570K->746K(59008K), 0.0046530 secs] > >>> 354857K->300891K(1152704K), 0.0047370 secs] [Times: user=0.03 sys=0.00, > >>> real=0.00 secs] > >>> 2789.478: [GC 2789.478: [ParNew: 53226K->1192K(59008K), 0.0041370 secs] > >>> 353371K->301337K(1152704K), 0.0042220 secs] [Times: user=0.04 sys=0.00, > >>> real=0.00 secs] > >>> 2868.435: [GC 2868.435: [ParNew: 53672K->740K(59008K), 0.0041570 secs] > >>> 353817K->300886K(1152704K), 0.0042440 secs] [Times: user=0.03 sys=0.00, > >>> real=0.01 secs] > >>> 2920.309: [GC 2920.309: [ParNew: 53220K->6202K(59008K), 0.0058440 secs] > >>> 353366K->310443K(1152704K), 0.0059410 secs] [Times: user=0.05 sys=0.00, > >>> real=0.01 secs] > >>> 2984.239: [GC 2984.239: [ParNew: 58680K->5282K(59008K), 0.0048070 secs] > >>> 362921K->310977K(1152704K), 0.0049180 secs] [Times: user=0.03 sys=0.00, > >>> real=0.01 secs] > >>> 3050.620: [GC 3050.620: [ParNew: 57524K->5292K(59008K), 0.0055510 secs] > >>> 
363219K->313384K(1152704K), 0.0056520 secs] [Times: user=0.04 sys=0.00, > >>> real=0.01 secs] > >>> 3111.331: [GC 3111.331: [ParNew: 57542K->5637K(59008K), 0.0056900 secs] > >>> 365634K->318079K(1152704K), 0.0057950 secs] [Times: user=0.03 sys=0.00, > >>> real=0.01 secs] > >>> 3163.510: [GC 3163.510: [ParNew: 58117K->1138K(59008K), 0.0046540 secs] > >>> 370559K->316086K(1152704K), 0.0047470 secs] [Times: user=0.03 sys=0.00, > >>> real=0.01 secs] > >>> 3217.642: [GC 3217.643: [ParNew: 53618K->1001K(59008K), 0.0038530 secs] > >>> 368566K->315949K(1152704K), 0.0039450 secs] [Times: user=0.03 sys=0.00, > >>> real=0.00 secs] > >>> 3464.067: [GC 3464.067: [ParNew: 53481K->5348K(59008K), 0.0068370 secs] > >>> 368429K->322343K(1152704K), 0.0069170 secs] [Times: user=0.07 sys=0.01, > >>> real=0.00 secs] > >>> 3465.648: [GC 3465.648: [ParNew: 57828K->1609K(59008K), 0.0111520 secs] > >>> 374823K->321496K(1152704K), 0.0112170 secs] [Times: user=0.07 sys=0.00, > >>> real=0.01 secs] > >>> 3466.291: [GC 3466.291: [ParNew: 54089K->2186K(59008K), 0.0093190 secs] > >>> 373976K->322073K(1152704K), 0.0093800 secs] [Times: user=0.06 sys=0.00, > >>> real=0.01 secs] > >>> Heap > >>> par new generation total 59008K, used 15497K [0x00000005fae00000, > >>> 0x00000005fee00000, 0x00000005fee00000) > >>> eden space 52480K, 25% used [0x00000005fae00000, > 0x00000005fbaff8c8, > >>> 0x00000005fe140000) > >>> from space 6528K, 33% used [0x00000005fe140000, 0x00000005fe362b70, > >>> 0x00000005fe7a0000) > >>> to space 6528K, 0% used [0x00000005fe7a0000, 0x00000005fe7a0000, > >>> 0x00000005fee00000) > >>> concurrent mark-sweep generation total 1093696K, used 319887K > >>> [0x00000005fee00000, 0x0000000641a10000, 0x00000007fae00000) > >>> concurrent-mark-sweep perm gen total 30464K, used 30427K > >>> [0x00000007fae00000, 0x00000007fcbc0000, 0x0000000800000000) > >>> > >>> > >>> > >>> On 03/08/2013 07:02 PM, Stack wrote: > >>> > >>> On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa <[email protected]> wrote: > >>>> > >>>> 0.94 currently doesn't support hadoop 2.0 > >>>> > >>>>> Can you deploy hadoop 1.1.1 instead ? > >>>>>> > >>>>>> I am using cdh4.2.0 which uses this version as default > installation. > >>>>>> > >>>>> I think it will be a problem for me to deploy 1.1.1 because I would > >>>>> need > >>>>> to > >>>>> "upgrade" the whole cluster with 70TB of data (backup everything, go > >>>>> offline, etc.). > >>>>> > >>>>> Is there a problem to use cdh4.2.0? > >>>>> I should send my email to cdh list? > >>>>> > >>>>> > >>>>> That combo should be fine. > >>>>> > >>>> > >>>> You Full GC'ing around this time? > >>>> > >>>>> The GC shows it took a long time. However it does not make any sense > >>>>> to be it, since the same ammount of data was cleaned before and AFTER > >>>>> in just 0.01 secs! > >>>>> > >>>>> > >>>>> If JVM is full GC'ing, the application is stopped. > >>>>> > >>>> > >>>> [Times: user=0.08 sys=137.62, real=137.62 secs] > >>>> > >>>>> Besides the whole time was used by system. That is what is bugging > me. > >>>>> > >>>>> > >>>>> The below does not look like a full GC but that is a long pause in > >>>>> > >>>> system > >>>> time, enough to kill your zk session. > >>>> > >>>> You swapping? > >>>> > >>>> Hardware is good? > >>>> > >>>> St.Ack > >>>> > >>>> > >>>> > >>>> ... 
> >>>> > >>>>> > >>>>> 1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), 0.0040990 > secs] > >>>>> 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03 > sys=0.00, > >>>>> real=0.01 secs] > >>>>> > >>>>> 1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 > secs] > >>>>> 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04 > sys=0.01, > >>>>> real=0.00 secs] > >>>>> > >>>>> 1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620 > >>>>> secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times: user=0.08 > >>>>> sys=137.62, real=137.62 secs] > >>>>> > >>>>> 1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 > secs] > >>>>> 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05 > sys=0.00, > >>>>> real=0.01 secs] > >>>>> > >>>>> 1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650 > secs] > >>>>> 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06 > sys=0.00, > >>>>> real=0.01 secs] > >>>>> > >>>>> > >>>>> I really appreciate you guys helping me to find out what is wrong. > >>>>> > >>>>> Thanks, > >>>>> Pablo > >>>>> > >>>>> > >>>>> > >>>>> On 03/08/2013 02:11 PM, Stack wrote: > >>>>> > >>>>> What RAM says. > >>>>> > >>>>>> 2013-03-07 17:24:57,887 INFO > org.apache.zookeeper.********ClientCnxn: > >>>>>> Client > >>>>>> > >>>>>> session timed out, have not heard from server in 159348ms for > >>>>>> sessionid > >>>>>> 0x13d3c4bcba600a7, closing socket connection and attempting > reconnect > >>>>>> > >>>>>> You Full GC'ing around this time? > >>>>>> > >>>>>> Put up your configs in a place where we can take a look? > >>>>>> > >>>>>> St.Ack > >>>>>> > >>>>>> > >>>>>> On Fri, Mar 8, 2013 at 8:32 AM, ramkrishna vasudevan < > >>>>>> ramkrishna.s.vasudevan@gmail.******com > <ramkrishna.s.vasudevan@gmail. > >>>>>> **** > >>>>>> com <ramkrishna.s.vasudevan@gmail.**com< > [email protected]> > >>>>>> >>> > >>>>>> wrote: > >>>>>> > >>>>>> I think it is with your GC config. What is your heap size? What > >>>>>> is > >>>>>> the > >>>>>> > >>>>>> data that you pump in and how much is the block cache size? > >>>>>>> > >>>>>>> Regards > >>>>>>> Ram > >>>>>>> > >>>>>>> On Fri, Mar 8, 2013 at 9:31 PM, Ted Yu <[email protected]> > wrote: > >>>>>>> > >>>>>>> 0.94 currently doesn't support hadoop 2.0 > >>>>>>> > >>>>>>> Can you deploy hadoop 1.1.1 instead ? > >>>>>>>> > >>>>>>>> Are you using 0.94.5 ? > >>>>>>>> > >>>>>>>> Thanks > >>>>>>>> > >>>>>>>> On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa <[email protected]> > wrote: > >>>>>>>> > >>>>>>>> Hey guys, > >>>>>>>> > >>>>>>>> as I sent in an email a long time ago, the RSs in my cluster did > >>>>>>>>> not > >>>>>>>>> > >>>>>>>>> get > >>>>>>>>> > >>>>>>>> along > >>>>>>>> > >>>>>>>> and crashed 3 times a day. I tried a lot of options we discussed > in > >>>>>>>>> the > >>>>>>>>> emails, but it not solved the problem. As I used an old version > of > >>>>>>>>> > >>>>>>>>> hadoop I > >>>>>>>>> > >>>>>>>> thought this was the problem. > >>>>>>>> > >>>>>>>>> So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper 3.3.5 to > >>>>>>>>> > >>>>>>>>> hadoop > >>>>>>>>> > >>>>>>>> 2.0.0 > >>>>>>>> > >>>>>>>> - hbase 0.94 - zookeeper 3.4.5. > >>>>>>>>> > >>>>>>>>> Unfortunately the RSs did not stop crashing, and worst! Now they > >>>>>>>>> crash > >>>>>>>>> every > >>>>>>>>> hour and some times when the RS that holds the .ROOT. crashes all > >>>>>>>>> > >>>>>>>>> cluster > >>>>>>>>> > >>>>>>>> get > >>>>>>>> > >>>>>>>> stuck in transition and everything stops working. 
> >>>>>>>>> In this case I need to clean zookeeper znodes, restart the master > >>>>>>>>> and > >>>>>>>>> > >>>>>>>>> the > >>>>>>>>> > >>>>>>>> RSs. > >>>>>>>> > >>>>>>>> To avoid this case I am running on production with only ONE RS > and > >>>>>>>>> a > >>>>>>>>> monitoring > >>>>>>>>> script that check every minute, if the RS is ok. If not, restart > >>>>>>>>> it. > >>>>>>>>> * This case does not get the cluster stuck. > >>>>>>>>> > >>>>>>>>> This is driving me crazy, but I really cant find a solution for > the > >>>>>>>>> cluster. > >>>>>>>>> I tracked all logs from the start time 16:49 from all interesting > >>>>>>>>> nodes > >>>>>>>>> (zoo, > >>>>>>>>> namenode, master, rs, dn2, dn9, dn10) and copied here what I > think > >>>>>>>>> is > >>>>>>>>> usefull. > >>>>>>>>> > >>>>>>>>> There are some strange errors in the DATANODE2, as an error > >>>>>>>>> copiyng a > >>>>>>>>> > >>>>>>>>> block > >>>>>>>>> > >>>>>>>> to itself. > >>>>>>>> > >>>>>>>>> The gc log points to GC timeout. However it is very weird that > the > >>>>>>>>> RS > >>>>>>>>> > >>>>>>>>> spend > >>>>>>>>> > >>>>>>>> so much time in GC while in the other cases it takes 0.001sec. > >>>>>>>> > >>>>>>>>> Besides, > >>>>>>>>> the time > >>>>>>>>> spent, is in sys which makes me think that might be a problem in > >>>>>>>>> > >>>>>>>>> another > >>>>>>>>> > >>>>>>>> place. > >>>>>>>> > >>>>>>>> I know that it is a bunch of logs, and that it is very difficult > to > >>>>>>>>> > >>>>>>>>> find > >>>>>>>>> > >>>>>>>> the > >>>>>>>> > >>>>>>>> problem without much context. But I REALLY need some help. If it > is > >>>>>>>>> not > >>>>>>>>> > >>>>>>>>> the > >>>>>>>>> > >>>>>>>> solution, at least what I should read, where I should look, or > >>>>>>>> which > >>>>>>>> > >>>>>>>>> cases > >>>>>>>>> > >>>>>>>> I > >>>>>>>> > >>>>>>>>> should monitor. > >>>>>>>>> > >>>>>>>>> Thank you very much, > >>>>>>>>> Pablo Musa > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >> -- > >> *Regards,* > >> --- *Sreepathi * > >> > > > > >
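P.S. For anyone following the thread, this is roughly what the temporary timeout increase discussed above would look like in hbase-site.xml. It is a diagnostic aid only, not a production recommendation (as Pablo notes), and the ZooKeeper server's maxSessionTimeout in zoo.cfg must also allow the larger value or the session timeout will be clamped.

  <!-- hbase-site.xml: hypothetical diagnostic settings, 5 minutes each -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>300000</value> <!-- ms; ZooKeeper's maxSessionTimeout must be >= this -->
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>300000</value> <!-- ms -->
  </property>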
