Thanks for sharing. Have you looked at
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired , as
suggested below?
I think the zookeeper timeout should be watched more closely.

On Wed, Apr 3, 2013 at 11:21 AM, Pablo Musa <[email protected]> wrote:
> Hello guys,
> I stopped my research on the HBase ZK timeout for a while due to
> other issues I had to handle, but I am back.
>
> A very weird behavior I would like your comments on is that my
> RegionServers perform better (fewer crashes) under heavy load than
> under light load.
> That is, if I leave HBase alone with 50 requests per second over the
> whole day, the crashes are more frequent than if I run a mapred job
> every hour.
>
> Another weird thing is the following:
>
> RS startTime = Mon Apr 01 13:22:35 BRT 2013
>
> [...]$ grep slept /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log
> 2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.Sleeper: We
> slept 45491ms instead of 3000ms, this is likely due to a long garbage
> collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.Sleeper: We
> slept 101271ms instead of 3000ms, this is likely due to a long garbage
> collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> [...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[0-9][0-9]' \
>        /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out
> * the report below is the above command adjusted for each time range:
> 0.0 - 0.1  secs  GCs = 5084
> 0.1 - 0.5  secs  GCs = 9
> 0.5 - 1.0  secs  GCs = 3
> 1.0 - 10   secs  GCs = 0
> 10  - 100  secs  GCs = 0
> 100 - 1000 secs  GCs = 0
>
> So, my script that extracts every GC time ("real=... secs") says that
> no GC took longer than 1 second.
> However, the RS log says twice that the RS slept more than 40 seconds
> instead of 3.
>
> "this is likely due to a long garbage collecting pause" -- yes, that
> is likely, but I don't think it is the case here.
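>
> (A variant of that scan that would also catch pauses dominated by
> system time -- relevant to the "sys=" figures that show up later in
> this thread -- might look like the sketch below. It assumes the same
> "[Times: user=... sys=..., real=... secs]" log format and flags any
> entry whose sys or real time reaches one second:)
>
> [...]$ egrep 'sys=[1-9][0-9]*\.[0-9][0-9]|real=[1-9][0-9]*\.[0-9][0-9]' \
>        /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out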
>
> The machine is a huge machine with 70GB of RAM and 32 processors,
> under light load, with no swap or iowait.
>
> Any ideas?
>
> Thanks,
> Pablo
>
> On 03/12/2013 12:43 PM, Pablo Musa wrote:
>> Guys,
>> thank you very much for the help.
>>
>> Yesterday I spent 14 hours trying to tune the whole cluster.
>> The cluster is not ready yet and needs a lot of tuning, but at least
>> it is working.
>>
>> My first big problem was namenode + datanode GC. They were not using
>> CMS and thus were taking "incremental" time to run: it started at
>> 0.01 ms and within 20 minutes was taking 150 secs.
>> After enabling the CMS GC this time is much smaller, with a maximum
>> of 70 secs, which is VERY HIGH, but for now does not stop HBase.
>>
>> With this issue solved, it was clear that the RS was hitting a long
>> GC pause, taking up to 220 secs. Zookeeper expired the RS and it
>> shut down.
>> I tried a lot of different flag configurations (MORE than 20) and
>> could not get small GCs. Eventually one would take more than 150
>> secs (the zookeeper timeout) and the RS would shut down.
>>
>> Finally I tried a config that so far, 12 hours, is working with a
>> maximum GC time of 90 secs. Which of course is still a terrible
>> problem, since HBase is a database, but at least the cluster is
>> stable while I tune it a little more.
>>
>> In my opinion, my biggest problem is having a few "monster" machines
>> in the cluster instead of a bunch of commodity machines. I don't
>> know if there are a lot of companies using this kind of machine
>> inside a hadoop cluster, but a quick search on google did not turn
>> up much on tuning GCs for big heaps.
>>
>> I guess my next step will be to research big-heap GC tuning.
>>
>> Back to some questions ;)
>>
>> > You have ganglia or tsdb running?
>>
>> I use zabbix for now, and no, there is nothing going on when the big
>> pause happens.
>>
>> > When you see the big pause above, can you see anything going on on
>> > the machine? (swap, iowait, concurrent fat mapreduce job?)
>> > what are you doing during long GC happened? read or write? if
>> > reading, what the block cache size?
>>
>> The CPU for the RS process goes to 100% and the logs "pause" until
>> it gets out.
>> Ex: [NewPar
>>
>> IO and SWAP are normal. There is no MR running, just normal database
>> load, which is very low. I am probably doing reads AND writes to the
>> database, with the default block cache size.
>> One problem at this moment might be the big number of regions (1252),
>> since I am using only one RS to be able to track the problem.
>>
>> The links and ideas were very helpful. Thank you very much guys.
>>
>> I will post my future research as I find a solution ;)
>>
>> If you have more ideas or info (links, flag suggestions, etc.),
>> please post it :)
>>
>> Abs,
>> Pablo
>>
>> On 03/10/2013 11:24 PM, Andrew Purtell wrote:
>>> Be careful with GC tuning; throwing changes at an application
>>> without analysis of what is going on with the heap is shooting in
>>> the dark. One particularly good treatment of the subject is here:
>>> http://java.dzone.com/articles/how-tame-java-gc-pauses
>>>
>>> If you have made custom changes to blockcache or memstore
>>> configurations, back them out until you're sure everything else
>>> is ok.
>>>
>>> Watch carefully for swapping. Set the vm.swappiness sysctl to 0.
>>> Monitor for spikes in page scanning or any swap activity. Nothing
>>> brings on "Juliette" pauses better than a JVM partially swapped
>>> out. The Java GC starts collection by examining the oldest pages,
>>> and those are the first pages the OS swaps out...
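>>>
>>> (A sketch of that change, assuming root on a stock el6 box like
>>> the ones described in this thread:)
>>>
>>> # take effect immediately
>>> sysctl -w vm.swappiness=0
>>> # persist across reboots
>>> echo 'vm.swappiness = 0' >> /etc/sysctl.conf
>>> # then watch the si/so columns for any swap activity
>>> vmstat 5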
>>>
>>> On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu <[email protected]> wrote:
>>>> Hi Pablo,
>>>> Such a long minor GC is terrible. I don't see any swapping in your
>>>> vmstat log, but I would suggest:
>>>> 1) add the following JVM options:
>>>> -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19
>>>> -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2
>>>> -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods
>>>> 2) -Xmn is too small; your total Mem is 74GB, so make it -Xmn2g
>>>> 3) what are you doing when the long GC happens? read or write? if
>>>> reading, what is the block cache size?
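>>>>
>>>> (Wired into conf/hbase-env.sh, those options might look like the
>>>> sketch below; HBASE_REGIONSERVER_OPTS is the standard hook, and
>>>> -Xmn2g is the value suggested above, not a tested recommendation:)
>>>>
>>>> export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
>>>>   -Xmn2g \
>>>>   -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19 \
>>>>   -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 \
>>>>   -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods"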
>>>>
>>>> On Mon, Mar 11, 2013 at 6:41 AM, Stack <[email protected]> wrote:
>>>>> You could increase your zookeeper session timeout to 5 minutes
>>>>> while you are figuring out why these long pauses happen:
>>>>> http://hbase.apache.org/book.html#zookeeper.session.timeout
>>>>>
>>>>> Above, there is an outage for almost 5 minutes:
>>>>>
>>>>>> We slept 225100ms instead of 3000ms, this is likely due to a long
>>>>>
>>>>> You have ganglia or tsdb running? When you see the big pause
>>>>> above, can you see anything going on on the machine? (swap,
>>>>> iowait, concurrent fat mapreduce job?)
>>>>>
>>>>> St.Ack
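>>>>>
>>>>> (In hbase-site.xml that would be an entry like the sketch below;
>>>>> note that the zookeeper server caps sessions at its
>>>>> maxSessionTimeout, so that may need raising too:)
>>>>>
>>>>> <property>
>>>>>   <name>zookeeper.session.timeout</name>
>>>>>   <value>300000</value>
>>>>> </property>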
>>>>>
>>>>> On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa <[email protected]> wrote:
>>>>>> Hi Sreepathi,
>>>>>> as they say in the book (or on the site), we could try it to see
>>>>>> whether it is really a timeout error or something more. But it
>>>>>> is not recommended for production environments.
>>>>>>
>>>>>> I could give it a try if five minutes would assure us that the
>>>>>> problem is the GC or elsewhere!! Anyway, I think it is hard to
>>>>>> believe a GC is taking 2:30 minutes.
>>>>>>
>>>>>> Abs,
>>>>>> Pablo
>>>>>>
>>>>>> On 03/10/2013 04:06 PM, Sreepathi wrote:
>>>>>>> Hi Stack/Ted/Pablo,
>>>>>>>
>>>>>>> Should we increase the hbase.rpc.timeout property to 5 minutes
>>>>>>> (300000 ms)?
>>>>>>>
>>>>>>> Regards,
>>>>>>> - Sreepathi
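>>>>>>>
>>>>>>> (For reference, that would be an hbase-site.xml entry like the
>>>>>>> sketch below -- assuming the RPCs, rather than the zk session,
>>>>>>> are what is timing out:)
>>>>>>>
>>>>>>> <property>
>>>>>>>   <name>hbase.rpc.timeout</name>
>>>>>>>   <value>300000</value>
>>>>>>> </property>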
>>>>>>>
>>>>>>> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa <[email protected]> wrote:
>>>>>>>>> That combo should be fine.
>>>>>>>>
>>>>>>>> Great!!
>>>>>>>>
>>>>>>>>> If JVM is full GC'ing, the application is stopped.
>>>>>>>>> The below does not look like a full GC but that is a long
>>>>>>>>> pause in system time, enough to kill your zk session.
>>>>>>>>
>>>>>>>> Exactly. This pause is really making zk expire the RS, which
>>>>>>>> then shuts down (logs at the end of the email).
>>>>>>>> But the question is: what is causing this pause??!!
>>>>>>>>
>>>>>>>>> You swapping?
>>>>>>>>
>>>>>>>> I don't think so (stats below).
>>>>>>>>
>>>>>>>>> Hardware is good?
>>>>>>>>
>>>>>>>> Yes, it is a 16-processor machine with 74GB of RAM and plenty
>>>>>>>> of disk space. Below are some metrics I have heard about. Hope
>>>>>>>> it helps.
>>>>>>>>
>>>>>>>> ** I am having some problems with the datanodes[1], which are
>>>>>>>> having trouble writing. I really think the issues are related,
>>>>>>>> but I cannot solve either of them :(
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>> Pablo
>>>>>>>>
>>>>>>>> [1] http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+jGOPUt15PwFjcCSzigE0APeM9FXaCrqfv...@mail.gmail.com%3E
>>>>>>>>
>>>>>>>> top - 15:38:04 up 297 days, 21:03,  2 users,  load average: 4.34, 2.55, 1.28
>>>>>>>> Tasks: 528 total,   1 running, 527 sleeping,   0 stopped,   0 zombie
>>>>>>>> Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>>>>>>> Mem:  74187256k total, 29493992k used, 44693264k free,  5836576k buffers
>>>>>>>> Swap: 51609592k total,   128312k used, 51481280k free,  1353400k cached
>>>>>>>>
>>>>>>>> ]$ vmstat -w
>>>>>>>> procs ---------------------memory-------------------- ---swap-- -----io---- --system-- -----cpu------
>>>>>>>>  r  b    swpd      free     buff    cache   si   so   bi   bo   in   cs  us sy id wa st
>>>>>>>>  2  0  128312  32416928  5838288  5043560    0    0  202   53    0    0   2  1 96  1  0
>>>>>>>>
>>>>>>>> ]$ sar
>>>>>>>> 02:20:01 PM  all  26.18  0.00  2.90  0.63  0.00  70.29
>>>>>>>> 02:30:01 PM  all   1.66  0.00  1.25  1.05  0.00  96.04
>>>>>>>> 02:40:01 PM  all  10.01  0.00  2.14  0.75  0.00  87.11
>>>>>>>> 02:50:01 PM  all   0.76  0.00  0.80  1.03  0.00  97.40
>>>>>>>> 03:00:01 PM  all   0.23  0.00  0.30  0.71  0.00  98.76
>>>>>>>> 03:10:01 PM  all   0.22  0.00  0.30  0.66  0.00  98.82
>>>>>>>> 03:20:01 PM  all   0.22  0.00  0.31  0.76  0.00  98.71
>>>>>>>> 03:30:01 PM  all   0.24  0.00  0.31  0.64  0.00  98.81
>>>>>>>> 03:40:01 PM  all   1.13  0.00  2.97  1.18  0.00  94.73
>>>>>>>> Average:     all   3.86  0.00  1.38  0.88  0.00  93.87
>>>>>>>>
>>>>>>>> ]$ iostat
>>>>>>>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002)  03/10/2013  _x86_64_  (16 CPU)
>>>>>>>>
>>>>>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>>>>>            1.86    0.00    0.96    0.78    0.00   96.41
>>>>>>>>
>>>>>>>> Device:  tps  Blk_read/s  Blk_wrtn/s     Blk_read     Blk_wrtn
>>>>>>>> sda     1.23       20.26       23.53    521533196    605566924
>>>>>>>> sdb     6.51      921.55      241.90  23717850730   6225863488
>>>>>>>> sdc     6.22      921.83      236.41  23725181162   6084471192
>>>>>>>> sdd     6.25      925.13      237.26  23810004970   6106357880
>>>>>>>> sde     6.19      913.90      235.60  23521108818   6063722504
>>>>>>>> sdh     6.26      933.08      237.77  24014594546   6119511376
>>>>>>>> sdg     6.18      914.36      235.31  23532747378   6056257016
>>>>>>>> sdf     6.24      923.66      235.33  23772251810   6056604008
>>>>>>>>
>>>>>>>> Some more logging which reinforces that the RS crash is
>>>>>>>> happening because of a timeout. However, this time the GC log
>>>>>>>> does not show a big pause.
>>>>>>>>
>>>>>>>> #####RS LOG#####
>>>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 257739ms for sessionid 0x13d3c4bcba6014a, closing socket connection and attempting reconnect
>>>>>>>> 2013-03-10 15:37:46,712 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 226785ms for sessionid 0x13d3c4bcba60149, closing socket connection and attempting reconnect
>>>>>>>> 2013-03-10 15:37:46,712 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=61.91 MB, free=1.94 GB, max=2 GB, blocks=1254, accesses=60087, hits=58811, hitRatio=97.87%, , cachingAccesses=60069, cachingHits=58811, cachingHitsRatio=97.90%, , evictions=0, evicted=0, evictedPerRun=NaN
>>>>>>>> 2013-03-10 15:37:46,712 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 225100ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>>>>>> 2013-03-10 15:37:46,714 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-43236042-172.17.2.10-1362490844340:blk_-6834190810033122569_25150229
>>>>>>>> java.io.EOFException: Premature EOF: no length prefix available
>>>>>>>>         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
>>>>>>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
>>>>>>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:670)
>>>>>>>> 2013-03-10 15:37:46,716 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call get([B@7caf69ed, {"timeRange":[0,9223372036854775807],"totalColumns":1,"cacheBlocks":true,"families":{"details":["ALL"]},"maxVersions":1,"row":"\\x00\\x00\\x00\\x00\\x00\\x12\\x93@"}), rpc version=1, client version=29, methodsFingerPrint=1891768260 from 172.17.1.71:51294 after 0 ms, since caller disconnected
>>>>>>>>
>>>>>>>> #####GC LOG#####
>>>>>>>> 2716.635: [GC 2716.635: [ParNew: 57570K->746K(59008K), 0.0046530 secs] 354857K->300891K(1152704K), 0.0047370 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
>>>>>>>> 2789.478: [GC 2789.478: [ParNew: 53226K->1192K(59008K), 0.0041370 secs] 353371K->301337K(1152704K), 0.0042220 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
>>>>>>>> 2868.435: [GC 2868.435: [ParNew: 53672K->740K(59008K), 0.0041570 secs] 353817K->300886K(1152704K), 0.0042440 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>>>> 2920.309: [GC 2920.309: [ParNew: 53220K->6202K(59008K), 0.0058440 secs] 353366K->310443K(1152704K), 0.0059410 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
>>>>>>>> 2984.239: [GC 2984.239: [ParNew: 58680K->5282K(59008K), 0.0048070 secs] 362921K->310977K(1152704K), 0.0049180 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>>>> 3050.620: [GC 3050.620: [ParNew: 57524K->5292K(59008K), 0.0055510 secs] 363219K->313384K(1152704K), 0.0056520 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
>>>>>>>> 3111.331: [GC 3111.331: [ParNew: 57542K->5637K(59008K), 0.0056900 secs] 365634K->318079K(1152704K), 0.0057950 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>>>> 3163.510: [GC 3163.510: [ParNew: 58117K->1138K(59008K), 0.0046540 secs] 370559K->316086K(1152704K), 0.0047470 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>>>> 3217.642: [GC 3217.643: [ParNew: 53618K->1001K(59008K), 0.0038530 secs] 368566K->315949K(1152704K), 0.0039450 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
>>>>>>>> 3464.067: [GC 3464.067: [ParNew: 53481K->5348K(59008K), 0.0068370 secs] 368429K->322343K(1152704K), 0.0069170 secs] [Times: user=0.07 sys=0.01, real=0.00 secs]
>>>>>>>> 3465.648: [GC 3465.648: [ParNew: 57828K->1609K(59008K), 0.0111520 secs] 374823K->321496K(1152704K), 0.0112170 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
>>>>>>>> 3466.291: [GC 3466.291: [ParNew: 54089K->2186K(59008K), 0.0093190 secs] 373976K->322073K(1152704K), 0.0093800 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
>>>>>>>> Heap
>>>>>>>>  par new generation   total 59008K, used 15497K [0x00000005fae00000, 0x00000005fee00000, 0x00000005fee00000)
>>>>>>>>   eden space 52480K,  25% used [0x00000005fae00000, 0x00000005fbaff8c8, 0x00000005fe140000)
>>>>>>>>   from space 6528K,   33% used [0x00000005fe140000, 0x00000005fe362b70, 0x00000005fe7a0000)
>>>>>>>>   to   space 6528K,    0% used [0x00000005fe7a0000, 0x00000005fe7a0000, 0x00000005fee00000)
>>>>>>>>  concurrent mark-sweep generation total 1093696K, used 319887K [0x00000005fee00000, 0x0000000641a10000, 0x00000007fae00000)
>>>>>>>>  concurrent-mark-sweep perm gen total 30464K, used 30427K [0x00000007fae00000, 0x00000007fcbc0000, 0x0000000800000000)
>>>>>>>>
>>>>>>>> On 03/08/2013 07:02 PM, Stack wrote:
>>>>>>>>> On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa <[email protected]> wrote:
>>>>>>>>>>> 0.94 currently doesn't support hadoop 2.0
>>>>>>>>>>> Can you deploy hadoop 1.1.1 instead ?
>>>>>>>>>>
>>>>>>>>>> I am using cdh4.2.0, which uses this version as the default
>>>>>>>>>> installation.
>>>>>>>>>> I think it will be a problem for me to deploy 1.1.1 because
>>>>>>>>>> I would need to "upgrade" the whole cluster with 70TB of
>>>>>>>>>> data (backup everything, go offline, etc.).
>>>>>>>>>>
>>>>>>>>>> Is there a problem with using cdh4.2.0?
>>>>>>>>>> Should I send my email to the cdh list?
>>>>>>>>>
>>>>>>>>> That combo should be fine.
>>>>>>>>>
>>>>>>>>>>> You Full GC'ing around this time?
>>>>>>>>>>
>>>>>>>>>> The GC shows it took a long time. However it does not make
>>>>>>>>>> any sense for it to be that, since the same amount of data
>>>>>>>>>> was cleaned before and AFTER in just 0.01 secs!
>>>>>>>>>
>>>>>>>>> If JVM is full GC'ing, the application is stopped.
>>>>>>>>>
>>>>>>>>>>> [Times: user=0.08 sys=137.62, real=137.62 secs]
>>>>>>>>>>
>>>>>>>>>> Besides, the whole time was used by system. That is what is
>>>>>>>>>> bugging me.
>>>>>>>>>
>>>>>>>>> The below does not look like a full GC but that is a long
>>>>>>>>> pause in system time, enough to kill your zk session.
>>>>>>>>>
>>>>>>>>> You swapping?
>>>>>>>>>
>>>>>>>>> Hardware is good?
>>>>>>>>>
>>>>>>>>> St.Ack
>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> 1044.081: [GC 1044.081: [ParNew: 58970K->402K(59008K), 0.0040990 secs] 275097K->216577K(1152704K), 0.0041820 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>>>>>>>>>> 1087.319: [GC 1087.319: [ParNew: 52873K->6528K(59008K), 0.0055000 secs] 269048K->223592K(1152704K), 0.0055930 secs] [Times: user=0.04 sys=0.01, real=0.00 secs]
>>>>>>>>>> 1087.834: [GC 1087.834: [ParNew: 59008K->6527K(59008K), 137.6353620 secs] 276072K->235097K(1152704K), 137.6354700 secs] [Times: user=0.08 sys=137.62, real=137.62 secs]
>>>>>>>>>> 1226.638: [GC 1226.638: [ParNew: 59007K->1897K(59008K), 0.0079960 secs] 287577K->230937K(1152704K), 0.0080770 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
>>>>>>>>>> 1227.251: [GC 1227.251: [ParNew: 54377K->2379K(59008K), 0.0095650 secs] 283417K->231420K(1152704K), 0.0096340 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
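>>>>>>>>>>
>>>>>>>>>> (To hunt for more entries like the 137-second one above,
>>>>>>>>>> where sys dwarfs user, a gawk one-liner sketch over the
>>>>>>>>>> same logs could be:)
>>>>>>>>>>
>>>>>>>>>> gawk 'match($0, /user=([0-9.]+) sys=([0-9.]+)/, t) && t[2] > t[1] + 1' \
>>>>>>>>>>     /var/log/hbase/hbase-hbase-regionserver-*.out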
>>>>>>>>>>
>>>>>>>>>> I really appreciate you guys helping me to find out what is
>>>>>>>>>> wrong.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Pablo
>>>>>>>>>>
>>>>>>>>>> On 03/08/2013 02:11 PM, Stack wrote:
>>>>>>>>>>> What RAM says.
>>>>>>>>>>>
>>>>>>>>>>>> 2013-03-07 17:24:57,887 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 159348ms for sessionid 0x13d3c4bcba600a7, closing socket connection and attempting reconnect
>>>>>>>>>>>
>>>>>>>>>>> You Full GC'ing around this time?
>>>>>>>>>>>
>>>>>>>>>>> Put up your configs in a place where we can take a look?
>>>>>>>>>>>
>>>>>>>>>>> St.Ack
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 8, 2013 at 8:32 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>>>>>>>>>>>> I think it is with your GC config. What is your heap
>>>>>>>>>>>> size? What is the data that you pump in and how much is
>>>>>>>>>>>> the block cache size?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Ram
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 8, 2013 at 9:31 PM, Ted Yu <[email protected]> wrote:
>>>>>>>>>>>>> 0.94 currently doesn't support hadoop 2.0
>>>>>>>>>>>>> Can you deploy hadoop 1.1.1 instead ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you using 0.94.5 ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa <[email protected]> wrote:
>>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>> as I sent in an email a long time ago, the RSs in my
>>>>>>>>>>>>>> cluster did not get along and crashed 3 times a day. I
>>>>>>>>>>>>>> tried a lot of the options we discussed in the emails,
>>>>>>>>>>>>>> but none of them solved the problem. As I was using an
>>>>>>>>>>>>>> old version of hadoop, I thought that was the problem.
>>>>>>>>>>>>>> So, I upgraded from hadoop 0.20 - hbase 0.90 -
>>>>>>>>>>>>>> zookeeper 3.3.5 to hadoop 2.0.0 - hbase 0.94 -
>>>>>>>>>>>>>> zookeeper 3.4.5.
>>>>>>>>>>>>>> Unfortunately the RSs did not stop crashing, and worse!
>>>>>>>>>>>>>> Now they crash every hour, and sometimes when the RS
>>>>>>>>>>>>>> that holds the .ROOT. crashes, the whole cluster gets
>>>>>>>>>>>>>> stuck in transition and everything stops working.
>>>>>>>>>>>>>> In this case I need to clean the zookeeper znodes and
>>>>>>>>>>>>>> restart the master and the RSs.
>>>>>>>>>>>>>> To avoid this case I am running in production with only
>>>>>>>>>>>>>> ONE RS and a monitoring script that checks every minute
>>>>>>>>>>>>>> whether the RS is ok and, if not, restarts it.
>>>>>>>>>>>>>> * This case does not get the cluster stuck.
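>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Pablo's actual script is not shown; a minimal sketch of
>>>>>>>>>>>>>> such a once-a-minute watchdog -- the info-port probe and
>>>>>>>>>>>>>> the CDH paths are assumptions -- could be:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>>> # cron: * * * * * /usr/local/bin/rs-watchdog.sh
>>>>>>>>>>>>>> # probe the RS web UI (default port 60030); restart on failure
>>>>>>>>>>>>>> if ! curl -sf http://localhost:60030/rs-status >/dev/null; then
>>>>>>>>>>>>>>     /usr/lib/hbase/bin/hbase-daemon.sh restart regionserver
>>>>>>>>>>>>>> fi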
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is driving me crazy, but I really can't find a
>>>>>>>>>>>>>> solution for the cluster. I tracked all logs from the
>>>>>>>>>>>>>> start time 16:49 on all the interesting nodes (zoo,
>>>>>>>>>>>>>> namenode, master, rs, dn2, dn9, dn10) and copied here
>>>>>>>>>>>>>> what I think is useful.
>>>>>>>>>>>>>> There are some strange errors in DATANODE2, such as an
>>>>>>>>>>>>>> error copying a block to itself.
>>>>>>>>>>>>>> The gc log points to a GC timeout. However it is very
>>>>>>>>>>>>>> weird that the RS spends so much time in GC while in
>>>>>>>>>>>>>> the other cases it takes 0.001 sec.
>>>>>>>>>>>>>> Besides, the time spent is in sys, which makes me think
>>>>>>>>>>>>>> the problem might be in another place.
>>>>>>>>>>>>>> I know that it is a bunch of logs, and that it is very
>>>>>>>>>>>>>> difficult to find the problem without much context. But
>>>>>>>>>>>>>> I REALLY need some help. If not the solution, at least
>>>>>>>>>>>>>> what I should read, where I should look, or which cases
>>>>>>>>>>>>>> I should monitor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you very much,
>>>>>>>>>>>>>> Pablo Musa
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> - Sreepathi
>>>
>>> --
>>> Best regards,
>>>
>>>    - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein (via Tom White)
