Hey, do that and things go boom. :-) Before you do that, I would suggest running top and seeing if there is any swapping occurring.
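If watching top by hand is awkward, the same check can be scripted. Below is a rough sketch, assuming a Linux box where /proc/vmstat exposes the cumulative pswpin and pswpout counters (pages swapped in and out since boot). The helper names read_swap_counters and swapped_between are made up for illustration and are not part of any tool mentioned in this thread.

```python
def read_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) parsed from /proc/vmstat-style text.

    Counters that are missing from the text are reported as 0.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swapped_between(before, after):
    """True if either swap counter grew between two samples."""
    return after[0] > before[0] or after[1] > before[1]
```

To use it, read /proc/vmstat twice a few seconds apart while the regionserver is under load, pass both contents through read_swap_counters, and compare the results with swapped_between. Any growth means pages really did move to or from swap in that window.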
On Mar 8, 2012, at 4:29 PM, "Jean-Daniel Cryans" <[email protected]> wrote:
> When real cpu is bigger than user cpu it very often points to
> swapping. Even if you think you turned that off or that there's no
> possible way you could be swapping, check it again.
>
> It could also be that your CPUs were busy doing something else, I've
> seen crazy context switching CPUs freezing up my nodes, but in my
> experience it's not very likely.
>
> Setting swappiness to 0 just means it's not going to page anything out
> until it really needs to do it, meaning it's possible to swap. The
> only way to guarantee no swapping whatsoever is giving your system 0
> swap space.
>
> Regarding that promotion failure, you could try reducing the eden
> size. Try -Xmn128m
>
> J-D
>
> On Sat, Mar 3, 2012 at 5:05 AM, Ferdy Galema <[email protected]> wrote:
>> Hi,
>>
>> I'm running regionservers with 2GB heap and following tuning options:
>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:MaxGCPauseMillis=100
>>
>> A regionserver aborted (YouAreDeadException) and this was printed in the gc
>> logs (all is shown up until the abort)
>>
>> 211663.516: [GC 211663.516: [ParNew: 118715K->13184K(118912K), 0.0445390
>> secs] 1373940K->1289814K(2233472K), 0.0446420 secs] [Times: user=0.14
>> sys=0.01, real=0.05 secs]
>> 211663.686: [GC 211663.686: [ParNew: 118912K->13184K(118912K), 0.0594280
>> secs] 1395542K->1310185K(2233472K), 0.0595420 secs] [Times: user=0.15
>> sys=0.00, real=0.06 secs]
>> 211663.869: [GC 211663.869: [ParNew: 118790K->13184K(118912K), 0.0434820
>> secs] 1415792K->1331317K(2233472K), 0.0435930 secs] [Times: user=0.13
>> sys=0.01, real=0.04 secs]
>> 211667.598: [GC 211667.598: [ParNew (promotion failed):
>> 118912K->118912K(118912K), 0.0225390 secs]211667.621: [CMS:
>> 1330845K->1127914K(2114560K), 51.3610670 secs]
>> 1437045K->1127914K(2233472K), [CMS Perm : 20680K->20622K(34504K)],
>> 51.3838170 secs] [Times: user=1.82 sys=0.31, real=51.38 secs]
>> 211719.713: [GC 211719.714: [ParNew: 105723K->13184K(118912K), 0.0176130
>> secs] 1233638K->1149393K(2233472K), 0.0177230 secs] [Times: user=0.07
>> sys=0.00, real=0.02 secs]
>> 211719.851: [GC 211719.852: [ParNew: 118912K->13184K(118912K), 0.0281860
>> secs] 1255121K->1170269K(2233472K), 0.0282970 secs] [Times: user=0.10
>> sys=0.01, real=0.03 secs]
>> 211719.993: [GC 211719.993: [ParNew: 118795K->13184K(118912K), 0.0276320
>> secs] 1275880K->1191268K(2233472K), 0.0277350 secs] [Times: user=0.09
>> sys=0.00, real=0.03 secs]
>> 211720.490: [GC 211720.490: [ParNew: 118912K->13184K(118912K), 0.0624650
>> secs] 1296996K->1210640K(2233472K), 0.0625560 secs] [Times: user=0.15
>> sys=0.00, real=0.06 secs]
>> 211720.687: [GC 211720.687: [ParNew: 118702K->13184K(118912K), 0.1651750
>> secs] 1316159K->1231993K(2233472K), 0.1652660 secs] [Times: user=0.25
>> sys=0.01, real=0.17 secs]
>> 211721.038: [GC 211721.038: [ParNew: 118912K->13184K(118912K), 0.0952750
>> secs] 1337721K->1252598K(2233472K), 0.0953660 secs] [Times: user=0.15
>> sys=0.00, real=0.09 secs]
>> Heap
>> par new generation total 118912K, used 86199K [0x00002aaaae1f0000,
>> 0x00002aaab62f0000, 0x00002aaab62f0000)
>> eden space 105728K, 69% used [0x00002aaaae1f0000, 0x00002aaab293dfa8,
>> 0x00002aaab4930000)
>> from space 13184K, 100% used [0x00002aaab4930000, 0x00002aaab5610000,
>> 0x00002aaab5610000)
>> to space 13184K, 0% used [0x00002aaab5610000, 0x00002aaab5610000,
>> 0x00002aaab62f0000)
>> concurrent mark-sweep generation total 2114560K, used 1239414K
>> [0x00002aaab62f0000, 0x00002aab373f0000, 0x00002aab373f0000)
>> concurrent-mark-sweep perm gen total 34504K, used 20728K
>> [0x00002aab373f0000, 0x00002aab395a2000, 0x00002aab3c7f0000)
>>
>>
>> Why did a GC take 51 seconds? The machine still had enough memory available
>> so it could not be swapping. (swappiness is set to 0). From the 15
>> regionservers in total, I often see this specific regionserver fail. What
>> do you recommend in this situation?
>>
>> Ferdy.
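
J-D's rule of thumb (real time larger than CPU time) can be applied to a whole GC log mechanically. The sketch below is only an illustration. It assumes the "[Times: user=... sys=..., real=... secs]" format shown in the log above, and both the function name suspicious_pauses and the one-second min_real threshold are arbitrary choices made here, not anything from HBase or the JVM.

```python
import re

# Matches the trailing "[Times: ...]" figures of a HotSpot GC log entry.
TIMES = re.compile(r"user=([\d.]+)\s+sys=([\d.]+),\s+real=([\d.]+) secs")


def suspicious_pauses(log_text, min_real=1.0):
    """Return (user, sys, real) for pauses whose wall-clock time exceeds CPU time.

    min_real is an assumed cutoff in seconds that filters out short pauses,
    where rounding in the log makes the comparison meaningless.
    """
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    hits = []
    for match in TIMES.finditer(flat):
        user, sys_time, real = (float(g) for g in match.groups())
        if real >= min_real and real > user + sys_time:
            hits.append((user, sys_time, real))
    return hits
```

Run over the log in this thread (without the ">>" quote markers), only the promotion-failed collection is flagged: user=1.82 and sys=0.31 against real=51.38. The JVM was off the CPU for nearly all of that pause, which is the pattern that points to swapping or CPU starvation.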
