You can add the following to JVM parameters: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
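For example, a minimal sketch, assuming the options are set in conf/hbase-env.sh and reusing the values from your mail below (only the two collector flags are new; everything else is copied unchanged):

```shell
# conf/hbase-env.sh (assumed location of this setting)
# Region server options quoted in this thread, with the two collector flags added.
export HBASE_REGIONSERVER_OPTS="-Xmn512m -Xms16384m -Xmx16384m \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```

The region server has to be restarted for the new options to take effect.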
Cheers

On Fri, May 16, 2014 at 4:32 AM, sunweiwei <[email protected]> wrote:

> Hi,
> Sorry, I just saw this mail. I set the GC parameters like this:
>
> export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70 -Xms16384m -Xmx16384m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps "
>
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: May 11, 2014 19:11
> To: [email protected]
> Cc: <[email protected]>
> Subject: Re: Re: Re: meta server hungs ?
>
> What GC parameters did you specify for JVM ?
>
> Thanks
>
> On May 7, 2014, at 6:27 PM, "sunweiwei" <[email protected]> wrote:
>
> > I find lots of these in gc.log. It seems like CMS GC runs many times but the old generation is always large.
> > I'm confused.
> > Any suggestion will be appreciated. Thanks.
> >
> > 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> > 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> > 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
> >
> > 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
> >
> > 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
> >
> > -----Original Message-----
> > From: sunweiwei [mailto:[email protected]]
> > Sent: May 6, 2014 9:27
> > To: [email protected]
> > Subject: Re: Re: meta server hungs ?
> >
> > Hi Samir,
> > I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
> > I pasted the master log in the first mail.
> >
> > I'm not sure, but here is the whole process:
> > at 2014-04-29 13:53:57,271 a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
> > at 2014-04-29 15:30:** I visited the HBase web UI and found the hmaster hung, so I stopped it and started a new hmaster.
> > at 2014-04-29 15:32:21,530 the new hmaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
> > at 2014-04-29 15:32:28,364 the meta server received the hmaster's message and shut itself down.
> >
> > After that, the clients came back to normal.
> >
> > -----Original Message-----
> > From: Samir Ahmic [mailto:[email protected]]
> > Sent: May 5, 2014 19:25
> > To: [email protected]
> > Subject: Re: Re: meta server hungs ?
> >
> > There should be an exception in the regionserver log on hadoop77/192.168.1.87:60020 above this one:
> >
> > *********
> > 2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
> >     at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
> > *********
> >
> > Can you find it and paste it? That exception should explain why the master declared hadoop77/192.168.1.87:60020 a dead server.
> >
> > Regards
> > Samir
> >
> >
> > On Mon, May 5, 2014 at 11:39 AM, sunweiwei <[email protected]> wrote:
> >
> >> And this is the client log.
> >>
> >> 2014-04-29 13:53:57,271 WARN [main] org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already closed
> >> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/192.168.1.87:60020]
> >>     at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> >>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> >>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> >>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> >>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
> >>     at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
> >>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
> >>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> >>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
> >>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
> >>     at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
> >>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
> >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
> >>     at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
> >>     at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> >>     at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
> >>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
> >>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
> >>
> >> -----Original Message-----
> >> From: sunweiwei [mailto:[email protected]]
> >> Sent: May 5, 2014 17:23
> >> To: [email protected]
> >> Subject: Re: meta server hungs ?
> >>
> >> Thank you for the reply.
> >> I found these logs on hadoop77/192.168.1.87. It seems the meta regionserver received the hmaster's message and shut itself down.
> >> 2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
> >>     at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
> >>
> >>
> >> And this is the GC log:
> >> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew: 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K), 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> >> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew: 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K), 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> >> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew: 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K), 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> >> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew: 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K), 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> >> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew: 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K), 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> >> Heap
> >> par new generation total 471872K, used 335578K [0x00000003fae00000, 0x000000041ae00000, 0x000000041ae00000)
> >> eden space 419456K, 78% used [0x00000003fae00000, 0x000000040f0f41c8, 0x00000004147a0000)
> >> from space 52416K, 9% used [0x0000000417ad0000, 0x0000000417f928e0, 0x000000041ae00000)
> >> to space 52416K, 0% used [0x00000004147a0000, 0x00000004147a0000, 0x0000000417ad0000)
> >> concurrent mark-sweep generation total 16252928K, used 11162086K [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
> >> concurrent-mark-sweep perm gen total 81072K, used 48660K [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Samir Ahmic [mailto:[email protected]]
> >> Sent: May 5, 2014 16:50
> >> To: [email protected]
> >> Cc: sunweiwei
> >> Subject: Re: meta server hungs ?
> >>
> >> Hi,
> >> This exception:
> >> ****
> >> exception=java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117 remote=hadoop77/192.168.1.87:60020]
> >> *****
> >> shows that there is a connection timeout between the master server and the regionserver (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
> >> The real question is what is causing this timeout. In my experience, a few things can cause this type of timeout. I would suggest that you check hadoop77/192.168.1.87 <http://192.168.1.87:60020/> garbage collection, memory, network, CPU, and disks, and I'm sure you will find the cause of the timeout.
> >> You can use diagnostic tools like vmstat, sar, and iostat to check your system, and you can use jstat to check GC and some other JVM stuff.
> >>
> >> Regards
> >> Samir
> >>
> >>
> >>
> >>
> >> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <[email protected]> wrote:
> >>
> >>> Hi
> >>>
> >>> I'm using hbase0.96.0.
> >>>
> >>> I found that the client suddenly can't put data and the hmaster hangs. Then I shut down the hmaster and started a new hmaster, and the client went back to normal.
> >>>
> >>>
> >>>
> >>> I found these logs in the new hmaster. It seems like the meta server hung and the hmaster stopped the meta server.
> >>>
> >>> 2014-04-29 15:32:21,530 INFO [master:hadoop1:60000] catalog.CatalogTracker: Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117 remote=hadoop77/192.168.1.87:60020]
> >>>
> >>> 2014-04-29 15:32:21,532 INFO [master:hadoop1:60000] master.HMaster: Forcing expire of hadoop77,60020,1396606457005
> >>>
> >>>
> >>>
> >>> I can't find why the meta server hung. I found this in the meta server log:
> >>>
> >>> 2014-04-29 13:53:55,637 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on region hbase:meta,,1.1588230740
> >>>
> >>> 2014-04-29 13:53:56,632 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on region hbase:meta,,1.1588230740
> >>>
> >>> 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner 516152687416913803 lease expired on region hbase:meta,,1.1588230740
> >>>
> >>> 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on region hbase:meta,,1.1588230740
> >>>
> >>>
> >>>
> >>> Any suggestion will be appreciated. Thanks.
