Did you check "http://hbase.apache.org/book.html#perf.os.swap"?
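That section is about keeping the region server JVM from getting swapped out. A quick way to check each node, assuming a reasonably standard Linux install (whether to lower swappiness or disable swap outright is discussed in that section of the book):

  # how much swap is in use, and how eager the kernel is to swap
  free -m
  cat /proc/sys/vm/swappiness

  # lower swappiness for the running kernel; add "vm.swappiness = 0" to /etc/sysctl.conf to keep it across reboots
  sysctl -w vm.swappiness=0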
-----Original Message-----
From: Pablo Musa [mailto:[email protected]]
Sent: July 6, 2012 5:38
To: [email protected]
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

I am having the same problem. I have tried N different things but I cannot solve it.

hadoop-0.20.noarch          0.20.2+923.256-1
hadoop-hbase.noarch         0.90.6+84.29-1
hadoop-zookeeper.noarch     3.3.5+19.1-1

I already set:

<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>20</value>
</property>

But it does not seem to work. How can I check whether these variables are really set in the HRegionServer?

I am starting the server with the following flags:

-Xmx8192m -XX:NewSize=64m -XX:MaxNewSize=64m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

I am also having trouble reading regionserver.out:

[GC 72004.406: [ParNew: 55830K->2763K(59008K), 0.0043820 secs] 886340K->835446K(1408788K) icms_dc=0 , 0.0044900 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
[GC 72166.759: [ParNew: 55192K->6528K(59008K), 135.1102750 secs] 887876K->839688K(1408788K) icms_dc=0 , 135.1103920 secs] [Times: user=1045.58 sys=138.11, real=135.09 secs]
[GC 72552.616: [ParNew: 58977K->6528K(59008K), 0.0083040 secs] 892138K->847415K(1408788K) icms_dc=0 , 0.0084060 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]
[GC 72882.991: [ParNew: 58979K->6528K(59008K), 151.4924490 secs] 899866K->853931K(1408788K) icms_dc=0 , 151.4925690 secs] [Times: user=0.07 sys=151.48, real=151.47 secs]

What does each part mean? Is each line a GC cycle?

Thanks,
Pablo
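To check which values the region server actually picked up: the JVM flags can be read straight off the running process, and the XML properties come from the hbase-site.xml on the region server host, read at startup. A rough sketch, assuming a packaged install with the conf dir at /etc/hbase/conf (adjust the path to your layout), run on the region server machine:

  # the JVM flags the running region server was started with
  ps -ef | grep [H]RegionServer

  # the properties the region server will read the next time it starts
  grep -A 1 "hbase.hregion.memstore.mslab.enabled" /etc/hbase/conf/hbase-site.xml
  grep -A 1 "hbase.regionserver.handler.count" /etc/hbase/conf/hbase-site.xml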
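As for regionserver.out: each of those lines is one young-generation (ParNew) collection. The first number is seconds since the JVM started, "55830K->2763K(59008K)" is young-generation usage before -> after (capacity), the next pair is the same for the whole heap, icms_dc is the incremental-CMS duty cycle, and the "[Times: ...]" block gives CPU time (user/sys) and wall-clock time (real) for the pause. A crude way to list only the collections that stalled for more than, say, 10 seconds of wall-clock time, assuming the format shown above:

  awk -F'real=' '/ParNew/ { split($2, t, " "); if (t[1] + 0 > 10) print }' regionserver.out

The two slow lines above burn hundreds of CPU-seconds (one of them almost entirely in sys) on what is normally a sub-second young collection, which is the kind of signature you see when the node is swapping or otherwise starved, so the swap check earlier in the thread is worth ruling out first.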
-----Original Message-----
From: Lars George [mailto:[email protected]]
Sent: Monday, July 2, 2012 06:43
To: [email protected]
Subject: Re: Hmaster and HRegionServer disappearance reason to ask

Hi lztaomin,

> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired

indicates that you have experienced the "Juliet Pause" issue, which means you ran into a JVM garbage collection pause that lasted longer than the configured ZooKeeper timeout threshold. If you search for it on Google

http://www.google.com/search?q=juliet+pause+hbase

you will find quite a few pages explaining the problem and what you can do to avoid it.

Lars
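One thing worth verifying is which timeout is actually in force: HBase asks for zookeeper.session.timeout from hbase-site.xml, the ZooKeeper server caps what it will grant at its own maxSessionTimeout, and the client logs the value it ended up with when the session is established. A rough check, assuming a packaged install that logs to /var/log/hbase (adjust the path):

  # the session timeout the region server actually negotiated, in milliseconds
  grep "negotiated timeout" /var/log/hbase/*regionserver*.log | tail -1

Whatever that number is, GC pauses (or the whole node stalling, if it is swapping) have to stay below it, otherwise the session expires and the server aborts exactly as in the log quoted below.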
On Jul 2, 2012, at 10:30 AM, lztaomin wrote:

> HI ALL
>
> My HBase cluster has a total of 3 machines, with Hadoop and HBase running on the same machines and ZooKeeper managed by HBase itself. After 3 months of operation it reported the exception below, which caused the HMaster and HRegionServer processes to die. Please help me.
> Thanks
>
> The following is the log:
>
> ABORTING region server serverName=datanode1,60020,1325326435553, load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165): regionserver:60020-0x3488dec38a02b1 regionserver:60020-0x3488dec38a02b1 received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2012-07-01 13:45:38,707 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for datanode1,60020,1325326435553
> 2012-07-01 13:45:38,756 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 hlog(s) in hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
> 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352, length=5671397
> 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
> 2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
> 2012-07-01 13:45:39,880 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
> 2012-07-01 13:45:39,925 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
>
> ABORTING region server serverName=datanode2,60020,1325146199444, load=(requests=614, regions=189, usedHeap=3662, maxHeap=8165): regionserver:60020-0x3488dec38a0002 regionserver:60020-0x3488dec38a0002 received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2012-07-01 13:24:10,308 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341075090535
> 2012-07-01 13:24:10,918 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 21 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341078690560, length=11778108
> 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://namenode:9000/hbase/t_speakfor_relation_chapter/ffd2057b46da227e078c82ff43f0f9f2/recovered.edits/0000000000660951991 (wrote 8178 edits in 403ms)
> 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in -1268935 ms for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
> 2012-07-01 13:24:29,824 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of datanode1,60020,1325326435553, retrying META read
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running, aborting
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2408)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1649)
>     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>
> lztaomin
