Hi,

Based on your logs:

2015-09-01 15:35:58,047 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine *(eg GC): pause of approximately 4954ms*

you had a long GC pause (the 'G1 Old Generation' collection took 5077ms). That pause made the RegionServer's ZooKeeper session time out, and ZooKeeper then deleted the RegionServer's ephemeral node:

2015-09-01 15:36:04,970 INFO [main-EventThread] zookeeper.RegionServerTracker: *RegionServer ephemeral node deleted, processing expiration [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]*

That expiration is what caused the RegionServer to shut down. Check your GC configuration in hbase-env.sh, and also check the load you are generating on your cluster.

Regards,
Samir

On Tue, Sep 1, 2015 at 1:59 PM, 聪聪 <[email protected]> wrote:
> Hi all,
>
> I am using HBase version hbase-0.96.0. This afternoon (2015-09-01 15:35)
> I ran into a problem: one of the RegionServers shut down, and I don't know
> why. Can anyone help?
>
> The RegionServer log is as follows:
> 2015-09-01 15:35:45,476 DEBUG [regionserver60020-smallCompactions-1440573854394] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://mycluster:8020/hbase/airfare/data/atp/atp_fare/c0eb67f08c2e3818f2e52812ec69b71c/i/ff73fb07c62044c082f6b4f92e0ed7ca, to hdfs://mycluster:8020/hbase/airfare/archive/data/atp/atp_fare/c0eb67f08c2e3818f2e52812ec69b71c/i/ff73fb07c62044c082f6b4f92e0ed7ca
> 2015-09-01 15:35:45,476 INFO [regionserver60020-smallCompactions-1440573854394] regionserver.HStore: Completed compaction of 3 file(s) in i of atp:atp_fare,I,1440507763404.c0eb67f08c2e3818f2e52812ec69b71c. into fb746308d219490d81e5f8f1dd8b60f1(size=56.9 M), total size for store is 1.8 G. This selection was in queue for 0sec, and took 4sec to execute.
> 2015-09-01 15:35:45,476 INFO [regionserver60020-smallCompactions-1440573854394] regionserver.CompactSplitThread: Completed compaction: Request = regionName=atp:atp_fare,I,1440507763404.c0eb67f08c2e3818f2e52812ec69b71c., storeName=i, fileCount=3, fileSize=56.9 M, priority=24, time=29115372534800314; duration=4sec
> 2015-09-01 15:35:45,476 DEBUG [regionserver60020-smallCompactions-1440573854394] regionserver.CompactSplitThread: CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
> 2015-09-01 15:35:58,047 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 4954ms
> GC pool 'G1 Young Generation' had collection(s): count=1 time=224ms
> GC pool 'G1 Old Generation' had collection(s): count=1 time=5077ms
> 2015-09-01 15:36:04,883 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.2.0--1, built on 10/11/2014 20:49 GMT
> 2015-09-01 15:36:04,883 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=l-hbase4.dba.cn1.qunar.com
> 2015-09-01 15:36:04,884 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_45
> 2015-09-01 15:36:04,884 INFO [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
> 2015-09-01 15:36:04,884 INFO [main] zookeeper.ZooKeeper: Client environment:java.home=/home/q/java/jdk1.7.0_45/jre
>
> The Master log is as follows:
> 2015-09-01 15:32:40,918 DEBUG [master:l-namenode1:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089788639
> 2015-09-01 15:32:40,920 DEBUG [master:l-namenode1:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089839980
> 2015-09-01 15:32:40,963 DEBUG [master:l-namenode1:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089890476
> 2015-09-01 15:32:40,966 DEBUG [master:l-namenode1:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089939050
> 2015-09-01 15:35:40,697 DEBUG [master:l-namenode1:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: l-hbase5.dba.cn1.qunar.com%2C60020%2C1440574191618.1441088646120
> 2015-09-01 15:36:04,970 INFO [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]
> 2015-09-01 15:36:04,973 DEBUG [main-EventThread] master.AssignmentManager: based on AM, current region=hbase:meta,,1.1588230740 is on server=l-hbase3.dba.cn1.qunar.com,60020,1440559933207 server being checked: l-hbase4.dba.cn1.qunar.com,60020,1440573682913
> 2015-09-01 15:36:04,973 DEBUG [main-EventThread] master.ServerManager: Added=l-hbase4.dba.cn1.qunar.com,60020,1440573682913 to dead servers, submitted shutdown handler to be executed meta=false
> 2015-09-01 15:36:04,976 DEBUG [main-EventThread] zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/l-hbase5.dba.cn1.qunar.com,60020,1440574191618 data: PBUF^H��^C
> 2015-09-01 15:36:04,976 DEBUG [main-EventThread] zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/l-hbase3.dba.cn1.qunar.com,60020,1440559933207 data: PBUF^H��^C
> 2015-09-01 15:36:04,976 DEBUG [main-EventThread] zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/l-hbase1.dba.cn1.qunar.com,60020,1440573706827 data: PBUF^H��^C
> 2015-09-01 15:36:04,977 DEBUG [main-EventThread] zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/l-hbase2.dba.cn1.qunar.com,60020,1440559908245 data: PBUF^H��^C
> 2015-09-01 15:36:05,045 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:60000-2] handler.ServerShutdownHandler: Splitting logs for l-hbase4.dba.cn1.qunar.com,60020,1440573682913 before assignment.
> 2015-09-01 15:36:05,047 DEBUG [MASTER_SERVER_OPERATIONS-l-namenode1:60000-2] master.MasterFileSystem: Renamed region directory: hdfs://mycluster:8020/hbase/airfare/WALs/l-hbase4.dba.cn1.qunar.com,60020,1440573682913-splitting
> 2015-09-01 15:36:05,047 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:60000-2] master.SplitLogManager: dead splitlog workers [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]
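On the "check your GC configuration in hbase-env.sh" point: your logs show the RegionServer already runs G1 (pools 'G1 Young Generation' and 'G1 Old Generation'), and on this Java 7 JVM the 'G1 Old Generation' pool typically reports full, stop-the-world collections, so the 5077ms entry is the pause to chase. Below is a minimal sketch of hbase-env.sh settings that turn on GC logging and start G1's concurrent marking earlier. The flags are standard HotSpot options on Java 7, but the value 35 and the log path /var/log/hbase/gc-regionserver.log are assumptions for illustration, not tuned recommendations for your cluster.

```shell
# hbase-env.sh (sketch): extra JVM options for the RegionServer process only.
# -XX:+UseG1GC                           keep G1, which your logs show in use
# -XX:InitiatingHeapOccupancyPercent=35  start concurrent marking earlier than
#                                        the JDK default of 45 (assumed value),
#                                        which can lower the risk of a full GC
# -XX:+PrintGCDetails, -XX:+PrintGCDateStamps,
# -XX:+PrintGCApplicationStoppedTime     log timestamped GC and pause details
# -Xloggc:...                            GC log file (assumed path; adjust it)
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
 -XX:+UseG1GC \
 -XX:InitiatingHeapOccupancyPercent=35 \
 -XX:+PrintGCDetails \
 -XX:+PrintGCDateStamps \
 -XX:+PrintGCApplicationStoppedTime \
 -Xloggc:/var/log/hbase/gc-regionserver.log"
```

After restarting the RegionServer, you can compare the timestamps in the GC log with the JvmPauseMonitor warnings to see which collections produce the long pauses.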
