Hi,
Based on your logs:
2015-09-01 15:35:58,047 INFO  [JvmPauseMonitor] util.JvmPauseMonitor:
Detected pause in JVM or host machine *(eg GC): pause of approximately
4954ms*

you had a long-running GC pause, which caused a timeout in communication
between the regionserver and ZooKeeper:
2015-09-01 15:36:04,970 INFO  [main-EventThread]
zookeeper.RegionServerTracker: *RegionServer ephemeral node deleted,
processing expiration [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]*
which caused the RS to shut down.
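
To make the timeline concrete, here is a minimal Python sketch that reads the pause length and the gap between the two log lines quoted above. The helper names (log_time, pause_ms) are made up for illustration. Note that the first line comes from the regionserver log and the second from the master log, so the gap is only approximate if the clocks on the two hosts differ.

```python
import re
from datetime import datetime

# The two log lines quoted above, joined back onto single lines.
PAUSE_LINE = (
    "2015-09-01 15:35:58,047 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 4954ms"
)
EXPIRY_LINE = (
    "2015-09-01 15:36:04,970 INFO  [main-EventThread] "
    "zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, "
    "processing expiration [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]"
)

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def pause_ms(line):
    """Return the pause length in ms reported by JvmPauseMonitor, or None."""
    match = re.search(r"pause of approximately (\d+)ms", line)
    return int(match.group(1)) if match else None


pause = pause_ms(PAUSE_LINE)
gap = (log_time(EXPIRY_LINE) - log_time(PAUSE_LINE)).total_seconds()
print(f"reported pause: {pause} ms")
print(f"pause report to node expiration: {gap:.3f} s")
```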

Check your GC configuration in hbase-env.sh, and also check the load that
you are generating on your cluster.
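
Before changing the GC settings, it may help to see whether long pauses are frequent and which collector is responsible. The following is a rough Python sketch, not an HBase tool: the function name summarize_pauses and the 1000 ms threshold are assumptions chosen for illustration, and the regular expressions only match the JvmPauseMonitor line format shown in your regionserver log.

```python
import re
from collections import defaultdict

PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
POOL_RE = re.compile(
    r"GC pool '([^']+)' had collection\(s\): count=(\d+) time=(\d+)ms"
)


def summarize_pauses(lines, threshold_ms=1000):
    """Collect pauses >= threshold_ms and the total GC time per pool.

    threshold_ms is an arbitrary cut-off for this sketch, not an HBase default.
    """
    long_pauses = []
    pool_time_ms = defaultdict(int)
    for line in lines:
        pause = PAUSE_RE.search(line)
        if pause and int(pause.group(1)) >= threshold_ms:
            long_pauses.append(int(pause.group(1)))
        pool = POOL_RE.search(line)
        if pool:
            pool_time_ms[pool.group(1)] += int(pool.group(3))
    return long_pauses, dict(pool_time_ms)


# Sample input: the three relevant lines from the regionserver log.
SAMPLE = [
    "2015-09-01 15:35:58,047 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 4954ms",
    "GC pool 'G1 Young Generation' had collection(s): count=1 time=224ms",
    "GC pool 'G1 Old Generation' had collection(s): count=1 time=5077ms",
]

long_pauses, pool_time_ms = summarize_pauses(SAMPLE)
print(long_pauses)
print(pool_time_ms)
```

On this sample almost all of the time is in the 'G1 Old Generation' pool. To scan a whole log, pass an open file object instead of SAMPLE.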

Regards
Samir


On Tue, Sep 1, 2015 at 1:59 PM, 聪聪 <[email protected]> wrote:

> Hi all,
>
> I am using HBase version hbase-0.96.0. This afternoon (2015-09-01 15:35) I
> ran into a problem: one of the regionservers shut down, and I don't know
> why. Can we get some help here?
>
>
>
> The regionserver log is as follows:
> 2015-09-01 15:35:45,476 DEBUG
> [regionserver60020-smallCompactions-1440573854394] backup.HFileArchiver:
> Finished archiving from class
> org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile,
> file:hdfs://mycluster:8020/hbase/airfare/data/atp/atp_fare/c0eb67f08c2e3818f2e52812ec69b71c/i/ff73fb07c62044c082f6b4f92e0ed7ca,
> to
> hdfs://mycluster:8020/hbase/airfare/archive/data/atp/atp_fare/c0eb67f08c2e3818f2e52812ec69b71c/i/ff73fb07c62044c082f6b4f92e0ed7ca
> 2015-09-01 15:35:45,476 INFO
> [regionserver60020-smallCompactions-1440573854394] regionserver.HStore:
> Completed compaction of 3 file(s) in i of
> atp:atp_fare,I,1440507763404.c0eb67f08c2e3818f2e52812ec69b71c. into
> fb746308d219490d81e5f8f1dd8b60f1(size=56.9 M), total size for store is 1.8
> G. This selection was in queue for 0sec, and took 4sec to execute.
> 2015-09-01 15:35:45,476 INFO
> [regionserver60020-smallCompactions-1440573854394]
> regionserver.CompactSplitThread: Completed compaction: Request =
> regionName=atp:atp_fare,I,1440507763404.c0eb67f08c2e3818f2e52812ec69b71c.,
> storeName=i, fileCount=3, fileSize=56.9 M, priority=24,
> time=29115372534800314; duration=4sec
> 2015-09-01 15:35:45,476 DEBUG
> [regionserver60020-smallCompactions-1440573854394]
> regionserver.CompactSplitThread: CompactSplitThread Status:
> compaction_queue=(0:0), split_queue=0, merge_queue=0
> 2015-09-01 15:35:58,047 INFO  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately 4954ms
> GC pool 'G1 Young Generation' had collection(s): count=1 time=224ms
> GC pool 'G1 Old Generation' had collection(s): count=1 time=5077ms
> 2015-09-01 15:36:04,883 INFO  [main] zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.4.5-cdh5.2.0--1, built on 10/11/2014 20:49
> GMT
> 2015-09-01 15:36:04,883 INFO  [main] zookeeper.ZooKeeper: Client
> environment:host.name=l-hbase4.dba.cn1.qunar.com
> 2015-09-01 15:36:04,884 INFO  [main] zookeeper.ZooKeeper: Client
> environment:java.version=1.7.0_45
> 2015-09-01 15:36:04,884 INFO  [main] zookeeper.ZooKeeper: Client
> environment:java.vendor=Oracle Corporation
> 2015-09-01 15:36:04,884 INFO  [main] zookeeper.ZooKeeper: Client
> environment:java.home=/home/q/java/jdk1.7.0_45/jre
>
> The master log is as follows:
> 2015-09-01 15:32:40,918 DEBUG [master:l-namenode1:60000.oldLogCleaner]
> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089788639
> 2015-09-01 15:32:40,920 DEBUG [master:l-namenode1:60000.oldLogCleaner]
> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089839980
> 2015-09-01 15:32:40,963 DEBUG [master:l-namenode1:60000.oldLogCleaner]
> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089890476
> 2015-09-01 15:32:40,966 DEBUG [master:l-namenode1:60000.oldLogCleaner]
> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> l-hbase2.dba.cn1.qunar.com%2C60020%2C1440559908245.1441089939050
> 2015-09-01 15:35:40,697 DEBUG [master:l-namenode1:60000.oldLogCleaner]
> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> l-hbase5.dba.cn1.qunar.com%2C60020%2C1440574191618.1441088646120
> 2015-09-01 15:36:04,970 INFO  [main-EventThread]
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> processing expiration [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]
> 2015-09-01 15:36:04,973 DEBUG [main-EventThread] master.AssignmentManager:
> based on AM, current region=hbase:meta,,1.1588230740 is on server=
> l-hbase3.dba.cn1.qunar.com,60020,1440559933207 server being checked:
> l-hbase4.dba.cn1.qunar.com,60020,1440573682913
> 2015-09-01 15:36:04,973 DEBUG [main-EventThread] master.ServerManager:
> Added=l-hbase4.dba.cn1.qunar.com,60020,1440573682913 to dead servers,
> submitted shutdown handler to be executed meta=false
> 2015-09-01 15:36:04,976 DEBUG [main-EventThread]
> zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/
> l-hbase5.dba.cn1.qunar.com,60020,1440574191618 data: PBUF^H��^C
> 2015-09-01 15:36:04,976 DEBUG [main-EventThread]
> zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/
> l-hbase3.dba.cn1.qunar.com,60020,1440559933207 data: PBUF^H��^C
> 2015-09-01 15:36:04,976 DEBUG [main-EventThread]
> zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/
> l-hbase1.dba.cn1.qunar.com,60020,1440573706827 data: PBUF^H��^C
> 2015-09-01 15:36:04,977 DEBUG [main-EventThread]
> zookeeper.RegionServerTracker: RS node: /hbase/airfare/rs/
> l-hbase2.dba.cn1.qunar.com,60020,1440559908245 data: PBUF^H��^C
> 2015-09-01 15:36:05,045 INFO
> [MASTER_SERVER_OPERATIONS-l-namenode1:60000-2]
> handler.ServerShutdownHandler: Splitting logs for
> l-hbase4.dba.cn1.qunar.com,60020,1440573682913 before assignment.
> 2015-09-01 15:36:05,047 DEBUG
> [MASTER_SERVER_OPERATIONS-l-namenode1:60000-2] master.MasterFileSystem:
> Renamed region directory: hdfs://mycluster:8020/hbase/airfare/WALs/
> l-hbase4.dba.cn1.qunar.com,60020,1440573682913-splitting
> 2015-09-01 15:36:05,047 INFO
> [MASTER_SERVER_OPERATIONS-l-namenode1:60000-2] master.SplitLogManager: dead
> splitlog workers [l-hbase4.dba.cn1.qunar.com,60020,1440573682913]
