Thanks for the pointers. You are right, I had not turned on DEBUG logging; I
am trying that now. I also suspected heavy CPU load as the cause, but when I
checked, the highest CPU utilization was 60% (user CPU).
My region server GC parameters are:
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"
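With -XX:+PrintGCDetails enabled, HotSpot normally ends each collection entry in gc.log with a "[Times: user=... sys=..., real=... secs]" summary. As a rough way to confirm that no collection was long enough to explain the pause, here is a minimal Python sketch that lists entries whose wall-clock ("real") time reaches a threshold. The function name, the 1-second default threshold, and the assumption that the decimal separator is a dot are illustrative choices, not anything taken from the cluster in this thread.

```python
import re

# Wall-clock time printed at the end of a GC entry when PrintGCDetails is on,
# e.g. "[Times: user=0.05 sys=0.00, real=0.02 secs]".
REAL_TIME = re.compile(r"real=(\d+\.\d+) secs")


def long_gc_pauses(lines, threshold_secs=1.0):
    """Return (line_number, seconds) for each GC entry at or above the threshold.

    `lines` is any iterable of strings, such as an open gc.log file.
    The threshold is a hypothetical default; pick one below your
    ZooKeeper session timeout.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in REAL_TIME.finditer(line):
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                hits.append((number, seconds))
    return hits
```

To use it, open the GC log and pass the file object, for example `long_gc_pauses(open("gc.log"))`. An empty result means no single collection reached the threshold, which is consistent with a pause coming from somewhere other than GC but does not prove it.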
The 10/12 log has been rolled over, but I got the same crash log yesterday
(10/18). Details are in the attachment 'regionServer.log'; the JVM pause is at
"2016-10-17 18:44:07,232", on line 82.
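For anyone who wants to locate such a pause in their own region server log, here is a minimal Python sketch that reports large gaps between consecutive timestamped entries. It assumes each entry starts with a timestamp in the "2016-10-12 23:48:13,591" form seen in the log excerpt quoted below; the function name and the 10-second default threshold are hypothetical. A gap only shows that nothing was logged, so it is a hint for where to look rather than proof of a JVM pause.

```python
import re
from datetime import datetime

# Leading timestamp of a log entry, e.g. "2016-10-12 23:48:13,591".
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_secs=10.0):
    """Return (previous_line_no, line_no, gap_seconds) for gaps >= threshold."""
    gaps = []
    previous = None  # (line number, timestamp) of the last timestamped line
    for number, line in enumerate(lines, start=1):
        match = STAMP.match(line)
        if not match:
            # Continuation lines (stack traces, wrapped text) carry no timestamp.
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            gap = (stamp - previous[1]).total_seconds()
            if gap >= threshold_secs:
                gaps.append((previous[0], number, gap))
        previous = (number, stamp)
    return gaps
```

Calling `find_gaps(open("regionServer.log"))` returns the line numbers on either side of each quiet period, which can then be compared with the GC log for the same time window.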
Thanks so much.
------------------ Original Message ------------------
From: "Ted Yu";<[email protected]>;
Sent: 2016-10-19 (Wednesday) 11:17
To: "[email protected]"<[email protected]>;
Subject: Re: HBase resgionServer crashed with no gc detected
Can you show more of the region server log prior to 23:48:13 (including the
pause)?
Was the region server under heavy load during the pause?
Consider turning on DEBUG logging if you haven't.
Please also share GC parameters.
Thanks
On Tue, Oct 18, 2016 at 7:58 PM, who.cat <[email protected]> wrote:
> Hi all,
> I have a 4-node HDP big data cluster created with Ambari; the HBase version
> is 1.1.2.
> When running YCSB as a benchmark, the RegionServer instance or the HMaster
> instance crashes, and its log shows:
>
> ---------------------log start ---------------------
> 2016-10-12 23:48:13,591 INFO [main-SendThread(Node1:2181)]
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid
> 0x157b7f5f0bc0005, likely server has closed socket, closing socket
> connection and attempting reconnect
> 2016-10-12 23:48:13,595 INFO [HBase-Metrics2-1] impl.MetricsSinkAdapter:
> Sink timeline started
> 2016-10-12 23:48:13,606 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl:
> Scheduled snapshot period at 10 second(s).
> 2016-10-12 23:48:13,606 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl:
> HBase metrics system started
> 2016-10-12 23:48:14,496 INFO [main-SendThread(Node4:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server Node4/
> 1.1.6.104:2181. Will not attempt to authenticate using SASL (unknown
> error)
> 2016-10-12 23:48:14,506 INFO [main-SendThread(Node4:2181)]
> zookeeper.ClientCnxn: Socket connection established to Node4/
> 1.17.6.104:2181, initiating session
> 2016-10-12 23:48:14,517 INFO [main-SendThread(Node4:2181)]
> zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session
> 0x157b7f5f0bc0005 has expired, closing socket connection
> 2016-10-12 23:48:14,517 FATAL [main-EventThread]
> regionserver.HRegionServer: ABORTING region server
> node1,16020,1476260847716: regionserver:16020-0x157b7f5f0bc0005,
> quorum=node2:2181,node1:2181,node4:2181, baseZNode=/hbase-unsecure
> regionserver:16020-0x157b7f5f0bc0005 received expired from ZooKeeper,
> aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
> connectionEvent(ZooKeeperWatcher.java:585)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
> process(ZooKeeperWatcher.java:517)
> at org.apache.zookeeper.ClientCnxn$EventThread.
> processEvent(ClientCnxn.java:534)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(
> ClientCnxn.java:510)
> 2016-10-12 23:48:14,518 FATAL [main-EventThread]
> regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
> [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
> ---------------------log end---------------------
>
> After checking the log, it appears that the region server JVM paused for a
> long time, so the ZooKeeper client could not send heartbeats and the session
> timed out, as described in the reference guide:
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> So I read the log in detail looking for a Java GC event, but no full GC
> occurred.
> I also found the same symptom in the DataNode instance.
>
> The node OS is CentOS 7, so I suspected the kernel futex bug, but after
> checking, that bug is already fixed in my OS.
> Is there any factor other than Java GC that could cause this problem?
> Has anyone seen the same problem? Any ideas?
> Thank you.