The attached log file did not come through the mailing list. Consider using pastebin or another third-party site.

On Tue, Oct 18, 2016 at 10:38 PM, who.cat <[email protected]> wrote:

> Thanks, FYI. Yes, I had not turned on DEBUG logging; I am trying it now. I
> also suspected heavy CPU load was the cause, so I checked: the highest CPU
> utilization is 60% (CPU user).
> My region server GC parameters are:
> export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"
> The 10/12 log was rolled. I got the same crash log yesterday (10/18).
> Details are in the attachment 'regionServer.log'; the JVM pause is at
> "2016-10-17 18:44:07,232" on line 82.
> Thanks so much.
>
> ------------------ Original Message ------------------
> *From:* "Ted Yu" <[email protected]>
> *Sent:* Wednesday, October 19, 2016, 11:17 AM
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: HBase resgionServer crashed with no gc detected
>
> Can you show more of the region server log prior to 23:48:13 (including the
> pause) ?
>
> Was the region server under heavy load during the pause ?
>
> Consider turning on DEBUG logging if you haven't.
>
> Please also share GC parameters.
>
> Thanks
>
> On Tue, Oct 18, 2016 at 7:58 PM, who.cat <[email protected]> wrote:
>
> > Hi all:
> > I have an HDP big data cluster with 4 nodes, created by Ambari; the HBase
> > version is 1.1.2.
> > While running YCSB for benchmarking, the RegionServer instance or the
> > HMaster instance crashes, and its log shows:
> >
> > ---------------------log start ---------------------
> > 2016-10-12 23:48:13,591 INFO [main-SendThread(Node1:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x157b7f5f0bc0005, likely server has closed socket, closing socket connection and attempting reconnect
> > 2016-10-12 23:48:13,595 INFO [HBase-Metrics2-1] impl.MetricsSinkAdapter: Sink timeline started
> > 2016-10-12 23:48:13,606 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > 2016-10-12 23:48:13,606 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
> > 2016-10-12 23:48:14,496 INFO [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Opening socket connection to server Node4/1.1.6.104:2181. Will not attempt to authenticate using SASL (unknown error)
> > 2016-10-12 23:48:14,506 INFO [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Socket connection established to Node4/1.17.6.104:2181, initiating session
> > 2016-10-12 23:48:14,517 INFO [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x157b7f5f0bc0005 has expired, closing socket connection
> > 2016-10-12 23:48:14,517 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server node1,16020,1476260847716: regionserver:16020-0x157b7f5f0bc0005, quorum=node2:2181,node1:2181,node4:2181, baseZNode=/hbase-unsecure regionserver:16020-0x157b7f5f0bc0005 received expired from ZooKeeper, aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> >     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
> >     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> > 2016-10-12 23:48:14,518 FATAL [main-EventThread] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
> > ---------------------log end---------------------
> >
> > After checking the log, it appears that the region server JVM paused for a
> > long time, so the ZooKeeper client could not send heartbeats and the
> > session timed out, as the reference guide describes:
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > So I read the log in detail looking for a Java GC event, but no full GC
> > occurred.
> > I also found the same symptom in the DataNode instance.
> >
> > The node OS is CentOS 7. I suspected the kernel futex bug, but after
> > checking, that bug is already fixed in my OS.
> > Is there any other factor besides Java GC that could cause this problem?
> > Has anyone seen the same problem? Any ideas?
> > Thank you.
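
Since the question above is whether the JVM stalled before the session expired, one rough way to look for a stall is to scan the region server log for long silent stretches between consecutive timestamped lines. The following is only a minimal sketch under assumptions: the function names `parse_ts` and `find_gaps` and the 10-second default threshold are hypothetical, and the timestamp pattern is taken from the log excerpt quoted above. A silent gap does not prove a pause (the server may simply have been idle), so treat any hit as a place to look more closely, for example by comparing it against the GC log.

```python
import re
import sys
from datetime import datetime

# Leading timestamp as it appears in the quoted excerpt, e.g. "2016-10-12 23:48:13,591".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the line's leading timestamp as a datetime, or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None  # continuation line, e.g. part of a stack trace
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def find_gaps(lines, threshold_s=10.0):
    """Return (gap_seconds, line_before, line_after) for each gap longer than threshold_s.

    threshold_s is an illustrative default, not a value taken from any HBase
    or ZooKeeper setting; pick something below your session timeout.
    """
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_gaps.py /path/to/regionserver.log
    with open(sys.argv[1], errors="replace") as f:
        for delta, before, after in find_gaps(f):
            print(f"{delta:.1f}s gap\n  before: {before}\n  after:  {after}")
```

Run against the rolled log that covers the crash, this lists each silent stretch together with the lines on either side of it, which can then be lined up with the gc.log entries written by the `-Xloggc` option above.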
