Re: HBase resgionServer crashed with no gc detected

Ted Yu Wed, 19 Oct 2016 07:07:18 -0700

The log file attachment was not delivered by the mailing list.

Consider posting it on pastebin or another third-party site and sharing the link.
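
In the meantime, one rough way to narrow down where the process stalled is to look for large gaps between consecutive timestamps in the region server log. Below is a minimal sketch in Python, assuming the log4j timestamp prefix seen in the excerpt further down (e.g. "2016-10-12 23:48:13,591"). The function name `find_gaps` and the 10-second default threshold are made up for illustration and are not part of HBase. Keep in mind that a gap only shows that nothing was logged in that interval; an idle server produces gaps too, so treat each hit as a place to look rather than as proof of a pause.

```python
import re
from datetime import datetime

# Leading timestamp of a log4j line, e.g. "2016-10-12 23:48:13,591 INFO ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_s=10.0):
    """Return (gap_seconds, prev_line_no, line_no) for every gap of at
    least threshold_s seconds between consecutive timestamped lines.

    threshold_s is an arbitrary illustrative default, not an HBase setting.
    """
    gaps = []
    prev_ts = None
    prev_no = None
    for no, line in enumerate(lines, start=1):
        m = TS_RE.match(line)
        if not m:
            # Continuation lines (e.g. stack traces) carry no timestamp.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta >= threshold_s:
                gaps.append((delta, prev_no, no))
        prev_ts, prev_no = ts, no
    return gaps
```

To use it, open the raw (unquoted) log file, pass the file object to `find_gaps`, and then read the log around the reported line numbers.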


On Tue, Oct 18, 2016 at 10:38 PM, who.cat <[email protected]> wrote:

> Thanks for the reply. No, I had not turned on DEBUG logging; I am trying
> it now. I also suspected heavy CPU load as the cause, but when I checked,
> the highest CPU utilization (user) was 60%.
> My region server GC parameters are: export SERVER_GC_OPTS="-verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date
> +'%Y%m%d%H%M'`"
> The 10/12 log has been rolled, but I got the same crash log yesterday (10/18).
> Details are in the attachment 'regionServer.log'; the JVM pause is at
> "2016-10-17 18:44:07,232" on line 82.
> Thanks so much.
>
>
>
>
>
> ------------------ Original Message ------------------
> *From:* "Ted Yu";<[email protected]>;
> *Sent:* Wednesday, October 19, 2016, 11:17 AM
> *To:* "[email protected]"<[email protected]>;
> *Subject:* Re: HBase resgionServer crashed with no gc detected
>
> Can you show more of the region server log prior to 23:48:13 (including the
> pause) ?
>
> Was the region server under heavy load during the pause ?
>
> Consider turning on DEBUG logging if you haven't.
>
> Please also share GC parameters.
>
> Thanks
>
> On Tue, Oct 18, 2016 at 7:58 PM, who.cat <[email protected]> wrote:
>
> > Hi all,
> > I have a 4-node HDP big data cluster created by Ambari; the HBase
> > version is 1.1.2.
> > While running YCSB as a benchmark, the RegionServer instance or the
> > HMaster instance crashes, and its log shows:
> >
> > ---------------------log start ---------------------
> > 2016-10-12 23:48:13,591 INFO  [main-SendThread(Node1:2181)]
> > zookeeper.ClientCnxn: Unable to read additional data from server
> > sessionid
> > 0x157b7f5f0bc0005, likely server has closed socket, closing socket
> > connection and attempting reconnect
> > 2016-10-12 23:48:13,595 INFO  [HBase-Metrics2-1] impl.MetricsSinkAdapter:
> > Sink timeline started
> > 2016-10-12 23:48:13,606 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl:
> > Scheduled snapshot period at 10 second(s).
> > 2016-10-12 23:48:13,606 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl:
> > HBase metrics system started
> > 2016-10-12 23:48:14,496 INFO  [main-SendThread(Node4:2181)]
> > zookeeper.ClientCnxn: Opening socket connection to server Node4/
> > 1.1.6.104:2181. Will not attempt to authenticate using SASL (unknown
> > error)
> > 2016-10-12 23:48:14,506 INFO  [main-SendThread(Node4:2181)]
> > zookeeper.ClientCnxn: Socket connection established to Node4/
> > 1.17.6.104:2181, initiating session
> > 2016-10-12 23:48:14,517 INFO  [main-SendThread(Node4:2181)]
> > zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session
> > 0x157b7f5f0bc0005 has expired, closing socket connection
> > 2016-10-12 23:48:14,517 FATAL [main-EventThread]
> > regionserver.HRegionServer: ABORTING region server
> > node1,16020,1476260847716: regionserver:16020-0x157b7f5f0bc0005,
> > quorum=node2:2181,node1:2181,node4:2181, baseZNode=/hbase-unsecure
> > regionserver:16020-0x157b7f5f0bc0005 received expired from ZooKeeper,
> > aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
> > connectionEvent(ZooKeeperWatcher.java:585)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
> > process(ZooKeeperWatcher.java:517)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.
> > processEvent(ClientCnxn.java:534)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.run(
> > ClientCnxn.java:510)
> > 2016-10-12 23:48:14,518 FATAL [main-EventThread]
> > regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
> > [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
> > ---------------------log end---------------------
> >
> > After checking the log, it appears that the region server JVM paused for
> > a long time, so the ZooKeeper client could not send heartbeats and the
> > session timed out, which the reference guide describes at
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > So I read the log in detail looking for Java GC events, but no full GC
> > occurred.
> > I also found the same symptom in the DataNode instance.
> >
> > The node OS is CentOS 7. I suspected the kernel futex bug, but after
> > checking, that bug is already fixed in my OS.
> > Is there any other factor besides Java GC that could cause this problem?
> > Has anyone run into the same problem? Any ideas?
> > Thank you.
>
>
