Reply inline.
---------- Forwarded message ----------
From: Vamshi Krishna <[email protected]>
Date: Wed, Aug 21, 2013 at 10:42 PM
Subject: Hbase region server disconnecting with master after some time.
To: [email protected]
Hi all,
I am facing a problem with an HBase region server disconnecting from the
master after some time. I set up an HBase cluster with 2 machines, where
Machine-1 (M1) is both master and region server and M2 is only a region server.
After running start-hbase.sh, all the daemons start properly, but
after some time I see that the M2 region server is dead. I am running
ZooKeeper on M1 alone.
The error I found in the M2 region server log is pasted below.
2013-08-22 10:31:38,554 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=vamshi_RS:2181 sessionTimeout=180000
watcher=regionserver:60020
*Anil: The line above means that the RS and ZK were unable to communicate
for 3 minutes (sessionTimeout=180000 ms, i.e. 180 sec). Hence the RS is
deemed dead by ZK. It seems like you have a networking/firewall problem
between the two.*
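One way to test the networking/firewall theory is to try a plain TCP connection from M2 to the ZooKeeper port. The following is only a minimal sketch, not an HBase or ZooKeeper tool: the function name `can_connect` and the 5-second default timeout are illustrative assumptions, while the host and port mentioned in the note after the code come from the connectString in the log line above.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for illustration; it only checks reachability and
    does not speak the ZooKeeper protocol.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Running `can_connect("vamshi_RS", 2181)` on M2 and getting False would be consistent with the "Connection timed out" errors in the log, and would point at name resolution, routing, or a firewall on M1 rather than at HBase itself.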
2013-08-22 10:31:38,564 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
this process is 4076@vamshi
2013-08-22 10:31:38,568 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server vamshi_RS/192.168.1.57:2181. Will not attempt
to authenticate using SASL (Unable to locate a login configuration)
2013-08-22 10:32:41,675 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-08-22 10:32:41,791 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
ZooKeeper exception:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/master
2013-08-22 10:32:41,791 INFO org.apache.hadoop.hbase.util.RetryCounter:
Sleeping 2000ms before retry #1...
2013-08-22 10:32:42,789 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server vamshi_RS/192.168.1.57:2181. Will not attempt
to authenticate using SASL (Unable to locate a login configuration)
2013-08-22 10:33:45,929 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
for server null, unexpected error, closing socket connection and attempting
reconnect
.
.
..
2013-08-22 10:35:54,542 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:420)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:648)
at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:609)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:735)
at java.lang.Thread.run(Thread.java:662)
.
..
2013-08-22 10:35:57,549 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
vamshi,60020,1377147698472: Initialization of RS failed. Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:680)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:649)
at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:609)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:735)
at java.lang.Thread.run(Thread.java:662)
2013-08-22 10:35:57,550 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort:
loaded coprocessors are: []
2013-08-22 10:35:57,550 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Initialization
of RS failed. Hence aborting RS.
2013-08-22 10:35:57,552 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Registered RegionServer
MXBean
2013-08-22 10:35:57,553 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
vamshi,60020,1377147698472: Unhandled exception: null
java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:756)
at java.lang.Thread.run(Thread.java:662)
2013-08-22 10:35:57,553 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort:
loaded coprocessors are: []
2013-08-22 10:35:57,553 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled
exception: null
2013-08-22 10:35:57,554 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting;
hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-5,5,main]
2013-08-22 10:35:57,554 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2013-08-22 10:35:57,555 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown
hook thread.
2013-08-22 10:35:57,556 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished..
hbase-site.xml:
Please find my hbase-site.xml content below, which is the same on both M1 and M2.
<property>
<name>hbase.rootdir</name>
<!--value>hdfs://vamshi:54310/home/biginfolabs/BILSftwrs/hbase-0.94.10/data/</value-->
<value>/home/biginfolabs/BILSftwrs/hbase-0.94.10/hbstmp/</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>vamshi_RS</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>50</value>
</property>
<property>
<name>hbase.balancer.period</name>
<value>60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>vamshi_RS</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/biginfolabs/BILSftwrs/hbase-0.94.10/zkptmp</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>1000</value>
<description>Number of rows that will be fetched when calling next</description>
</property>
<property>
<name>hbase.zookeeper.property.maxClientCnxns</name>
<value>1024</value>
</property>
<property>
<name>hbase.coprocessor.user.region.classes</name>
<value>com.bil.coproc.ColumnAggregationEndpoint</value>
</property>
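For reference, the connectString=vamshi_RS:2181 seen in the region server log is built from hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort in this file. The sketch below shows that mapping so the value each machine will use can be checked by hand. It is an illustration only: the function name `zk_connect_string` is hypothetical, the sample wraps two of the properties above in a `<configuration>` root element (assumed, since the paste shows only the properties), and the fallbacks of localhost and 2181 are assumed to match the HBase defaults.

```python
import xml.etree.ElementTree as ET


def zk_connect_string(site_xml: str) -> str:
    """Build 'host:port[,host:port...]' from an hbase-site.xml document."""
    root = ET.fromstring(site_xml)
    # Collect every <property> as a name -> value mapping.
    props = {
        p.findtext("name", "").strip(): p.findtext("value", "").strip()
        for p in root.iter("property")
    }
    # Fallback values are assumptions about the defaults, not read from the file.
    quorum = props.get("hbase.zookeeper.quorum", "localhost")
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    return ",".join(f"{h}:{port}" for h in hosts)


# Two properties copied from the config above, wrapped in an assumed root element.
SAMPLE = """
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>vamshi_RS</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
"""

print(zk_connect_string(SAMPLE))  # vamshi_RS:2181
```

If the string printed on M2 names a host that M2 cannot resolve or reach, the region server would be expected to fail in the way the log shows.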
Could somebody please help me figure out my mistake and suggest a solution?
Thank you.
--
Regards,
Vamshi
--
Thanks & Regards,
Anil Gupta