Hi,

Please help!!

HBase version: 0.94
ZooKeeper: 3.4.4

One of the region servers stops shortly after HBase is started:

### Checked jps after the HBase cluster was started; the HRegionServer process 
is there (*** no ZooKeeper instance runs on this server ***)
$ jps
24767 Jps
18418 TaskTracker
24678 HRegionServer
18156 DataNode

### Waited a while and checked jps again; the HRegionServer process is gone
$ jps
18418 TaskTracker
24784 Jps
18156 DataNode
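The two listings above can be compared programmatically. This is a minimal sketch, assuming `jps` is on the PATH of the user that runs HBase; `parse_jps` and `regionserver_running` are hypothetical helper names, not part of any HBase tooling.

```python
from __future__ import annotations

import subprocess


def parse_jps(output: str) -> dict[str, int]:
    """Map each JVM main-class name in `jps` output to its PID."""
    procs = {}
    for line in output.splitlines():
        parts = line.split()
        # A normal jps line is "<pid> <main class>"; lines without exactly
        # those two fields (e.g. "-- process information unavailable") are skipped.
        if len(parts) == 2 and parts[0].isdigit():
            procs[parts[1]] = int(parts[0])
    return procs


def regionserver_running() -> bool:
    """Run jps locally and report whether an HRegionServer JVM is listed."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    return "HRegionServer" in parse_jps(out)
```

Calling `regionserver_running()` in a loop for a few minutes after start-up would show the same disappearance as the second `jps` listing.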


### Here are the settings in hbase-site.xml (hbase.cluster.distributed enabled, 
a 3-host ZooKeeper quorum, session timeout = 60000 ms)
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

<property>
<name>hbase.ZooKeeper.quorum</name>
<value>m146,m145,m143</value>
</property>

<property>
<name>zookeeper.session.timeout</name>
<value>60000</value>
</property>
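To check which quorum a client would actually pick up from these properties, the sketch below loads the `<property>` entries into a dict and looks up `hbase.zookeeper.quorum`, the key HBase reads, falling back to `localhost`, the value shipped in hbase-default.xml. It only approximates Hadoop's `Configuration` (no variable expansion, no `final` handling), but like `Configuration` its key lookup is exact and case-sensitive. The function names are hypothetical, and the sample wraps the properties exactly as pasted above in a `<configuration>` root.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Key name as HBase looks it up; lookups are exact, so case matters.
QUORUM_KEY = "hbase.zookeeper.quorum"
# Value shipped in hbase-default.xml when the key is not overridden.
QUORUM_DEFAULT = "localhost"


def load_site_properties(xml_text: str) -> dict[str, str]:
    """Collect <name>/<value> pairs from a Hadoop-style *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def resolved_quorum(props: dict[str, str]) -> str:
    """Return what a lookup of QUORUM_KEY would see, default included."""
    return props.get(QUORUM_KEY, QUORUM_DEFAULT)


site_xml = """<configuration>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.ZooKeeper.quorum</name><value>m146,m145,m143</value></property>
  <property><name>zookeeper.session.timeout</name><value>60000</value></property>
</configuration>"""

print(resolved_quorum(load_site_properties(site_xml)))
```

Comparing the printed value with the `connectString` that appears in the region server log is a quick way to tell whether the quorum setting is being read at all.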


### hbase-env.sh also tells HBase not to manage its own ZooKeeper instance
export HBASE_MANAGES_ZK=false


### This server can connect to the 3 ZooKeeper servers:
$ ./zkCli.sh -server m145,m146,m143
[zk: m145,m146,m143(CONNECTED) 0]
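zkCli.sh connects to only one server from the `-server` list, so a successful session does not prove that every host is reachable. A rough per-host check is ZooKeeper's four-letter command `ruok`, to which a healthy 3.4.x server replies `imok`. This sketch assumes the default client port 2181 (the list above gives no ports) and that four-letter commands have not been disabled; `parse_connect_string` and `ruok` are hypothetical helper names.

```python
from __future__ import annotations

import socket

DEFAULT_ZK_PORT = 2181  # ZooKeeper's default client port


def parse_connect_string(connect: str) -> list[tuple[str, int]]:
    """Split 'h1,h2:2182,...' into (host, port) pairs, defaulting the port."""
    servers = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else DEFAULT_ZK_PORT))
    return servers


def ruok(host: str, port: int, timeout: float = 5.0) -> bool:
    """Send the 'ruok' four-letter command; a healthy server replies 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # The reply is only 4 bytes, so a single read is enough here.
            reply = sock.recv(16)
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
    return reply == b"imok"
```

For example: `for host, port in parse_connect_string("m145,m146,m143"): print(host, port, ruok(host, port))`.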


### Checked the HBase log file and found something odd: it seems to be trying to 
connect to a local ZooKeeper (localhost:2181)
2012-11-21 17:30:33,066 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=regionserver:60020

2012-11-21 17:31:33,254 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master

2012-11-21 17:31:33,254 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
2012-11-21 17:32:33,262 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60010ms for sessionid 0x0, closing socket connection and attempting reconnect

2012-11-21 17:32:33,362 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master

......

2012-11-21 17:34:33,570 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2012-11-21 17:34:33,571 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020 Unable to set watcher on znode /hbase/master
2012-11-21 17:34:33,573 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020 Received unexpected KeeperException, re-throwing exception
2012-11-21 17:34:33,573 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
......
2012-11-21 17:34:33,576 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []

2012-11-21 17:34:36,580 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server m144,60020,1353490232962: Initialization of RS failed.  Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
        at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:623)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:598)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:560)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:669)
        at java.lang.Thread.run(Thread.java:662)
2012-11-21 17:34:36,581 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
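The telling detail in this log is the `connectString` of the first line. A small sketch that pulls every `connectString=` value out of a region server log makes it easy to confirm which quorum the process really used across restarts. The regex is written against the log line format shown above (timestamp, level, logger, message), which is an assumption about the log4j layout; `connect_strings` is a hypothetical name.

```python
import re

# Matches the ZooKeeper client's "Initiating client connection" INFO line.
CONNECT_RE = re.compile(
    r"(?P<ts>\S+ \S+) INFO org\.apache\.zookeeper\.ZooKeeper: "
    r"Initiating client connection, connectString=(?P<quorum>\S+)"
)


def connect_strings(lines):
    """Yield (timestamp, connectString) for each ZooKeeper client start-up."""
    for line in lines:
        m = CONNECT_RE.match(line)
        if m:
            yield m.group("ts"), m.group("quorum")
```

Usage, with a hypothetical log path: `with open("hbase-regionserver-m144.log") as f: print(list(connect_strings(f)))`.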


Please help!
QUESTION: Is this a bug, or is there something else I need to check?

Thanks