Can you run jps on your master and look at the logs there too? Another thing: can you try with hbase.zookeeper.quorum instead of hbase.ZooKeeper.quorum?
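
As far as I know, property names in hbase-site.xml are matched case-sensitively, so a name with capital letters is simply not picked up and the RegionServer falls back to the default quorum (localhost). That would explain connectString=localhost:2181 in your log. A minimal sketch of the entry, assuming your quorum hosts are still m146, m145 and m143 as in your mail:

```xml
<!-- conf/hbase-site.xml: the property name is all lowercase -->
<!-- Host names are copied from your mail; adjust if they differ. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>m146,m145,m143</value>
</property>
```

The same change would be needed in the hbase-site.xml on every node, followed by a restart of HBase, before the log should show the three quorum hosts in connectString.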

2012/11/21, [email protected] <[email protected]>:
> Hi,
>
> Here are my HBase configuration and test:
>
> 1) {$HBASE_HOME}hbase/conf/hbase-site.xml
> <property>
> <name>hbase.ZooKeeper.quorum</name>
> <value>m146,m145,m143</value>
> </property>
>
> <property>
> <name>zookeeper.session.timeout</name>
> <value>60000</value>
> </property>
>
> 2) {$HBASE_HOME}hbase/conf/hbase-env.sh
> export HBASE_MANAGES_ZK=false
>
> 3) I used " {$ZK_HOME}/bin/zkCli.sh -server m145,m146,m143" to test the
> connection, it worked
> [zk: m145,m146,m143(CONNECTED) 0]
>
> 4) from the logs, I found that the connectString was odd, the RegionServer
> did not use the setting of "hbase.ZooKeeper.quorum" in conf/hbase-site.xml,
> it seemed that it always used the default and tried to connect
> "localhost:2181" in the distributed cluster:
>
> 2012-11-21 17:21:42,299 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=localhost:2181 sessionTimeout=60000
> watcher=regionserver:60020
> ...
> 2012-11-21 17:21:42,313 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server localhost/127.0.0.1:2181. Will not attempt to
> authenticate using SASL (Unable to locate a login configura$
> ...
> 2012-11-21 17:21:42,316 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect java.net.ConnectException: Connection refused
> ... (remark: it tried above 3 times, then had FATAL error as follows)
>
> 2012-11-21 17:21:57,846 ERROR
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020
> Received unexpected KeeperException, re-throwing exception
> ...
> 2012-11-21 17:21:57,847 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> ...
>
> Please help.
>
> Thanks
>
> On 22 Nov 2012, at 1:22 AM, Jean-Marc Spaggiari wrote:
>
>> Hi,
>>
>> What do you have on your HBase configuration? Are you passing the name
>> of the Quorum servers?
>> $ cat conf/hbase-site.xml
>> ......
>> </property>
>> <property>
>> <name>hbase.zookeeper.quorum</name>
>> <value>cube,latitude,node3</value>
>> <description>Comma separated list of servers in the ZooKeeper
>> Quorum.
>> For example,
>> "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>> By default this is set to localhost for local and pseudo-distributed
>> modes
>> of operation. For a fully-distributed setup, this should be set to a
>> full
>> list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in
>> hbase-env.sh
>> this is the list of servers which we will start/stop ZooKeeper on.
>> </description>
>> </property>
>> .....
>>
>> 2012/11/21, [email protected] <[email protected]>:
>>> Hi,
>>>
>>> I have the following line in /etc/hosts in all servers, should I keep it
>>> or
>>> comment it out or ...?
>>>
>>> 127.0.0.1 localhost
>>>
>>> Please help.
>>>
>>> Thanks
>>>
>>> On 21 Nov 2012, at 7:16 PM, [email protected] wrote:
>>>
>>>> Hi,
>>>>
>>>> Please help!!
>>>>
>>>> HBase version: 0.94
>>>> ZooKeeper: 3.4.4
>>>>
>>>> One of the regional servers stopped very quickly after HBASE is
>>>> started:
>>>>
>>>> ### Check JPS after HBASE cluster was started, could find the
>>>> HRegionServer process (*** there is no any ZooKeeper instance running
>>>> in
>>>> this server ***)
>>>> $ jps
>>>> 24767 Jps
>>>> 18418 TaskTracker
>>>> 24678 HRegionServer
>>>> 18156 DataNode
>>>>
>>>> ### Wait a while and checked JPS again, HRegionServer process gone
>>>> $ jps
>>>> 18418 TaskTracker
>>>> 24784 Jps
>>>> 18156 DataNode
>>>>
>>>> ### Here is the setting in hbase-site.xml ( enabled
>>>> hbase.cluster.distributed, set up 3 ZooKeepers, timeout= 60000)
>>>> <property>
>>>> <name>hbase.cluster.distributed</name>
>>>> <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>hbase.ZooKeeper.quorum</name>
>>>> <value>m146,m145,m143</value>
>>>> </property>
>>>>
>>>> <property>
>>>> <name>zookeeper.session.timeout</name>
>>>> <value>60000</value>
>>>> </property>
>>>>
>>>> ### hbase-env.sh also tells HBASE not to manage local instance of
>>>> ZooKeeper
>>>> export HBASE_MANAGES_ZK=false
>>>>
>>>> ###This server can connect to the 3 ZooKeepers,
>>>> ./zkCli.sh -server m145,m146,m143 ==> [zk:
>>>> m145,m146,m143(CONNECTED)
>>>> 0]
>>>>
>>>> ### checked the hbase log file, found something odd, seemed that it
>>>> tried
>>>> to connect local ZooKeeper
>>>> 2012-11-21 17:30:33,066 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>>> client connection, connectString=localhost:2181 sessionTimeout=60000
>>>> watcher=regionserver:60020
>>>>
>>>> 2012-11-21 17:31:33,254 WARN
>>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
>>>> transient
>>>> ZooKeeper exception:
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>
>>>> 2012-11-21 17:31:33,254 INFO org.apache.hadoop.hbase.util.RetryCounter:
>>>> Sleeping 2000ms before retry #1...
>>>> 2012-11-21 17:32:33,262 INFO org.apache.zookeeper.ClientCnxn: Client
>>>> session timed out, have not heard from server in 60010ms for sessionid
>>>> 0x0, closing socket connection and attempting reconnect
>>>>
>>>> 2012-11-21 17:32:33,362 WARN
>>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
>>>> transient
>>>> ZooKeeper exception:
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>
>>>> ......
>>>>
>>>> 2012-11-21 17:34:33,570 ERROR
>>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper
>>>> exists
>>>> failed after 3 retries
>>>> 2012-11-21 17:34:33,571 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>> regionserver:60020 Unable to set watcher on znode /hbase/master
>>>> 2012-11-21 17:34:33,573 ERROR
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020
>>>> Received unexpected KeeperException, re-throwing exception
>>>> 2012-11-21 17:34:33,573 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>>> server
>>>> ......
>>>> 2012-11-21 17:34:33,576 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort:
>>>> loaded coprocessors are: []
>>>>
>>>> 2012-11-21 17:34:36,580 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>>> server
>>>> m144,60020,1353490232962: Initialization of RS failed. Hence aborting
>>>> RS.
>>>> java.io.IOException: Received the shutdown message while waiting.
>>>> at
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:623)
>>>> at
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:598)
>>>> at
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:560)
>>>> at
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:669)
>>>> at java.lang.Thread.run(Thread.java:662)
>>>> 2012-11-21 17:34:36,581 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort:
>>>> loaded coprocessors are: []
>>>>
>>>> Please help!
>>>> QUESTION: Is it a bug and I need to check something else?
>>>>
>>>> Thanks
>>>>
>>>
>
