Just to be clear, you are not actually running exactly 2 ZK nodes, are you?  I 
think one ZK node on your master is sufficient for this size of cluster.  If 
that node goes down, your entire cluster is gone in any case.  And remember, 
you need to have an odd number of ZK nodes.  And 3 nodes probably doesn't make 
sense either -- if you have a large enough cluster to need a ZK quorum, then 
you probably want to have the ability to take one node offline and still have 
the cluster survive an additional failure (which means 5 nodes).
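
As a rough illustration of the arithmetic behind this (a sketch only; the 
function names are made up for the example): a ZooKeeper ensemble stays 
available only while a strict majority of its nodes is up, so the number of 
failures it can absorb is the ensemble size minus that majority.

```python
def majority(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Nodes that can be down while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


for n in (1, 2, 3, 5):
    print(f"{n} node(s): majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Two nodes tolerate no failures, the same as one node, while adding a second 
machine that can break.  Three nodes tolerate one failure, and five tolerate 
two, which is what you need to take one node down for maintenance and still 
survive an unplanned failure.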
Dave


From: Anthony Ikeda [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 5:38 PM
To: [email protected]
Subject: Trying to get the region servers working....

I've successfully got Hadoop installed and running:
Server1 (172.28.1.138) - master, namenode, jobtracker, tasktracker
Server2 (172.28.1.139) - slave, datanode
Server3 (172.28.2.136) - slave, datanode
Server4 (172.28.2.137) - slave, datanode

I'm now trying to get HBase up and running with HBase managing ZooKeeper.

My HBase setup is:
Server1 - master, zookeeper1
Server2 - slave, regionserver
Server3 - slave, regionserver, zookeeper2
Server4 - slave, regionserver

However, the region servers seem to keep resolving the master server to 
127.0.0.1:60000.

This is the log entry 
(${HBASE_HOME}/logs/hbase-hbase-regionserver-SVRH127.log):
2010-06-03 09:57:52,394 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-06-03 09:57:52,432 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2010-06-03 09:57:52,433 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2010-06-03 09:57:52,485 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 127.0.0.1:60000
2010-06-03 09:57:52,486 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 127.0.0.1:60000 that we are up
2010-06-03 09:58:52,914 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)


From what I can tell in the ZooKeeper logs, it has started successfully and is 
communicating.
${HBASE_HOME}/logs/hbase-hbase-zookeeper-SVRH124.log:
2010-06-03 10:05:47,286 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server
2010-06-03 10:05:47,288 INFO org.apache.zookeeper.server.quorum.Follower: Following /172.28.2.136:2888
2010-06-03 10:05:47,290 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Sending new notification.
2010-06-03 10:05:47,321 INFO org.apache.zookeeper.server.quorum.Follower: Getting a snapshot from leader
2010-06-03 10:05:47,335 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 200000000
2010-06-03 10:06:07,272 WARN org.apache.zookeeper.server.quorum.Follower: Got zxid 0x200000001 expected 0x1
Thu Jun  3 10:14:09 EST 2010 Stopping zookeeper
Thu Jun  3 10:14:09 EST 2010 Killing zookeeper

And ${HBASE_HOME}/logs/hbase-hbase-zookeeper-SVRH127.log:
2010-06-03 10:05:48,008 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server
2010-06-03 10:05:48,015 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Sending new notification.
2010-06-03 10:05:48,016 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /home/hbase/zkeeper/data/version-2/snapshot.0
2010-06-03 10:05:48,020 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 10000000b
2010-06-03 10:05:48,041 INFO org.apache.zookeeper.server.quorum.FollowerHandler: Follower sid: 1 : info : org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@6f878144
2010-06-03 10:05:48,041 WARN org.apache.zookeeper.server.quorum.FollowerHandler: Sending snapshot last zxid of peer is 0x10000000b  zxid of leader is 0x200000000
2010-06-03 10:05:48,048 WARN org.apache.zookeeper.server.quorum.Leader: Commiting zxid 0x200000000 from /172.28.2.136:2888 not first!
2010-06-03 10:05:48,048 WARN org.apache.zookeeper.server.quorum.Leader: First is 0
2010-06-03 10:06:07,992 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /172.28.1.138:23600 lastZxid 0
2010-06-03 10:06:07,992 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x228fb20c3760000
2010-06-03 10:06:08,010 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x228fb20c3760000 valid:true
2010-06-03 10:06:30,002 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310000
2010-06-03 10:06:30,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310000
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310000
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310003
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310003
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310003
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310002
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310002
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310002
2010-06-03 10:09:59,904 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x228fb20c3760000
2010-06-03 10:09:59,906 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x228fb20c3760000 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/172.28.2.136:2181 remote=/172.28.1.138:23600]
Thu Jun  3 10:14:09 EST 2010 Stopping zookeeper
Thu Jun  3 10:14:09 EST 2010 Killing zookeeper


The hbase-site.xml for each server is configured as:
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://172.28.1.138/hbase</value>
        </property>
        <property>
                <name>hbase.master</name>
                <value>172.28.1.138:60000</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>172.28.1.138,172.28.2.136</value>
        </property>
</configuration>
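
A quick way to sanity-check the quorum setting in a file like this is to read 
the property back and count the hosts. The following is a minimal sketch, not 
an HBase tool: the helper names are hypothetical, and the embedded XML is just 
the hbase.zookeeper.quorum property copied from the configuration above.

```python
import xml.etree.ElementTree as ET

# Only the quorum property from the configuration above, for illustration.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>172.28.1.138,172.28.2.136</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the <value> of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def quorum_hosts(xml_text):
    """Split hbase.zookeeper.quorum into a list of host strings."""
    value = get_property(xml_text, "hbase.zookeeper.quorum") or ""
    return [host.strip() for host in value.split(",") if host.strip()]


hosts = quorum_hosts(HBASE_SITE)
print(len(hosts), "ZooKeeper host(s):", hosts)
if len(hosts) % 2 == 0:
    print("Even-sized quorum: consider 1, 3 or 5 hosts instead.")
```

With the value shown above, this reports two hosts and flags the even count.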

My ${HBASE_HOME}/conf/regionservers files are:
Server1 (172.28.1.138):
172.28.2.136
172.28.2.137
172.28.1.139

Server2 (172.28.1.139):
172.28.2.136
172.28.2.137
172.28.1.139

Server3 (172.28.2.136):
172.28.2.136
172.28.2.137
172.28.1.139

Server4 (172.28.2.137):
172.28.2.136
172.28.2.137
172.28.1.139

Question:
Why can't the region servers contact the master? I've checked the /etc/hosts 
file and there are 2 entries to resolve the server name (127.0.0.1 and 
172.28.x.x) with 127.0.0.1 coming first. But I've been told not to change this 
as it affects other functions of the server.
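
One plausible reading of the 127.0.0.1:60000 entry in /hbase/master is that 
the master published whatever address its own hostname resolves to. The sketch 
below is only a diagnostic under that assumption, with hypothetical helper 
names; run on the master, it shows whether the local hostname resolves to a 
loopback address.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the address is in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def resolve_own_hostname():
    """Return (hostname, resolved address), or (hostname, None) on failure."""
    hostname = socket.gethostname()
    try:
        return hostname, socket.gethostbyname(hostname)
    except OSError:
        return hostname, None


hostname, address = resolve_own_hostname()
if address is None:
    print(f"{hostname} does not resolve at all")
elif is_loopback(address):
    print(f"{hostname} resolves to loopback address {address}")
else:
    print(f"{hostname} resolves to {address}")
```

If the master reports a loopback address here, remote region servers handed 
that address would try to connect to themselves, which would be consistent 
with the "Connection refused" in the region server log.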

Anthony Ikeda
Java Analyst/Programmer
Cardlink Services Limited
Level 4, 3 Rider Boulevard
Rhodes NSW 2138

Web: www.cardlink.com.au | Tel: +61 2 9646 9221 | Fax: +61 2 9646 9283