Just to be clear, you are not actually running exactly 2 ZK nodes, are you? I think one ZK node on your master is sufficient for this size of cluster. If that node goes down, your entire cluster is gone in any case. And remember, you need to have an odd number of ZK nodes. And 3 nodes probably doesn't make sense either: if you have a large enough cluster to need a ZK quorum, then you probably want to be able to take one node offline and still have the cluster keep working through an additional failure, which takes 5 nodes.

Dave
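The sizing advice above follows from majority-quorum arithmetic, which can be sketched in a few lines. This is only an illustration, assuming the ensemble needs a strict majority of its configured nodes to keep serving; the function names are made up for the example and are not part of ZooKeeper or HBase.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given node count."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be down while a strict majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare the ensemble sizes discussed in this thread.
    for n in range(1, 6):
        print(
            f"{n} node(s): quorum={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

Under that assumption, 2 nodes tolerate no failures (no better than 1 node, with more that can go wrong), 3 nodes tolerate one, and 5 nodes tolerate two, which covers one node taken offline plus one unplanned failure.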
From: Anthony Ikeda [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 5:38 PM
To: [email protected]
Subject: Trying to get the region servers working....

I've successfully got hadoop installed and running:

Server1 (172.28.1.138) - master, namenode, jobtracker, tasktracker
Server2 (172.28.1.139) - slave, datanode
Server3 (172.28.2.136) - slave, datanode
Server4 (172.28.2.137) - Slave, datanode

I'm now trying to get HBase up and running with the HBase managing ZooKeeper. My HBase setup is:

Server1 - master, zookeeper1
Server2 - slave, regionserver
Server3 - slave, regionserver, zookeeper2
Server4 - Slave, regionserver

However the region servers seem to keep resolving the master server to 127.0.0.1:60000

This is the log entry (${HBASE_HOME}/logs/hbase-hbase-regionserver-SVRH127.log):

2010-06-03 09:57:52,394 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-06-03 09:57:52,432 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2010-06-03 09:57:52,433 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2010-06-03 09:57:52,485 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 127.0.0.1:60000
2010-06-03 09:57:52,486 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 127.0.0.1:60000 that we are up
2010-06-03 09:58:52,914 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

From what I can tell in the ZooKeeper logs, it has started successfully and is communicating.

${HBASE_HOME}/logs/hbase-hbase-zookeeper-SVRH124.log

2010-06-03 10:05:47,286 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server
2010-06-03 10:05:47,288 INFO org.apache.zookeeper.server.quorum.Follower: Following /172.28.2.136:2888
2010-06-03 10:05:47,290 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Sending new notification.
2010-06-03 10:05:47,321 INFO org.apache.zookeeper.server.quorum.Follower: Getting a snapshot from leader
2010-06-03 10:05:47,335 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 200000000
2010-06-03 10:06:07,272 WARN org.apache.zookeeper.server.quorum.Follower: Got zxid 0x200000001 expected 0x1
Thu Jun 3 10:14:09 EST 2010 Stopping zookeeper
Thu Jun 3 10:14:09 EST 2010 Killing zookeeper

And ${HBASE_HOME}/logs/hbase-hbase-zookeeper-SVRH127.log

2010-06-03 10:05:48,008 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server
2010-06-03 10:05:48,015 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Sending new notification.
2010-06-03 10:05:48,016 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /home/hbase/zkeeper/data/version-2/snapshot.0
2010-06-03 10:05:48,020 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 10000000b
2010-06-03 10:05:48,041 INFO org.apache.zookeeper.server.quorum.FollowerHandler: Follower sid: 1 : info : org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@6f878144
2010-06-03 10:05:48,041 WARN org.apache.zookeeper.server.quorum.FollowerHandler: Sending snapshot last zxid of peer is 0x10000000b zxid of leader is 0x200000000
2010-06-03 10:05:48,048 WARN org.apache.zookeeper.server.quorum.Leader: Commiting zxid 0x200000000 from /172.28.2.136:2888 not first!
2010-06-03 10:05:48,048 WARN org.apache.zookeeper.server.quorum.Leader: First is 0
2010-06-03 10:06:07,992 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /172.28.1.138:23600 lastZxid 0
2010-06-03 10:06:07,992 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x228fb20c3760000
2010-06-03 10:06:08,010 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x228fb20c3760000 valid:true
2010-06-03 10:06:30,002 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310000
2010-06-03 10:06:30,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310000
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310000
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310003
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310003
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310003
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x128fb1975310002
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x128fb1975310002
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x128fb1975310002
2010-06-03 10:09:59,904 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x228fb20c3760000
2010-06-03 10:09:59,906 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x228fb20c3760000 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/172.28.2.136:2181 remote=/172.28.1.138:23600]
Thu Jun 3 10:14:09 EST 2010 Stopping zookeeper
Thu Jun 3 10:14:09 EST 2010 Killing zookeeper

The hbase-site.xml for each server is configured as:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://172.28.1.138/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>172.28.1.138:60000</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>172.28.1.138,172.28.2.136</value>
  </property>
</configuration>

My ${HBASE_HOME}/conf/regionservers files are:

Server1 (172.28.1.138):
172.28.2.136
172.28.2.137
172.28.1.139

Server2 (172.28.1.139):
172.28.2.136
172.28.2.137
172.28.1.139

Server3 (172.28.2.136):
172.28.2.136
172.28.2.137
172.28.1.139

Server4 (172.28.2.137):
172.28.2.136
172.28.2.137
172.28.1.139

Question: Why can't the region servers contact the master?

I've checked the /etc/hosts file and there are 2 entries to resolve the server name (127.0.0.1 and 172.28.x.x) with 127.0.0.1 coming first. But I've been told not to change this as it affects other functions of the server.

Anthony Ikeda
Java Analyst/Programmer
Cardlink Services Limited
Level 4, 3 Rider Boulevard
Rhodes NSW 2138
Web: www.cardlink.com.au | Tel: + 61 2 9646 9221 | Fax: + 61 2 9646 9283
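
The region server log shows /hbase/master holding 127.0.0.1:60000, and the /etc/hosts description says the server name maps to 127.0.0.1 first. One plausible reading is that the master publishes whatever address its own hostname resolves to. That is an assumption, not something the logs confirm. A minimal sketch for checking what a host's own name resolves to, using only the Python standard library (the helper names are illustrative, and Python's resolver may not behave exactly like the JVM's):

```python
import ipaddress
import socket
from typing import Tuple


def is_loopback(address: str) -> bool:
    """True if the address is in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def resolve_local_hostname() -> Tuple[str, str]:
    """Return this machine's hostname and the IPv4 address it resolves to."""
    hostname = socket.gethostname()
    return hostname, socket.gethostbyname(hostname)


if __name__ == "__main__":
    try:
        name, addr = resolve_local_hostname()
    except OSError as exc:
        # Raised when the hostname cannot be resolved at all.
        print(f"could not resolve local hostname: {exc}")
    else:
        note = "loopback, unreachable from other hosts" if is_loopback(addr) else "ok"
        print(f"{name} -> {addr} ({note})")
```

If this reports a loopback address when run on Server1, the region servers on the other machines would be told to dial their own localhost, which would match the "Connection refused" in the log.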
