Are you running a standalone ZK cluster? If so, are you doing the standard ZK diagnostic things?
ZK supports several simple checks which you can access by sending special four byte sequences to the standard client port. See http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html#sc_zkCommands <http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html#sc_zkCommands>(but watch out with the echo idiom ... some versions of ZK or nc may close the connection too quickly to get teh reply) On Thu, Jan 20, 2011 at 1:52 PM, mike anderson <[email protected]>wrote: > Alright I'm stumped here. It looks like Zookeeper isn't playing nice and > causing the master to not start. Any suggestions? > > > I'm seeing a lot of these: 2011-01-20 16:47:55,744 INFO > org.apache.zookeeper.ClientCnxn: Unable to read additional data from server > sessionid 0x0, likely server has closed socket, closing socket connection > and attempting reconnect > > Which looks bad, full log below. > > -mike > > > > > Thu Jan 20 16:47:47 EST 2011 Starting master on carr.host.com > ulimit -n 20480 > 2011-01-20 16:47:47,571 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: > Initializing RPC Metrics with hostName=HMaster, port=60000 > 2011-01-20 16:47:47,628 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > Responder: starting > 2011-01-20 16:47:47,630 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > listener on 60000: starting > 2011-01-20 16:47:47,630 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 0 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 1 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 2 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 3 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 4 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 5 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 6 on 60000: starting > 2011-01-20 16:47:47,631 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 7 on 60000: starting > 2011-01-20 16:47:47,632 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 8 on 60000: starting > 2011-01-20 16:47:47,633 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 9 on 60000: starting > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:host.name=carr.host.com > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:java.version=1.6.0_07 > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:java.vendor=Sun Microsystems Inc. > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:java.home=/usr/java/jre1.6.0_07 > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > > environment:java.class.path=/usr/local/hbase/bin/../conf:/usr/java/jre1.6.0_07/lib/tools.jar:/usr/local/hbase/bin/..:/usr/local/hbase/bin/../hbase-0.90.0.jar:/usr/local/hbase/bin/../hbase-0.90.0-tests.jar:/usr/local/hbase/bin/../lib/activation-1.1.jar:/usr/local/hbase/bin/../lib/ant-1.6.5.jar:/usr/local/hbase/bin/../lib/ant-1.7.1.jar:/usr/local/hbase/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hbase/bin/../lib/asm-3.1.jar:/usr/local/hbase/bin/../lib/avro-1.3.3.jar:/usr/local/hbase/bin/../lib/commons-cli-1.2.jar:/usr/local/hbase/bin/../lib/commons-codec-1.4.jar:/usr/local/hbase/bin/../lib/commons-el-1.0.jar:/usr/local/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/local/hbase/bin/../lib/commons-lang-2.5.jar:/usr/local/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/local/hbase/bin/../lib/core-3.1.1.jar:/usr/local/hbase/bin/../lib/guava-r06.jar:/usr/local/hbase/bin/../lib/hadoop-0.20.2-core.jar:/usr/local/hbase/bin/../lib/hbase-0.90.0.jar:/usr/local/hbase/bin/../lib/hsqldb-1.8.0.10.jar:/usr/local/hbase/bin/../lib/jackson-core-asl-1.5.5.jar:/usr/local/hbase/bin/../lib/jackson-jaxrs-1.5.5.jar:/usr/local/hbase/bin/../lib/jackson-mapper-asl-1.4.2.jar:/usr/local/hbase/bin/../lib/jackson-xc-1.5.5.jar:/usr/local/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/local/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/local/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/local/hbase/bin/../lib/jaxb-impl-2.1.12.jar:/usr/local/hbase/bin/../lib/jersey-core-1.4.jar:/usr/local/hbase/bin/../lib/jersey-json-1.4.jar:/usr/local/hbase/bin/../lib/jersey-server-1.4.jar:/usr/local/hbase/bin/../lib/jets3t-0.7.1.jar:/usr/local/hbase/bin/../lib/jettison-1.1.jar:/usr/local/hbase/bin/../lib/jetty-6.1.26.jar:/usr/local/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/local/hbase/bin/../lib/jline-0.9.94.jar:/usr/local/hbase/bin/../lib/jruby-complete-1.0.3.jar:/usr/local/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hbase/bin/../lib/jsr311-api-1.1.1.jar:/usr/local/hbase/bin/../lib/junit-3.8.1.jar:/usr/local/hbase/bin/../lib/kfs-0.3.jar:/usr/local/hbase/bin/../lib/log4j-1.2.16.jar:/usr/local/hbase/bin/../lib/oro-2.0.8.jar:/usr/local/hbase/bin/../lib/paranamer-2.2.jar:/usr/local/hbase/bin/../lib/paranamer-ant-2.2.jar:/usr/local/hbase/bin/../lib/paranamer-generator-2.2.jar:/usr/local/hbase/bin/../lib/protobuf-java-2.3.0.jar:/usr/local/hbase/bin/../lib/qdox-1.10.1.jar:/usr/local/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/local/hbase/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hbase/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/local/hbase/bin/../lib/thrift-0.2.0.jar:/usr/local/hbase/bin/../lib/xmlenc-0.52.jar:/usr/local/hbase/bin/../lib/zookeeper-3.3.2.jar > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > > environment:java.library.path=/usr/java/jre1.6.0_07/lib/amd64/server:/usr/java/jre1.6.0_07/lib/amd64:/usr/java/jre1.6.0_07/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:java.io.tmpdir=/tmp > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:java.compiler=<NA> > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:os.name=Linux > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:os.arch=amd64 > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:os.version=2.6.18-53.1.21.el5 > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:user.name=pubget > 2011-01-20 16:47:47,651 INFO org.apache.zookeeper.ZooKeeper: Client > environment:user.home=/home/pubget > 2011-01-20 16:47:47,652 INFO org.apache.zookeeper.ZooKeeper: Client > environment:user.dir=/home/pubget/hbase-0.90.0 > 2011-01-20 16:47:47,652 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=ghetto163.host.com:2181, > ghetto162.host.com:2181,ghetto161.host.com:2181,ghetto160.host.com:2181, > ghetto169.host.com:2181,ghetto167.host.com:2181,ghetto165.host.com:2181, > ghetto164.host.com:2181 sessionTimeout=180000 watcher=master:60000 > 2011-01-20 16:47:47,673 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto167.host.com/10.0.16.167:2181 > 2011-01-20 16:47:47,678 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto167.host.com/10.0.16.167:2181, initiating > session > 2011-01-20 16:47:47,683 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:47,957 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto163.host.com/10.0.16.163:2181 > 2011-01-20 16:47:47,957 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto163.host.com/10.0.16.163:2181, initiating > session > 2011-01-20 16:47:47,958 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:48,906 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto169.host.com/10.0.16.169:2181 > 2011-01-20 16:47:48,907 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto169.host.com/10.0.16.169:2181, initiating > session > 2011-01-20 16:47:48,907 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:49,784 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto160.host.com/10.0.16.160:2181 > 2011-01-20 16:47:49,785 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto160.host.com/10.0.16.160:2181, initiating > session > 2011-01-20 16:47:49,785 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:50,605 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto162.host.com/10.0.16.162:2181 > 2011-01-20 16:47:50,606 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto162.host.com/10.0.16.162:2181, initiating > session > 2011-01-20 16:47:50,607 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:50,885 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto161.host.com/10.0.16.161:2181 > 2011-01-20 16:47:50,886 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto161.host.com/10.0.16.161:2181, initiating > session > 2011-01-20 16:47:50,887 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:51,150 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto164.host.com/10.0.16.164:2181 > 2011-01-20 16:47:51,150 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto164.host.com/10.0.16.164:2181, initiating > session > 2011-01-20 16:47:51,151 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:52,042 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto165.host.com/10.0.16.165:2181 > 2011-01-20 16:47:52,042 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto165.host.com/10.0.16.165:2181, initiating > session > 2011-01-20 16:47:52,043 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:52,832 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto167.host.com/10.0.16.167:2181 > 2011-01-20 16:47:52,833 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto167.host.com/10.0.16.167:2181, initiating > session > 2011-01-20 16:47:52,833 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:53,749 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto163.host.com/10.0.16.163:2181 > 2011-01-20 16:47:53,750 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto163.host.com/10.0.16.163:2181, initiating > session > 2011-01-20 16:47:53,750 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:54,632 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto169.host.com/10.0.16.169:2181 > 2011-01-20 16:47:54,632 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto169.host.com/10.0.16.169:2181, initiating > session > 2011-01-20 16:47:54,633 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:55,254 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto160.host.com/10.0.16.160:2181 > 2011-01-20 16:47:55,254 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto160.host.com/10.0.16.160:2181, initiating > session > 2011-01-20 16:47:55,255 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:55,743 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto162.host.com/10.0.16.162:2181 > 2011-01-20 16:47:55,743 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto162.host.com/10.0.16.162:2181, initiating > session > 2011-01-20 16:47:55,744 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:56,598 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto161.host.com/10.0.16.161:2181 > 2011-01-20 16:47:56,598 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto161.host.com/10.0.16.161:2181, initiating > session > 2011-01-20 16:47:56,599 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:57,696 INFO org.apache.zookeeper.ClientCnxn: Opening > socket > connection to server ghetto164.host.com/10.0.16.164:2181 > 2011-01-20 16:47:57,697 INFO org.apache.zookeeper.ClientCnxn: Socket > connection established to ghetto164.host.com/10.0.16.164:2181, initiating > session > 2011-01-20 16:47:57,697 INFO org.apache.zookeeper.ClientCnxn: Unable to > read > additional data from server sessionid 0x0, likely server has closed socket, > closing socket connection and attempting reconnect > 2011-01-20 16:47:57,802 ERROR > org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master > java.lang.RuntimeException: Failed construction of Master: class > org.apache.hadoop.hbase.master.HMaster > at > org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1057) > at > > org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142) > at > > org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at > > org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) > at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1071) > Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for /hbase > at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) > at > > org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902) > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133) > at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:217) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > Source) > at java.lang.reflect.Constructor.newInstance(Unknown Source) > at > org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1052) > ... 5 more >
