Hi all, I have a Hadoop 2.7.3 HA cluster. The /etc/hosts config is:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.10.30 dmpm1.you.com master1
192.168.10.31 dmpm2.you.com master2
192.168.10.32 dmps1.you.com slave1
192.168.10.29 dmps2.you.com slave2

192.168.10.31 (dmpm2.you.com, master2) runs both a NameNode and a DataNode. I found that master2's DataNode log keeps printing this warning:

2017-01-13 20:26:31,673 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.ConnectException: Call From dmpm2.you.com/192.168.10.31 to master2:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy15.sendHeartbeat(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:152)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 8 more
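The ConnectionRefused wiki page linked in the log basically suggests confirming that the hostname resolves where I expect and that something is listening on the exact address:port being dialed. A rough sketch of those checks, run from master2 (hostnames and IPs are just the ones from my /etc/hosts above):

# should resolve to 192.168.10.31, not 127.0.0.1 or anything unexpected
$ getent hosts master2
# probe the NameNode RPC port the DataNode dials, by name and by IP
$ nc -zv master2 9000
$ nc -zv 192.168.10.31 9000

If the probe by hostname fails while the probe by IP succeeds, the problem would be name resolution rather than the NameNode itself.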
A little later the same DataNode log also shows this for master1:

2017-01-13 20:26:31,926 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
java.net.NoRouteToHostException: No Route to Host from dmpm2.you.com/192.168.10.31 to master1:9000 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
        at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
        at org.apache.hadoop.ipc.Client.call(Client.java:1479)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy15.sendHeartbeat(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:152)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1451)
        ... 8 more

The IPC client then keeps retrying both NameNodes:

2017-01-13 20:26:32,674 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.10.31:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-01-13 20:26:33,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.10.31:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-01-13 20:26:34,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.10.31:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-01-13 20:26:35,675 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.10.31:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-01-13 20:26:35,931 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.10.30:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-01-13 20:26:36,676 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master2/192.168.10.31:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
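For the NoRouteToHostException against master1, the wiki page points at routing or firewall problems rather than at the NameNode process itself. A rough sketch of what could be checked between master2 and master1 (the firewalld commands assume a RHEL/CentOS-style host and need to run on master1 as root):

# from master2: confirm there is a route to master1 at all, then probe the RPC port
$ ip route get 192.168.10.30
$ nc -zv master1 9000

# on master1, as root: is a firewall active, and is port 9000 open to the cluster nodes?
$ sudo systemctl status firewalld
$ sudo firewall-cmd --list-all
$ sudo iptables -L -n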
Yet the DataNode on master2 is actually in service, and port 9000 is listening:

[hadoop@master2 logs]$ netstat -anp | grep 9000
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.10.31:9000      0.0.0.0:*               LISTEN      85756/java
tcp        0      0 192.168.10.31:9000      192.168.10.29:36500     ESTABLISHED 85756/java
tcp        0      0 192.168.10.31:57968     192.168.10.31:9000      ESTABLISHED 57349/java
tcp        0      0 192.168.10.31:34961     192.168.10.30:9000      ESTABLISHED 57349/java
tcp        0      0 192.168.10.31:9000      192.168.10.32:50726     ESTABLISHED 85756/java
tcp        0      0 192.168.10.31:9000      192.168.10.31:57968     ESTABLISHED 85756/java
tcp6       0      0 192.168.10.31:48511     192.168.10.30:9000      TIME_WAIT   -
tcp6       0      0 192.168.10.31:48517     192.168.10.30:9000      TIME_WAIT   -
unix  3      [ ]         STREAM     CONNECTED     96690003 -
unix  3      [ ]         STREAM     CONNECTED     96690002 -
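Since the port is clearly listening on 192.168.10.31 while the heartbeats to master2:9000 are still refused, it also seems worth double-checking which NameNode RPC addresses the DataNode actually resolves from the HA configuration, and which NameNode is currently active. A rough sketch of those checks (mycluster, nn1 and nn2 are placeholders; substitute whatever dfs.nameservices and dfs.ha.namenodes define in hdfs-site.xml):

# hosts the configuration thinks run NameNodes, and the nameservice name
$ hdfs getconf -namenodes
$ hdfs getconf -confKey dfs.nameservices
# the RPC addresses DataNodes and clients will dial (placeholders: mycluster, nn1, nn2)
$ hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn1
$ hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn2
# which NameNode is active and which is standby
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2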
I don't know how to resolve this problem; any help would be appreciated.

2017-02-20
lk_hadoop