If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts.
A more relevant question is whether he's running a firewall on each of these machines.

A simple test: ssh to one node, then ping the other nodes and the control nodes at random to see whether they can see one another. Then check whether a firewall is running that would limit the types of traffic allowed between nodes.

One other side note: are these machines multi-homed?

On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[email protected]> wrote:

> Hello there,
>
> Could you please share your /etc/hosts file, if you don't mind.
>
> Regards,
> Mohammad Tariq
>
>
> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <[email protected]> wrote:
> Hi,
>
> I am currently trying to run my Hadoop program on a cluster. Sadly, though, my datanodes and tasktrackers seem to have difficulties with their communication, as their logs say:
> * Some datanodes and tasktrackers seem to have port problems of some kind, as can be seen in the logs below. I wondered if this might be related to the localhost entry in /etc/hosts, as you can read in a lot of posts with similar errors, but I checked the file and neither localhost nor 127.0.0.1/127.0.1.1 is bound there. (Although you can ping localhost... the technician of the cluster said he'd look into the mechanism that resolves localhost.)
> * The other nodes cannot speak with the namenode and jobtracker (its-cs131). It is absolutely not clear why this is happening: the "dfs -put" I do directly before the job runs fine, which seems to imply that communication between those servers is working flawlessly.
>
> Is there any reason why this might happen?
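
Ping only exercises ICMP, so for the test described at the top of this reply it may be worth probing the actual TCP ports as well. Below is a minimal sketch in Python, run from one of the worker nodes. It assumes the NameNode and JobTracker addresses its-cs131:35554 and its-cs131:35555 that appear in the logs; the helper names can_connect and report are hypothetical, not part of Hadoop.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(targets):
    """Print one line per (host, port) pair and return the unreachable ones."""
    failed = []
    for host, port in targets:
        ok = can_connect(host, port)
        print(f"{host}:{port} is {'reachable' if ok else 'NOT reachable'}")
        if not ok:
            failed.append((host, port))
    return failed


# Example call (host and ports taken from the logs, adjust as needed):
# report([("its-cs131", 35554), ("its-cs131", 35555)])
```

If ping works but this reports the ports as not reachable, that would point toward either a firewall rejecting the traffic or nothing listening on that port at that moment, rather than toward /etc/hosts.
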
>
>
> Regards,
> Elmar
>
> LOGS BELOW:
>
> \____Datanodes
>
> After successfully putting the data to HDFS (at this point I thought namenode and datanodes have to communicate), I get the following errors when starting the job:
>
> There are 2 kinds of logs I found: the first one is big (about 12MB) and looks like this:
> ############################### LOG TYPE 1 ###############################
> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).
> 2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 6 time(s).
> 2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 7 time(s).
> 2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 8 time(s).
> 2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 9 time(s).
> 2012-08-13 08:23:36,335 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to its-cs131/141.51.205.41:35554 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy5.sendHeartbeat(Unknown Source)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     ... 5 more
>
> ... (this continues until the end of the log)
>
> The second kind is short:
> ############################### LOG TYPE 2 ###############################
> 2012-08-13 00:59:19,038 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
> ************************************************************/
> 2012-08-13 00:59:19,203 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-13 00:59:19,216 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-13 00:59:19,217 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-13 00:59:19,218 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> 2012-08-13 00:59:19,306 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-13 00:59:19,346 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/work/bmacek/hadoop/hdfs/slave is not formatted.
> 2012-08-13 00:59:21,584 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2012-08-13 00:59:21,787 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
> 2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
> 2012-08-13 00:59:21,897 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
> 2012-08-13 00:59:21,898 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException: Problem binding to /0.0.0.0:50010 : Address already in use
>     at org.apache.hadoop.ipc.Server.bind(Server.java:227)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
> Caused by: java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:225)
>     ... 7 more
>
> 2012-08-13 00:59:21,899 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down DataNode at its-cs133.its.uni-kassel.de/141.51.205.43
> ************************************************************/
>
>
> \_____TaskTracker
> With TaskTrackers it is the same: there are 2 kinds.
> ############################### LOG TYPE 1 ###############################
> 2012-08-13 02:09:54,645 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'its-cs131' with reponseId '879
> 2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
> 2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 1 time(s).
> 2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 2 time(s).
> 2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 3 time(s).
> 2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 4 time(s).
> 2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 5 time(s).
> 2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 6 time(s).
> 2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 7 time(s).
> 2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 8 time(s).
> 2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 9 time(s).
> 2012-08-13 02:10:04,651 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.ConnectException: Call to its-cs131/141.51.205.41:35555 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     ... 6 more
>
>
> ############################### LOG TYPE 2 ###############################
> 2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
> ************************************************************/
> 2012-08-13 00:59:24,569 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-13 00:59:24,626 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-13 00:59:24,627 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
> 2012-08-13 00:59:24,950 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 2012-08-13 00:59:25,232 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-08-13 00:59:25,237 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as bmacek
> 2012-08-13 00:59:25,239 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
> 2012-08-13 00:59:25,244 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-13 00:59:25,255 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
> 2012-08-13 00:59:25,256 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
> 2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort54850 registered.
> 2012-08-13 00:59:25,282 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort54850 registered.
> 2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54850: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54850: starting
> 2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:54850
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54850: starting
> 2012-08-13 00:59:25,289 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
> 2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35555. Already tried 0 time(s).
> 2012-08-13 00:59:38,104 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
> 2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2012-08-13 00:59:38,134 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
> 2012-08-13 00:59:38,137 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
> 2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
> 2012-08-13 00:59:38,158 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
> 2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
> 2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
>     at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)
>
> 2012-08-13 00:59:38,163 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at its-cs133.its.uni-kassel.de/141.51.205.43
> ************************************************************/
>
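
The "Address already in use" errors in the quoted LOG TYPE 2 output (port 50010 for the DataNode, 50060 for the TaskTracker web server) are a local condition on its-cs133 rather than a network one, and they might simply mean that an earlier DataNode or TaskTracker process is still holding the port. This is a guess from the logs, not something confirmed in the thread. A minimal Python sketch to check this on a node before starting the daemons follows; the helper names port_is_free and busy_ports are hypothetical.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            # Typically EADDRINUSE when another process already holds the port.
            return False


def busy_ports(ports, host="0.0.0.0"):
    """Return the subset of ports that cannot currently be bound."""
    return [p for p in ports if not port_is_free(p, host)]


# Example call with the two ports from the BindException messages:
# print(busy_ports([50010, 50060]))
```

If either port shows up as busy while the Hadoop daemons are supposed to be stopped, a tool such as netstat or lsof on that node should show which process owns it.
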
