Hi Michael,

well i can ssh from any node to any other without being prompted. The reason for this is, that my home dir is mounted in every server in the cluster.

If the machines are multihomed: i dont know. i could ask if this would be of importance.

Shall i?

Regards,
Elmar

Am 13.08.12 14:59, schrieb Michael Segel:
If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts.

A more relevant question is if he's running a firewall on each of these machines?

A simple test... ssh to one node, ping other nodes and the control nodes at random to see if they can see one another. Then check to see if there is a firewall running which would limit the types of traffic between nodes.

One other side note... are these machines multi-homed?

On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[email protected] <mailto:[email protected]>> wrote:

Hello there,

     Could you please share your /etc/hosts file, if you don't mind.

Regards,
    Mohammad Tariq



On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <[email protected] <mailto:[email protected]>> wrote:

    Hi,

    i am currently trying to run my hadoop program on a cluster.
    Sadly though my datanodes and tasktrackers seem to have
    difficulties with their communication as their logs say:
    * Some datanodes and tasktrackers seem to have portproblems of
    some kind as it can be seen in the logs below. I wondered if this
    might be due to reasons correllated with the localhost entry in
    /etc/hosts as you can read in alot of posts with similar errors,
    but i checked the file neither localhost nor 127.0.0.1/127.0.1.1
    <http://127.0.0.1/127.0.1.1> is bound there. (although you can
    ping localhost... the technician of the cluster said he'd be
    looking for the mechanics resolving localhost)
    * The other nodes can not speak with the namenode and jobtracker
    (its-cs131). Although it is absolutely not clear, why this is
    happening: the "dfs -put" i do directly before the job is running
    fine, which seems to imply that communication between those
    servers is working flawlessly.

    Is there any reason why this might happen?


    Regards,
    Elmar

    LOGS BELOW:

    \____Datanodes

    After successfully putting the data to hdfs (at this point i
    thought namenode and datanodes have to communicate), i get the
    following errors when starting the job:

    There are 2 kinds of logs i found: the first one is big (about
    12MB) and looks like this:
    ############################### LOG TYPE 1
    ############################################################
    2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 0 time(s).
    2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 1 time(s).
    2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 2 time(s).
    2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 3 time(s).
    2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 4 time(s).
    2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 5 time(s).
    2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 6 time(s).
    2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 7 time(s).
    2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 8 time(s).
    2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 9 time(s).
    2012-08-13 08:23:36,335 WARN
    org.apache.hadoop.hdfs.server.datanode.DataNode:
    java.net.ConnectException: Call to its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/> failed on connection exception:
    java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
        at org.apache.hadoop.ipc.Client.call(Client.java:1071)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy5.sendHeartbeat(Unknown Source)
        at
    
org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
        at
    org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
        at java.lang.Thread.run(Thread.java:619)
    Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
    sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net
    
<http://org.apache.hadoop.net/>.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net
    <http://org.apache.hadoop.net/>.NetUtils.connect(NetUtils.java:489)
        at
    org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
        at
    org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
        at
    org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
        at org.apache.hadoop.ipc.Client.call(Client.java:1046)
        ... 5 more

    ... (this continues til the end of the log)

    The second is short kind:
    ########################### LOG TYPE 2
    ############################################################
    2012-08-13 00:59:19,038 INFO
    org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting DataNode
    STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
    <http://its-cs133.its.uni-kassel.de/141.51.205.43>
    STARTUP_MSG:   args = []
    STARTUP_MSG:   version = 1.0.2
    STARTUP_MSG:   build =
    https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2
    -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
    ************************************************************/
    2012-08-13 00:59:19,203 INFO
    org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties
    from hadoop-metrics2.properties
    2012-08-13 00:59:19,216 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source MetricsSystem,sub=Stats registered.
    2012-08-13 00:59:19,217 INFO
    org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
    snapshot period at 10 second(s).
    2012-08-13 00:59:19,218 INFO
    org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode
    metrics system started
    2012-08-13 00:59:19,306 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source ugi registered.
    2012-08-13 00:59:19,346 INFO
    org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop
    library
    2012-08-13 00:59:20,482 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35554
    <http://141.51.205.41:35554/>. Already tried 0 time(s).
    2012-08-13 00:59:21,584 INFO
    org.apache.hadoop.hdfs.server.common.Storage: Storage directory
    /home/work/bmacek/hadoop/hdfs/slave is not formatted.
    2012-08-13 00:59:21,584 INFO
    org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
    2012-08-13 00:59:21,787 INFO
    org.apache.hadoop.hdfs.server.datanode.DataNode: Registered
    FSDatasetStatusMBean
    2012-08-13 00:59:21,897 INFO
    org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
    Shutting down all async disk service threads...
    2012-08-13 00:59:21,897 INFO
    org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
    All async disk service threads have been shut down.
    2012-08-13 00:59:21,898 ERROR
    org.apache.hadoop.hdfs.server.datanode.DataNode:
    java.net.BindException: Problem binding to /0.0.0.0:50010
    <http://0.0.0.0:50010/> : Address already in use
        at org.apache.hadoop.ipc.Server.bind(Server.java:227)
        at
    
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
        at
    org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
        at
    
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
        at
    
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
        at
    
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
        at
    
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
        at
    org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
    Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch
    
<http://sun.nio.ch/>.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
        at sun.nio.ch
    <http://sun.nio.ch/>.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at org.apache.hadoop.ipc.Server.bind(Server.java:225)
        ... 7 more

    2012-08-13 00:59:21,899 INFO
    org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down DataNode at
    its-cs133.its.uni-kassel.de/141.51.205.43
    <http://its-cs133.its.uni-kassel.de/141.51.205.43>
    ************************************************************/





    \_____TastTracker
    With TaskTrackers it is the same: there are 2 kinds.
    ############################### LOG TYPE 1
    ############################################################
    2012-08-13 02:09:54,645 INFO
    org.apache.hadoop.mapred.TaskTracker: Resending 'status' to
    'its-cs131' with reponseId '879
    2012-08-13 02:09:55,646 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 0 time(s).
    2012-08-13 02:09:56,646 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 1 time(s).
    2012-08-13 02:09:57,647 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 2 time(s).
    2012-08-13 02:09:58,647 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 3 time(s).
    2012-08-13 02:09:59,648 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 4 time(s).
    2012-08-13 02:10:00,648 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 5 time(s).
    2012-08-13 02:10:01,649 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 6 time(s).
    2012-08-13 02:10:02,649 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 7 time(s).
    2012-08-13 02:10:03,650 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 8 time(s).
    2012-08-13 02:10:04,650 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 9 time(s).
    2012-08-13 02:10:04,651 ERROR
    org.apache.hadoop.mapred.TaskTracker: Caught exception:
    java.net.ConnectException: Call to its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/> failed on connection exception:
    java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
        at org.apache.hadoop.ipc.Client.call(Client.java:1071)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown Source)
        at
    
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
        at
    org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
        at
    org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
        at
    org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
    Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
    sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net
    
<http://org.apache.hadoop.net/>.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net
    <http://org.apache.hadoop.net/>.NetUtils.connect(NetUtils.java:489)
        at
    org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
        at
    org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
        at
    org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
        at org.apache.hadoop.ipc.Client.call(Client.java:1046)
        ... 6 more


    ########################### LOG TYPE 2
    ############################################################
    2012-08-13 00:59:24,376 INFO
    org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting TaskTracker
    STARTUP_MSG:   host = its-cs133.its.uni-kassel.de/141.51.205.43
    <http://its-cs133.its.uni-kassel.de/141.51.205.43>
    STARTUP_MSG:   args = []
    STARTUP_MSG:   version = 1.0.2
    STARTUP_MSG:   build =
    https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2
    -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
    ************************************************************/
    2012-08-13 00:59:24,569 INFO
    org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties
    from hadoop-metrics2.properties
    2012-08-13 00:59:24,626 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source MetricsSystem,sub=Stats registered.
    2012-08-13 00:59:24,627 INFO
    org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
    snapshot period at 10 second(s).
    2012-08-13 00:59:24,627 INFO
    org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker
    metrics system started
    2012-08-13 00:59:24,950 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source ugi registered.
    2012-08-13 00:59:25,146 INFO org.mortbay.log: Logging to
    org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
    org.mortbay.log.Slf4jLog
    2012-08-13 00:59:25,206 INFO org.apache.hadoop.http.HttpServer:
    Added global filtersafety
    (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
    2012-08-13 00:59:25,232 INFO
    org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
    truncater with mapRetainSize=-1 and reduceRetainSize=-1
    2012-08-13 00:59:25,237 INFO
    org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with
    owner as bmacek
    2012-08-13 00:59:25,239 INFO
    org.apache.hadoop.mapred.TaskTracker: Good mapred local
    directories are: /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
    2012-08-13 00:59:25,244 INFO
    org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop
    library
    2012-08-13 00:59:25,255 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source jvm registered.
    2012-08-13 00:59:25,256 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source TaskTrackerMetrics registered.
    2012-08-13 00:59:25,279 INFO org.apache.hadoop.ipc.Server:
    Starting SocketReader
    2012-08-13 00:59:25,282 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source RpcDetailedActivityForPort54850 registered.
    2012-08-13 00:59:25,282 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source RpcActivityForPort54850 registered.
    2012-08-13 00:59:25,287 INFO org.apache.hadoop.ipc.Server: IPC
    Server Responder: starting
    2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC
    Server listener on 54850: starting
    2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC
    Server handler 0 on 54850: starting
    2012-08-13 00:59:25,288 INFO org.apache.hadoop.ipc.Server: IPC
    Server handler 1 on 54850: starting
    2012-08-13 00:59:25,289 INFO
    org.apache.hadoop.mapred.TaskTracker: TaskTracker up at:
    localhost/127.0.0.1:54850 <http://127.0.0.1:54850/>
    2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC
    Server handler 3 on 54850: starting
    2012-08-13 00:59:25,289 INFO org.apache.hadoop.ipc.Server: IPC
    Server handler 2 on 54850: starting
    2012-08-13 00:59:25,289 INFO
    org.apache.hadoop.mapred.TaskTracker: Starting tracker
    tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
    <http://127.0.0.1:54850/>
    2012-08-13 00:59:26,321 INFO org.apache.hadoop.ipc.Client:
    Retrying connect to server: its-cs131/141.51.205.41:35555
    <http://141.51.205.41:35555/>. Already tried 0 time(s).
    2012-08-13 00:59:38,104 INFO
    org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events
    fetcher for all reduce tasks on
    tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
    <http://127.0.0.1:54850/>
    2012-08-13 00:59:38,120 INFO org.apache.hadoop.util.ProcessTree:
    setsid exited with exit code 0
    2012-08-13 00:59:38,134 INFO
    org.apache.hadoop.mapred.TaskTracker: Using
    ResourceCalculatorPlugin :
    org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
    2012-08-13 00:59:38,137 WARN
    org.apache.hadoop.mapred.TaskTracker: TaskTracker's
    totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.
    2012-08-13 00:59:38,145 INFO org.apache.hadoop.mapred.IndexCache:
    IndexCache created with max memory = 10485760
    2012-08-13 00:59:38,158 INFO
    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for
    source ShuffleServerMetrics registered.
    2012-08-13 00:59:38,161 INFO org.apache.hadoop.http.HttpServer:
    Port returned by webServer.getConnectors()[0].getLocalPort()
    before open() is -1. Opening the listener on 50060
    2012-08-13 00:59:38,161 ERROR
    org.apache.hadoop.mapred.TaskTracker: Can not start task tracker
    because java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch
    
<http://sun.nio.ch/>.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
        at sun.nio.ch
    <http://sun.nio.ch/>.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at
    
org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
        at
    org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
        at
    org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)

    2012-08-13 00:59:38,163 INFO
    org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down TaskTracker at
    its-cs133.its.uni-kassel.de/141.51.205.43
    <http://its-cs133.its.uni-kassel.de/141.51.205.43>
    ************************************************************/




Reply via email to