This is the ifconfig output from the machine I use to submit the
parallel job:

### ifconfig output - master node ###

[root@lcbcpc02 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:17:10:53:C8 
          inet addr:128.178.54.74  Bcast:128.178.54.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11563938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6670398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16562149093 (15.4 GiB)  TX bytes:1312532185 (1.2 GiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:C9 
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c9/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:468156 errors:0 dropped:0 overruns:0 frame:0
          TX packets:468156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:500286061 (477.1 MiB)  TX bytes:500286061 (477.1 MiB)




This is the ifconfig output from the "slave node":

### ifconfig output - slave node ###

[root@lcbcpc04 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:17:10:53:74 
          inet addr:128.178.54.76  Bcast:128.178.54.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5374/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:320264 errors:0 dropped:0 overruns:0 frame:0
          TX packets:151942 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:139280839 (132.8 MiB)  TX bytes:82889237 (79.0 MiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:75 
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5375/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2820 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2820 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2178053 (2.0 MiB)  TX bytes:2178053 (2.0 MiB)

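Comparing the two outputs above by hand, eth1 carries the same address (192.168.0.1) on both machines and lacks the RUNNING flag on both. The same check can be scripted; the following is only a minimal sketch, assuming the old net-tools ifconfig format shown above. The helper names parse_ifconfig and shared_addresses are made up for illustration, and the embedded strings are trimmed copies of the outputs above. A shared address does not by itself prove that eth1 causes the connection failures; it only points at an interface worth a closer look.

```python
import re

# Trimmed copies of the two ifconfig outputs above (only the lines the parser needs).
MASTER = """\
eth0      Link encap:Ethernet  HWaddr 00:15:17:10:53:C8
          inet addr:128.178.54.74  Bcast:128.178.54.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:C9
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
"""

SLAVE = """\
eth0      Link encap:Ethernet  HWaddr 00:15:17:10:53:74
          inet addr:128.178.54.76  Bcast:128.178.54.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:75
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
"""


def parse_ifconfig(text):
    """Map interface name -> {'addr': IPv4 string or None, 'running': bool}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # A line that starts in column 0 opens a new interface stanza.
        if not line[0].isspace():
            current = line.split()[0]
            interfaces[current] = {"addr": None, "running": False}
        if current is None:
            continue
        match = re.search(r"inet addr:(\S+)", line)
        if match:
            interfaces[current]["addr"] = match.group(1)
        if re.search(r"\bRUNNING\b", line):
            interfaces[current]["running"] = True
    return interfaces


def shared_addresses(host_a, host_b):
    """Return the non-loopback IPv4 addresses configured on both hosts."""
    def addrs(host):
        return {i["addr"] for name, i in host.items() if name != "lo" and i["addr"]}
    return sorted(addrs(host_a) & addrs(host_b))


master = parse_ifconfig(MASTER)
slave = parse_ifconfig(SLAVE)

print("shared addresses:", shared_addresses(master, slave))
print("not RUNNING on master:", [n for n, i in master.items() if not i["running"]])
print("not RUNNING on slave:", [n for n, i in slave.items() if not i["running"]])
```

With the outputs above, the sketch reports 192.168.0.1 as the only shared address and eth1 as the only interface without the RUNNING flag on either machine.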

Thanks Jeff!!!



Jeff Squyres wrote:
> I'm assuming that these are Linux hosts.  If so, errno 111 is  
> "connection refused" possibly meaning that there is still some  
> firewall active or the wrong interface is being used to establish  
> connections between these machines.
>
> Can you send the output of "ifconfig" (might be /sbin/ifconfig on  
> your machine?) from both machines?
>
>
> On Feb 11, 2007, at 3:45 PM, matteo.guglie...@epfl.ch wrote:
>
>   
>> Since I installed Open MPI, I cannot submit any job that uses CPUs from
>> different machines.
>>
>> ### hostfile ###
>> lcbcpc02.epfl.ch slots=4 max-slots=4
>> lcbcpc04.epfl.ch slots=4 max-slots=4
>> ################
>>
>> ### error message ###
>> [matteo@lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8 /home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
>> [0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> [0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 6: lcbcpc04.epfl.ch len=16
>> [0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 4: lcbcpc04.epfl.ch len=16
>> [0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 7: lcbcpc04.epfl.ch len=16
>> connect() failed with errno=111
>> 5: lcbcpc04.epfl.ch len=16
>> #####################
>>
>> I did disable the firewall on both machines but I still get that  
>> error message.
>>
>> Thanks,
>> MG.
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>     
>
>
>   
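
For reference, the errno=111 in the quoted error output can be decoded on the machine itself. This is a minimal sketch, assuming a Linux host (as Jeff does above); describe_errno is a made-up helper name, and the numeric value of ECONNREFUSED differs on other operating systems.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# On Linux, ECONNREFUSED has the value 111, which matches the failed
# connect() calls in the error output quoted above.
name, message = describe_errno(errno.ECONNREFUSED)
print(errno.ECONNREFUSED, name, message)
```

Calling describe_errno(111) directly gives the same result on Linux; going through errno.ECONNREFUSED keeps the sketch meaningful on other platforms too.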
