This is the ifconfig output from the machine I use to submit the parallel job:
### ifconfig output - master node ###
[root@lcbcpc02 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:17:10:53:C8
          inet addr:128.178.54.74  Bcast:128.178.54.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11563938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6670398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16562149093 (15.4 GiB)  TX bytes:1312532185 (1.2 GiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:C9
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c9/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:468156 errors:0 dropped:0 overruns:0 frame:0
          TX packets:468156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:500286061 (477.1 MiB)  TX bytes:500286061 (477.1 MiB)

This is the ifconfig output from the "slave node":

### ifconfig output - slave node ###
[root@lcbcpc04 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:17:10:53:74
          inet addr:128.178.54.76  Bcast:128.178.54.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5374/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:320264 errors:0 dropped:0 overruns:0 frame:0
          TX packets:151942 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:139280839 (132.8 MiB)  TX bytes:82889237 (79.0 MiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:75
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5375/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2820 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2820 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2178053 (2.0 MiB)  TX bytes:2178053 (2.0 MiB)

Thanks Jeff!!!

Jeff Squyres wrote:
> I'm assuming that these are Linux hosts. If so, errno 111 is
> "connection refused" possibly meaning that there is still some
> firewall active or the wrong interface is being used to establish
> connections between these machines.
>
> Can you send the output of "ifconfig" (might be /sbin/ifconfig on
> your machine?) from both machines?
>
>
> On Feb 11, 2007, at 3:45 PM, matteo.guglie...@epfl.ch wrote:
>
>> Since I've installed openmpi I cannot submit any job that uses cpus
>> from different machines.
>>
>> ### hostfile ###
>> lcbcpc02.epfl.ch slots=4 max-slots=4
>> lcbcpc04.epfl.ch slots=4 max-slots=4
>> ################
>>
>> ### error message ###
>> [matteo@lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8 /home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
>> [0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> [0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 6: lcbcpc04.epfl.ch len=16
>> [0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 4: lcbcpc04.epfl.ch len=16
>> [0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 7: lcbcpc04.epfl.ch len=16
>> connect() failed with errno=111
>> 5: lcbcpc04.epfl.ch len=16
>> #####################
>>
>> I did disable the firewall on both machines but I still get that
>> error message.
>>
>> Thanks,
>> MG.
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users