Re: [OMPI users] openMPI 1.1.4 - connect() failed with errno=111
Jeff Squyres wrote:
> On Feb 12, 2007, at 2:34 PM, Matteo Guglielmi wrote:
>
>> Those nic "eth1" are not connected at all... all the machines use
>> only the eth0 interface which have different IP for each PC.
>
> Gotcha.  But, FWIW, OMPI doesn't know that because they have valid IP
> addresses.  So it thinks they're on the same subnet (on the same
> host, actually), and therefore thinks that they should be routable.
>
>> Anyway you solved my problem suggesting me those FAQ entries!!!
>> --mca btl_tcp_if_exclude lo,eth1 that's the magic option which
>> works for me!!!
>
> Excellent -- glad to help.
>
> Another solution might be to simply disable those NICs since they're
> not hooked up to anything; then OMPI should work without any options.

Yep that's even better!

> Good luck!

Thanks again,

I was playing around with the firewall so far and couldn't get any
solution out of it... and now I know why... because the problem wasn't
there!!!

Oh my gosh... you helped me a lot!

Cheers,
MG.
Re: [OMPI users] openMPI 1.1.4 - connect() failed with errno=111
Jeff Squyres wrote:
> On Feb 12, 2007, at 12:54 PM, Matteo Guglielmi wrote:
>
>> This is the ifconfig output from the machine I'm used to submit the
>> parallel job:
>
> It looks like both of your nodes share an IP address:
>
>> [root@lcbcpc02 ~]# ifconfig
>> eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:C9
>>           inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
>> [root@lcbcpc04 ~]# ifconfig
>> eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:75
>>           inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
>
> This will be problematic to more than just OMPI if these two
> interfaces are on the same network.  The solution is to ensure that
> all your nodes have unique IP addresses.
>
> If these NICs are on different networks, then it's a valid network
> configuration, but Open MPI (by default) will assume that these are
> routable to each other.  You can tell Open MPI to not use eth1 in
> this case -- see these FAQ entries for details:
>
> http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> http://www.open-mpi.org/faq/?category=tcp#tcp-routability

Those nic "eth1" are not connected at all... all the machines use
only the eth0 interface which have different IP for each PC.

Anyway you solved my problem suggesting me those FAQ entries!!!

    --mca btl_tcp_if_exclude lo,eth1

that's the magic option which works for me!!!

Thanks Jeff!!!

Thanks
MG.
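For readers who launch jobs from a script, the fix reported above can be sketched as a small Python helper that only assembles the mpirun argument list. The function name `build_mpirun_command` and its parameters are hypothetical, made up for this illustration; the hostfile, process count, program path and the `--mca btl_tcp_if_exclude lo,eth1` option are the ones quoted in this thread. Nothing is executed here.

```python
def build_mpirun_command(hostfile, num_procs, program, program_args=(), exclude_ifaces=()):
    """Assemble an mpirun argument list (hypothetical helper; runs nothing)."""
    cmd = ["mpirun", "--hostfile", hostfile, "-np", str(num_procs)]
    if exclude_ifaces:
        # MCA parameters are passed as "--mca <name> <value>" and must come
        # before the program name; the value is a comma-separated list.
        cmd += ["--mca", "btl_tcp_if_exclude", ",".join(exclude_ifaces)]
    cmd.append(program)
    cmd.extend(program_args)
    return cmd


# Values taken from the messages in this thread.
cmd = build_mpirun_command(
    hostfile="~matteo/hostfile",
    num_procs=8,
    program="/home/matteo/Software/NWChem/5.0/bin/nwchem",
    program_args=["./nwchem.nw"],
    exclude_ifaces=["lo", "eth1"],
)

# Joined with spaces for display only; none of these paths contain spaces.
print(" ".join(cmd))
```

Keeping `lo` in the exclude list mirrors the option exactly as it was reported to work; the sketch does not try to decide which interfaces should be excluded on other clusters.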
Re: [OMPI users] openMPI 1.1.4 - connect() failed with errno=111
On Feb 12, 2007, at 12:54 PM, Matteo Guglielmi wrote:

> This is the ifconfig output from the machine I'm used to submit the
> parallel job:

It looks like both of your nodes share an IP address:

> [root@lcbcpc02 ~]# ifconfig
> eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:C9
>           inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
> [root@lcbcpc04 ~]# ifconfig
> eth1      Link encap:Ethernet  HWaddr 00:15:17:10:53:75
>           inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0

This will be problematic to more than just OMPI if these two
interfaces are on the same network.  The solution is to ensure that
all your nodes have unique IP addresses.

If these NICs are on different networks, then it's a valid network
configuration, but Open MPI (by default) will assume that these are
routable to each other.  You can tell Open MPI to not use eth1 in
this case -- see these FAQ entries for details:

http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
http://www.open-mpi.org/faq/?category=tcp#tcp-routability

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
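The check described in this message (spotting an address that appears on more than one node) can be sketched in Python. This is only an illustration: `find_shared_addresses` is a hypothetical helper, and the input is typed in by hand from the ifconfig output quoted above rather than parsed from a live system.

```python
from collections import defaultdict


def find_shared_addresses(interfaces):
    """Return {ip: [(host, iface), ...]} for IPs seen on more than one host.

    `interfaces` is an iterable of (host, iface, ip) tuples.
    """
    owners = defaultdict(list)
    for host, iface, ip in interfaces:
        owners[ip].append((host, iface))
    # Keep only addresses claimed by at least two different hosts.
    return {
        ip: who
        for ip, who in owners.items()
        if len({host for host, _ in who}) > 1
    }


# The two eth1 entries shown in the ifconfig output in this thread.
reported = [
    ("lcbcpc02", "eth1", "192.168.0.1"),
    ("lcbcpc04", "eth1", "192.168.0.1"),
]

shared = find_shared_addresses(reported)
for ip, who in shared.items():
    print(ip, "is configured on:", who)
```

An address flagged this way is not necessarily an error (the NICs may sit on separate networks, as noted above), but it marks an interface to keep away from Open MPI's TCP transport.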
Re: [OMPI users] openMPI 1.1.4 - connect() failed with errno=111
I'm assuming that these are Linux hosts.  If so, errno 111 is
"connection refused" possibly meaning that there is still some
firewall active or the wrong interface is being used to establish
connections between these machines.

Can you send the output of "ifconfig" (might be /sbin/ifconfig on
your machine?) from both machines?

On Feb 11, 2007, at 3:45 PM, matteo.guglie...@epfl.ch wrote:

> Since I've installed openmpi I cannot submit any job that uses cpus
> from different machines.
>
> ### hostfile ###
> lcbcpc02.epfl.ch slots=4 max-slots=4
> lcbcpc04.epfl.ch slots=4 max-slots=4
>
> ### error message ###
> [matteo@lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8 /home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
> [0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> [0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> 6: lcbcpc04.epfl.ch len=16
> [0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> 4: lcbcpc04.epfl.ch len=16
> [0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
> 7: lcbcpc04.epfl.ch len=16
> connect() failed with errno=111
> 5: lcbcpc04.epfl.ch len=16
> #
>
> I did disable the firewall on both machines but I still get that
> error message.
>
> Thanks,
> MG.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
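As a quick way to decode an errno value like the one in the error output above, the Python standard library can map the number to its symbolic name and message. This is a small sketch, not part of Open MPI; `describe_errno` is a hypothetical helper. Numeric errno values are platform-specific: 111 is ECONNREFUSED on typical Linux systems, which is why the example looks the constant up by name instead of hard-coding the number.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Look the value up by name so the example is correct on any platform.
name, message = describe_errno(errno.ECONNREFUSED)
print(errno.ECONNREFUSED, name, message)
```

On the machine that printed the mpirun error, calling `describe_errno(111)` directly would show what that host means by errno 111.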