Thank you for that. I have not yet been able to get the IT department to disable the firewalls, but hopefully that is indeed the problem. Sorry for the late response; I was hoping IT would be faster. In the meantime I can at least test the connection myself (see the sketch below), and in case disabling the firewall outright isn't an option, I've sketched a possible port-range workaround at the bottom of this message.
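A quick way to check whether arbitrary TCP ports are blocked between the nodes without waiting on IT: listen on one node and connect from the other. This is just a sketch assuming nc (netcat) is installed on both machines; 54321 is an arbitrary placeholder port, and some nc variants want "nc -l -p 54321" instead.

# on node2: listen on an arbitrary unprivileged port
shell$ nc -l 54321

# on node1: try to connect and type a line; if the firewall blocks
# random TCP ports, this should hang or be refused even though ssh works
shell$ nc node2 54321

If ssh gets through (it does, since mpirun reaches the other node) but nc on a random high port does not, that points squarely at the firewall.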
Robertson

Message: 2
List-Post: users@lists.open-mpi.org
Date: Fri, 6 Feb 2009 17:27:34 -0500
From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <8ba0e4a5-fa7c-430b-8731-231ed6e67...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Open MPI requires that there be no TCP firewall between hosts that are
used in a single parallel job -- it uses random TCP ports between peers.

On Feb 5, 2009, at 2:39 AM, Robertson Burgess wrote:

> I have checked with IT. It is TCP. I have been told that there's a
> firewall on the nodes. Should I open some ports on the firewall, and
> if so, which ones?
>
> Robertson
>
>>>> Robertson Burgess 5/02/2009 5:09 pm >>>
> Thank you for your help.
> I tried the command
>
> mpirun -np 4 -host node1,node2 -mca btl tcp,self random
>
> but still got the same result.
>
> I'm pretty sure that the communication between the nodes is TCP, but
> I'm not certain; I've emailed IT support to ask them, but am yet to
> hear back from them.
> Other than that, I'm running the latest release of OMPI (1.3) and I
> installed it on both nodes. And yes, they are in the same absolute
> paths.
> My configuration was very standard:
>
> shell$ gunzip -c openmpi-1.3.tar.gz | tar xf -
> shell$ cd openmpi-1.3
> shell$ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/home/bburgess/bin/bin
> shell$ make all install
>
> Again, thank you for your help; I'll have to investigate whether my
> assumption that my connections are TCP is correct. When I was
> setting it up at first, before I'd configured the nodes to log
> into each other without a password, I did get the message
>
> user@node.newcastle.edu.au's password:
>
> in my log files, so it did at least seem to be reaching the other
> node. Does that mean that my connections are working, or could there
> be more to it than that?
>
> Robertson Burgess
>
>
> Message: 2
> Date: Wed, 4 Feb 2009 15:37:44 +0200
> From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
> Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <453d39990902040537o45137abbh2f12db423d971...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> What kind of communication do you have between the nodes -- tcp or
> openib (IB/iWARP)? You can try
>
> mpirun -np 4 -host node1,node2 -mca btl tcp,self random
>
>
>
> On Wed, Feb 4, 2009 at 1:21 AM, Ralph Castain <r...@lanl.gov> wrote:
>> Could you tell us which version of Open MPI you are using, and how
>> it was configured?
>>
>> Did you install the OMPI libraries and binaries on both nodes? Are
>> they in the same absolute path locations?
>>
>> Thanks
>> Ralph
>>
>>
>> On Feb 3, 2009, at 3:46 PM, Robertson Burgess wrote:
>>
>>> Dear users,
>>> I am quite new to Open MPI. I have compiled it on two nodes, each
>>> with 8 CPU cores; the two nodes are identical. The code I am using
>>> works in parallel across the 8 cores on a single node. However,
>>> whenever I try to run across both nodes, Open MPI simply hangs.
>>> There is no output whatsoever; when I run it in the background,
>>> outputting to a log file, the log file is always empty. The cores
>>> do not appear to be doing anything at all, either on the host node
>>> or on the remote node.
>>> This happens whether I am running my code, or even when I tell it
>>> to run a process that doesn't even exist; for instance
>>>
>>> mpirun -np 4 -host node1,node2 random
>>>
>>> simply results in the terminal hanging, so all I can do is close
>>> the terminal and open up a new one, and
>>>
>>> mpirun -np 4 -host node1,node2 random >& log.log &
>>>
>>> simply produces an empty log.log file.
>>>
>>> I am running Red Hat Linux on the systems, and compiled Open MPI
>>> with the Intel Compilers 10.1. As I've said, it works fine on one
>>> node. I have set up both nodes so that they can log into each other
>>> via ssh without the need for a password, and I have altered my
>>> .bashrc file so that PATH and LD_LIBRARY_PATH include the
>>> appropriate folders.
>>> I have looked through the FAQ and mailing lists, but I was unable
>>> to find anything that really matched my problem. Any help would be
>>> greatly appreciated.
>>>
>>> Sincerely,
>>> Robertson Burgess
>>> University of Newcastle

--
Jeff Squyres
Cisco Systems
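P.S. In case disabling the firewall entirely is off the table: Jeff's note above says the peers use random TCP ports, but the FAQ suggests the TCP BTL can be pinned to a fixed port range with MCA parameters, so IT would only need to open that range between the two nodes. A sketch, untested on my end -- the 10000-10099 range is a placeholder, and the parameter names should be confirmed against this build with ompi_info:

# confirm which port parameters this 1.3 build actually supports
shell$ ompi_info --param btl tcp | grep port

# pin the TCP BTL to ports 10000-10099 (placeholder range) and rerun
shell$ mpirun -np 4 -host node1,node2 \
           -mca btl tcp,self \
           -mca btl_tcp_port_min_v4 10000 \
           -mca btl_tcp_port_range_v4 100 \
           random

The runtime's out-of-band channel opens its own TCP connections as well, so it may need an equivalent range; "ompi_info --param oob tcp" should list whatever knobs that component has.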