Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Jeff Squyres
On Feb 17, 2012, at 11:59 AM, Jingcha Joba wrote: > Also, I am looking for a good way to start understanding the implementation > level details for OpenMPI. Can you point me to some good source? > (PS: To start with, I have already read the FAQ section) Unfortunately, there isn't a lot of good

Re: [OMPI users] Problems with gridengine integration on RHEL 6

2012-02-17 Thread Ralph Castain
Hi guys Our apologies - the rshd launcher isn't supposed to be in a release branch. We've removed it for the next release. Sorry for the problem... :-( On Fri, Feb 17, 2012 at 11:42 AM, Brian McNally wrote: > Dave, > > Thanks for the suggestion, adding "-mca plm ^rshd" did

Re: [OMPI users] Problems with gridengine integration on RHEL 6

2012-02-17 Thread Brian McNally
Dave, Thanks for the suggestion, adding "-mca plm ^rshd" did force mpirun to spawn things via qrsh rather than SSH. My problem is solved! -- Brian McNally On 02/16/2012 03:05 AM, Dave Love wrote: Brian McNally writes: Hi Dave, I looked through the INSTALL, VERSION,

Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Jingcha Joba
Yes, I did. Because it was the same NIC with two ports, each capable of delivering 5gb/s, I never thought that they should be on different subnets. But once I changed the subnet for one of the ports on both nodes, it seemed to work. Also, I am looking for a good way to start understanding the

Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Richard Bardwell
Yes, they were on the same subnet. I guess that is the problem. - Original Message - From: "Jeff Squyres" To: "Open MPI Users" Sent: Friday, February 17, 2012 4:20 PM Subject: Re: [OMPI users] Problem running an mpi application on nodes with

Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Jeff Squyres
Did you have both of the ethernet ports on the same subnet, or were they on different subnets? On Feb 17, 2012, at 5:36 AM, Richard Bardwell wrote: > I had exactly the same problem. > Trying to run mpi between 2 separate machines, with each machine having > 2 ethernet ports, causes really

Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Jeff Squyres
+1 It is definitely bad Linux practice to have 2 ports on the same subnet. If you still want that configuration, however (e.g., you have some conditions in your environment that make it workable), you can make Open MPI only use one or more of those interfaces via the btl_tcp_if_include (or

Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Rolf vandeVaart
Open MPI cannot handle having two interfaces on a node on the same subnet. I believe it has to do with our matching code when we try to match up a connection. The result is a hang as you observe. I also believe it is not good practice to have two interfaces on the same subnet. If you put them

Re: [OMPI users] Problem running an mpi application on nodes with more than one interface

2012-02-17 Thread Richard Bardwell
I had exactly the same problem. Trying to run mpi between 2 separate machines, with each machine having 2 ethernet ports, causes really weird behaviour on the most basic code. I had to disable one of the ethernet ports on each of the machines and it worked just fine after that. No idea why though
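
The replies in this thread trace the hang to two interfaces on one node sharing a subnet. As a rough illustration only, the sketch below checks a node's addresses for that condition using Python's standard ipaddress module. The function name shared_subnets, the interface names, and the addresses are all hypothetical and do not come from the thread or from Open MPI; the addresses would have to be collected from each node by hand.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(interfaces):
    """Return the networks that contain more than one interface.

    `interfaces` maps an interface name to its address in CIDR form,
    e.g. {"eth0": "192.168.1.10/24"}. This helper is hypothetical and
    is not part of Open MPI.
    """
    by_network = defaultdict(list)
    for name, cidr in interfaces.items():
        # ip_interface() keeps the host bits; .network drops them.
        network = ipaddress.ip_interface(cidr).network
        by_network[network].append(name)
    return {
        str(network): sorted(names)
        for network, names in by_network.items()
        if len(names) > 1
    }


# Made-up node with two Ethernet ports on the same /24.
node = {
    "eth0": "192.168.1.10/24",
    "eth1": "192.168.1.11/24",
    "ib0": "10.0.0.5/16",
}
print(shared_subnets(node))  # {'192.168.1.0/24': ['eth0', 'eth1']}
```

An empty result means no two listed interfaces share a network, which is the configuration the replies above recommend.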