On Sep 29, 2009, at 9:58 AM, <worl...@ukr.net> <worl...@ukr.net> wrote:

> Open MPI should just "figure it out" and do the Right Thing at run-
> time -- is that not happening?
You are right, it should.
But I want to keep other traffic, such as NFS and traffic from other jobs, away from Open MPI communications,
and use only a dedicated Ethernet interface for this purpose.

I have Open MPI 1.3.3 installed in the same directory on the head node and on all compute nodes.
The OS is the same across the cluster: Debian 5.0.
The compute nodes have two interfaces: eth0 for NFS and so on,
and eth1 for Open MPI.
The head node has 5 interfaces: eth0 for NFS, eth4 for Open MPI.
The networks are as follows:
1) Head node eth0 + nodes eth0    : 192.168.0.0/24
2) Head node eth4 + nodes eth1    : 192.168.1.0/24

So how can I configure Open MPI to use only network 2) for this purpose?


Try using "--mca btl_tcp_if_exclude eth0 --mca oob_tcp_if_exclude eth0". This tells all machines not to use eth0. The only other network available is then the one on eth4 (head node) or eth1 (compute nodes), so it should do the Right Thing.

Note that Open MPI has *two* TCP subsystems: the one used for MPI communications and the one used for "out of band" communications. BTL is the MPI communication subsystem; "oob" is the Out of Band communications subsystem.
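
As a rough sketch of the suggestion above, the shell helper below assembles an mpirun command line that passes the same exclude list to both subsystems. The function name build_mpirun_cmd is hypothetical; the host names and the cpi program are taken from the examples in this thread. It only prints the command and does not launch anything. One assumption worth checking against the Open MPI FAQ for your version: setting btl_tcp_if_exclude replaces its default value, so passing "lo,eth0" instead of "eth0" may be needed to keep the loopback interface excluded as well.

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command that excludes the given
# interface(s) from both TCP subsystems.
#   $1 = comma-separated interface list to exclude (e.g. "eth0" or "lo,eth0")
build_mpirun_cmd() {
    exclude_if="$1"
    # btl_tcp_* controls MPI traffic; oob_tcp_* controls out-of-band traffic.
    echo "mpirun --mca btl_tcp_if_exclude $exclude_if" \
         "--mca oob_tcp_if_exclude $exclude_if" \
         "-np 2 -host n10,n11 cpi"
}

# Print the command for inspection before running it by hand.
build_mpirun_cmd eth0
```

Excluding by name works here because eth0 plays the same role on every machine, whereas the Open MPI interface is named eth4 on the head node and eth1 on the compute nodes.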

The other problem is this:
I tried to run some examples, but unfortunately they do not work.
Maybe the network is not configured correctly.

I can only run jobs on a host when I launch them from that same host.
When I launch from the head node to other nodes, for example, mpirun hangs without any messages, and on the node where I want to run the calculation I can see that the orted daemon has started.
(I use the default config files.)

Examples are below:
mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_include eth0 -np 2 -host n10,n11 cpi
No output, no calculations, only the orted daemon on the nodes.

mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 -host n10,n11 cpi
The same as above.

mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 -host n00,n00 cpi
n00 is the head node; this works and produces output.


It sounds like OMPI is getting confused between the non-uniform networks. I have heard reports of OMPI not liking networks with different interface names, but it's not immediately obvious why the interface names are relevant to OMPI's selection criteria (and not enough details are available in the reports I heard before).

Try the *_if_exclude methods above and see if that works for you. If not, let us know.
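
The same two parameters can also be supplied through the environment using Open MPI's OMPI_MCA_ prefix, which avoids retyping the flags on every launch. This is a minimal sketch, assuming eth0 is the interface to avoid on every machine, as described in this thread; the mpirun line is left as a comment so the snippet only sets and prints the values.

```shell
#!/bin/sh
# Assumption: eth0 carries NFS and other non-MPI traffic on every machine.
# OMPI_MCA_<name> is read by mpirun as if "--mca <name> <value>" were given.
export OMPI_MCA_btl_tcp_if_exclude=eth0   # MPI traffic (BTL)
export OMPI_MCA_oob_tcp_if_exclude=eth0   # out-of-band traffic (OOB)

# Show what the next launch from this shell will pick up.
echo "btl_tcp_if_exclude=$OMPI_MCA_btl_tcp_if_exclude"
echo "oob_tcp_if_exclude=$OMPI_MCA_oob_tcp_if_exclude"

# Then launch as usual, e.g.:
#   mpirun -np 2 -host n10,n11 cpi
```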

--
Jeff Squyres
jsquy...@cisco.com
