Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Paul Monday (Parallel Scientific)
Thanks for the quick response ... I've been thinking about this today and tried 
a few things on my CentOS mini connected cluster ...

To use the tcp btl I will have to set up a bridge on A with ib0 and ib1 
participating in it; then the tcp btl could be used as you suggest.  
Unfortunately, the obvious solution, bridge-utils on CentOS, does not 
support InfiniBand adapters.

This is now straying out of MPI range into a networking issue ... any ideas on 
bridging at the IP over IB tier in a cluster would be greatly appreciated.  This 
must be a solved problem, but I'm not having a lot of luck with Google and the 
archives.
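
A possible alternative to bridging, offered as an untested sketch: IP over IB 
interfaces are not Ethernet devices, so routing at the IP layer on A (rather 
than bridging ib0 and ib1 at layer 2) may be the more natural fit. The Python 
below only builds the command strings one might run on each host. The subnets, 
the addresses of A's two ports, and the function name are hypothetical and do 
not come from this thread; whether Open MPI's tcp btl then connects B and C 
over such a route has not been verified here.

```python
import ipaddress


def forwarding_commands(subnet_b, subnet_c, a_addr_on_b_side, a_addr_on_c_side):
    """Return per-host shell commands for routing between two subnets through A.

    All arguments are assumptions for illustration:
      subnet_b / subnet_c   - the IP over IB subnets of the A-B and A-C links
      a_addr_on_*_side      - A's address on each of those subnets
    """
    net_b = ipaddress.ip_network(subnet_b)
    net_c = ipaddress.ip_network(subnet_c)
    gw_b = ipaddress.ip_address(a_addr_on_b_side)
    gw_c = ipaddress.ip_address(a_addr_on_c_side)

    # Routing only makes sense if the two links are distinct subnets.
    if net_b.overlaps(net_c):
        raise ValueError("the two links must be on different subnets")
    # Each gateway address must live on the subnet it serves.
    if gw_b not in net_b or gw_c not in net_c:
        raise ValueError("gateway address is outside its subnet")

    return {
        # A forwards IP packets between its two ports.
        "A": ["sysctl -w net.ipv4.ip_forward=1"],
        # B reaches C's subnet through A's address on the A-B link.
        "B": [f"ip route add {net_c} via {gw_b}"],
        # C reaches B's subnet through A's address on the A-C link.
        "C": [f"ip route add {net_b} via {gw_c}"],
    }


# Hypothetical addressing, for illustration only.
commands = forwarding_commands(
    "192.168.10.0/24", "192.168.20.0/24", "192.168.10.1", "192.168.20.1"
)
for host in sorted(commands):
    for line in commands[host]:
        print(f"{host}: {line}")
```

The commands are generated rather than executed so they can be reviewed before 
anything on the cluster is changed.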

Paul Monday



On Nov 22, 2010, at 7:46 AM, Terry Dontje wrote:

> You're gonna have to use a protocol that can route through a machine; OFED 
> User Verbs (i.e. openib) does not do this.  The only way I know of to do this 
> via OMPI is with the tcp btl.
> 
> --td
> 
> On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote:
>> 
>> We've been using OpenMPI in a switched environment with success, but we've 
>> moved to a point-to-point environment to do some work.  Some of the nodes 
>> cannot talk directly to one another, sort of like this with computers A, B, 
>> and C, with A having two ports: 
>> 
>> A(1)(opensm)-->B 
>> A(2)(opensm)-->C 
>> 
>> B is not connected to C in any way. 
>> 
>> When we try to run our OpenMPI program we are receiving: 
>> At least one pair of MPI processes are unable to reach each other for 
>> MPI communications.  This means that no Open MPI device has indicated 
>> that it can be used to communicate between these processes.  This is 
>> an error; Open MPI requires that all MPI processes be able to reach 
>> each other.  This error can sometimes be the result of forgetting to 
>> specify the "self" BTL. 
>> 
>>   Process 1 ([[1581,1],5]) is on host: pg-B 
>>   Process 2 ([[1581,1],0]) is on host: pg-C 
>>   BTLs attempted: openib self sm 
>> 
>> Your MPI job is now going to abort; sorry. 
>> 
>> 
>> I hope I'm not being overly naive, but is there a way to join the subnets at 
>> the MPI layer?  It seems like IP over IB would be too high up the stack. 
>> 
>> Paul Monday 
>> ___ 
>> users mailing list 
>> us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/users 
> 
> 
> -- 
> 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 



Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Terry Dontje
You're gonna have to use a protocol that can route through a machine; 
OFED User Verbs (i.e. openib) does not do this.  The only way I know of to 
do this via OMPI is with the tcp btl.


--td
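
As a rough illustration of the suggestion above, the following Python sketch 
assembles an mpirun command line that selects the tcp btl in place of openib. 
The "--mca btl" and "btl_tcp_if_include" parameters are existing Open MPI 
options, and pg-B and pg-C are the hosts named in the error output; the host 
name pg-A, the program name ./my_mpi_app, and the helper function itself are 
assumptions made for the example.

```python
import shlex


def build_mpirun_command(hosts, program, nprocs, interfaces=None):
    """Build an mpirun argument list that uses the tcp btl instead of openib.

    hosts       - host names to run on
    program     - path of the MPI executable (hypothetical in this example)
    nprocs      - total number of MPI processes
    interfaces  - optional interface names passed to btl_tcp_if_include;
                  suitable values depend on what each node actually has
    """
    cmd = [
        "mpirun",
        "-np", str(nprocs),
        "--host", ",".join(hosts),
        # tcp replaces openib; self and sm are kept as in the original run.
        "--mca", "btl", "tcp,self,sm",
    ]
    if interfaces:
        cmd += ["--mca", "btl_tcp_if_include", ",".join(interfaces)]
    cmd.append(program)
    return cmd


# pg-A and ./my_mpi_app are placeholders, not names from the thread.
command = build_mpirun_command(["pg-A", "pg-B", "pg-C"], "./my_mpi_app", 6)
print(shlex.join(command))
```

This only changes which transport Open MPI tries. B and C still need an IP 
path to each other through A for the tcp btl to have a chance of connecting 
them.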

On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote:
We've been using OpenMPI in a switched environment with success, but 
we've moved to a point-to-point environment to do some work.  Some of 
the nodes cannot talk directly to one another, sort of like this with 
computers A, B, and C, with A having two ports:


A(1)(opensm)-->B
A(2)(opensm)-->C

B is not connected to C in any way.

When we try to run our OpenMPI program we are receiving:
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1581,1],5]) is on host: pg-B
  Process 2 ([[1581,1],0]) is on host: pg-C
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.


I hope I'm not being overly naive, but is there a way to join the 
subnets at the MPI layer?  It seems like IP over IB would be too high 
up the stack.


Paul Monday



-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





[OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Paul Monday (Parallel Scientific)
We've been using OpenMPI in a switched environment with success, but 
we've moved to a point-to-point environment to do some work.  Some of 
the nodes cannot talk directly to one another, sort of like this with 
computers A, B, and C, with A having two ports:


A(1)(opensm)-->B
A(2)(opensm)-->C

B is not connected to C in any way.

When we try to run our OpenMPI program we are receiving:
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1581,1],5]) is on host: pg-B
  Process 2 ([[1581,1],0]) is on host: pg-C
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.


I hope I'm not being overly naive, but is there a way to join the 
subnets at the MPI layer?  It seems like IP over IB would be too high up 
the stack.


Paul Monday
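
The topology in this message can be modeled in a few lines to show why the 
job aborts. The Python sketch below is a toy model, not Open MPI's actual 
reachability logic: it assumes a transport that does not route needs a direct 
link between every pair of hosts, while a routed protocol only needs some 
path through intermediate hosts. The function names are made up for the 
example; the links A-B and A-C come from the diagram above.

```python
from collections import deque

# Point-to-point links from the message: A's port 1 goes to B, port 2 to C.
LINKS = [("A", "B"), ("A", "C")]


def neighbors(links):
    """Build an undirected adjacency map from a list of (host, host) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def directly_linked(links, src, dst):
    """True if src and dst share a link, as a non-routed transport requires."""
    return dst in neighbors(links).get(src, set())


def reachable_via_hops(links, src, dst):
    """True if dst can be reached from src by forwarding through other hosts."""
    adj = neighbors(links)
    seen, queue = {src}, deque([src])
    while queue:
        host = queue.popleft()
        if host == dst:
            return True
        for nxt in adj.get(host, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def pairs_without_direct_link(links):
    """List host pairs that a non-routed transport could not connect."""
    hosts = sorted(neighbors(links))
    return [
        (a, b)
        for i, a in enumerate(hosts)
        for b in hosts[i + 1:]
        if not directly_linked(links, a, b)
    ]


print(pairs_without_direct_link(LINKS))
print(reachable_via_hops(LINKS, "B", "C"))
```

In this model the only pair without a direct link is B and C, which matches 
the two hosts named in the error, and that pair becomes reachable once 
forwarding through A is allowed.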