Short version: 
--------------

What you really want is:

    mpirun --mca pml ob1 ...

The "--mca mtl ^psm" way will get the same result, but forcing pml=ob1 is 
really a slightly Better solution (from a semantic perspective)
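
For example, both of these invocations should end up steering the QLogic 
nodes onto the openib BTL instead of the PSM MTL (the process count, 
hostfile name, and application name below are just placeholders):

    shell$ mpirun --mca pml ob1 -np 16 --hostfile myhosts ./my_mpi_app
    shell$ mpirun --mca mtl ^psm -np 16 --hostfile myhosts ./my_mpi_app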

More detail:
------------

To expand on the BTL-vs-MTL distinction: there are actually 3 different PMLs 
(PML = point-to-point message layer -- it's the layer that implements MPI 
point-to-point semantics and drives an underlying transport layer).  Here's 
the relevant section from the README:

- There are three MPI network models available: "ob1", "csum", and
  "cm".  "ob1" and "csum" use BTL ("Byte Transfer Layer") components
  for each supported network.  "cm" uses MTL ("Matching Transport
  Layer") components for each supported network.

  - "ob1" supports a variety of networks that can be used in
    combination with each other (per OS constraints; e.g., there are
    reports that the GM and OpenFabrics kernel drivers do not operate
    well together):

    - OpenFabrics: InfiniBand, iWARP, and RoCE
    - Loopback (send-to-self)
    - Myrinet MX and Open-MX
    - Portals
    - Quadrics Elan
    - Shared memory
    - TCP
    - SCTP
    - uDAPL
    - Windows Verbs

  - "csum" is exactly the same as "ob1", except that it performs
    additional data integrity checks to ensure that the received data
    is intact (vs. trusting the underlying network to deliver the data
    correctly).  csum supports all the same networks as ob1, but there
    is a performance penalty for the additional integrity checks.

  - "cm" supports a smaller number of networks (and they cannot be
    used together), but may provide better better overall MPI
    performance:

    - Myrinet MX and Open-MX
    - InfiniPath PSM
    - Mellanox MXM
    - Portals

    Open MPI will, by default, choose to use "cm" when the InfiniPath
    PSM or the Mellanox MXM MTL can be used.  Otherwise, "ob1" will be
    used and the corresponding BTLs will be selected.  "csum" will never
    be selected by default.  Users can force the use of ob1 or cm if
    desired by setting the "pml" MCA parameter at run-time:

      shell$ mpirun --mca pml ob1 ...
      or
      shell$ mpirun --mca pml csum ...
      or
      shell$ mpirun --mca pml cm ...

This means: if you force ob1 (or csum), then only BTLs will be used.  If you 
force cm, then only MTLs will be used.  If you don't specify which PML to use, 
then OMPI will prefer cm/MTLs (if it finds any usable MTLs) over ob1/BTLs.
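
If you want to check what your build has available and which components get 
selected at run time, something like the following should work (the grep 
pattern and verbosity levels are just illustrative; exact output varies by 
Open MPI version):

    shell$ ompi_info | grep -E "MCA (pml|mtl|btl)"
    shell$ mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 ...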



On Oct 15, 2013, at 12:38 PM, Kevin M. Hildebrand <ke...@umd.edu> wrote:

> Ahhh, that's the piece I was missing.  I've been trying to debug everything I 
> could think of related to 'btl', and was completely unaware that 'mtl' was 
> also a transport.
> 
> If I run a job using --mca mtl ^psm, it does indeed run properly across all 
> of my nodes.  (Whether or not that's the 'right' thing to do is yet to be 
> determined.)
> 
> Thanks for your help!
> 
> Kevin
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Dave Love
> Sent: Tuesday, October 15, 2013 10:16 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Need help running jobs across different IB vendors
> 
> "Kevin M. Hildebrand" <ke...@umd.edu> writes:
> 
>> Hi, I'm trying to run an OpenMPI 1.6.5 job across a set of nodes, some
>> with Mellanox cards and some with Qlogic cards.
> 
> Maybe you shouldn't...  (I'm blessed in one cluster with three somewhat
> incompatible types of QLogic card and a set of Mellanox ones, but
> they're in separate islands, apart from the two different SDR ones.)
> 
>> I'm getting errors indicating "At least one pair of MPI processes are unable 
>> to reach each other for MPI communications".  As far as I can tell all of 
>> the nodes are properly configured and able to reach each other, via IP and 
>> non-IP connections.
>> I've also discovered that even if I turn off the IB transport via "--mca btl 
>> tcp,self" I'm still getting the same issue.
>> The test works fine if I run it confined to hosts with identical IB cards.
>> I'd appreciate some assistance in figuring out what I'm doing wrong.
> 
> I assume the QLogic cards are using PSM.  You'd need to force them to
> use openib with something like --mca mtl ^psm and make sure they have
> the ipathverbs library available.  You probably won't like the resulting
> performance -- users here noticed when one set fell back to openib from
> psm recently.


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
