When I tried --mca pml cm it complained that "PML cm cannot be selected".  Maybe 
I needed to enable cm when I configured openmpi?  I didn't specifically enable 
or disable it.  It could also be that my getinfo routine doesn't have a 
capability set properly.
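One way I can rule out the capability side (a sketch, assuming the fi_info 
utility that ships with libfabric is on the path and "lf" is the name my 
provider registers) is to ask fi_info whether lf, layered under ofi_rxm, 
actually reports an RDM endpoint with tagged-messaging caps, since that is 
what the ofi MTL selects on:

```shell
# Hypothetical check: does the lf provider (under ofi_rxm) advertise an
# RDM endpoint with FI_TAGGED?  If nothing is printed, the ofi MTL has
# nothing to select and PML cm will refuse to run.
fi_info -p "lf;ofi_rxm" -e rdm -c FI_TAGGED
```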

My latest command line was:

mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "lf;ofi_rxm" ./mpi_latency

(where lf is my provider)
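To see why cm is being rejected, the same command can be rerun with the 
selection-verbosity parameters John suggested (these are standard Open MPI MCA 
parameters; the log should say which component excluded itself and why):

```shell
# Same run with verbose pml/mtl selection logging; look for the line
# explaining why the ofi MTL (and therefore PML cm) was not selected.
mpirun --mca pml cm --mca mtl ofi \
       --mca mtl_ofi_provider_include "lf;ofi_rxm" \
       --mca pml_base_verbose 100 --mca mtl_base_verbose 100 \
       ./mpi_latency
```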

Thanks for the pointers, I will do some more debugging on my end.

Don
________________________________________
From: Barrett, Brian <[email protected]>
Sent: Wednesday, November 13, 2019 12:53 PM
To: Hefty, Sean; Byrne, John (Labs); Don Fry; [email protected]
Subject: Re: [ofiwg] noob questions

You can force Open MPI to use libfabric as its transport by adding "-mca pml cm 
-mca mtl ofi" to the mpirun command line.

Brian

-----Original Message-----
From: ofiwg <[email protected]> on behalf of "Hefty, Sean" 
<[email protected]>
Date: Wednesday, November 13, 2019 at 12:52 PM
To: "Byrne, John (Labs)" <[email protected]>, Don Fry <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Re: [ofiwg] noob questions

    My guess is that OpenMPI has an internal socket transport that it is using. 
 You likely need to force MPI to use libfabric, but I don't know enough about 
OMPI to say how.

    Jeff (copied) likely knows the answer here, but you may need to create a 
new meme for him in exchange for his assistance.

    - Sean

    > -----Original Message-----
    > From: ofiwg <[email protected]> On Behalf Of Byrne, 
John (Labs)
    > Sent: Wednesday, November 13, 2019 11:26 AM
    > To: Don Fry <[email protected]>; [email protected]
    > Subject: Re: [ofiwg] noob questions
    >
    > You only mention the dgram and msg types and the mtl_ofi component wants 
rdm. If you
    > don’t support rdm, I would have expected your getinfo routine to return 
error -61.  You
    > can try using the ofi_rxm provider with your provider to add rdm support, 
replacing
    > verbs in “--mca mtl_ofi_provider_include verbs;ofi_rxm” with your 
provider.
    >
    >
    >
    > openmpi transport selection is complex. Adding insane levels of verbosity 
can help you
    > understand what is happening. I tend to use: --mca mtl_base_verbose 100 
--mca
    > btl_base_verbose 100 --mca pml_base_verbose 100
    >
    >
    >
    > John Byrne
    >
    >
    >
    > From: ofiwg [mailto:[email protected]] On Behalf Of Don 
Fry
    > Sent: Wednesday, November 13, 2019 10:54 AM
    > To: [email protected]
    > Subject: [ofiwg] noob questions
    >
    >
    >
    > I have written a libfabric provider for our hardware and it passes all 
the fabtests I
    > expect it to (dgram and msg).  I am trying to run some MPI tests using 
libfabric under
    > openmpi (4.0.2).  When I run a simple ping-pong test using mpirun it 
sends and receives
    > the messages using the tcp/ip protocol.  It does call my fi_getinfo 
routine, but
    > doesn't use my provider send/receive routines.  I have rebuilt the 
libfabric library
    > disabling sockets, then again --disable-tcp, then --disable-udp, and 
fi_info reports
    > fewer and fewer providers until it only lists my provider, but each time 
I run the mpi
    > test, it still uses the ip protocol to exchange messages.
    >
    >
    >
    > When I configured openmpi I specified --with-libfabric=/usr/local/ and 
the libfabric
    > library is being loaded and executed.
    >
    >
    >
    > I am probably doing something obviously wrong, but I don't know enough 
about MPI or
    > maybe libfabric, so need some help. If this is the wrong list, redirect 
me.
    >
    > Any suggestions?
    >
    > Don

    _______________________________________________
    ofiwg mailing list
    [email protected]
    https://lists.openfabrics.org/mailman/listinfo/ofiwg

