That likely means that something failed in initializing the OFI provider.  
Without seeing the debugging output John mentioned, it's really hard to say 
*why* it failed to initialize.  There are many possible reasons, including not 
being able to satisfy the assumptions Open MPI makes about its providers.
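One way to surface that debugging output (a sketch; FI_LOG_LEVEL is libfabric's standard logging variable and mtl_base_verbose is a standard Open MPI MCA parameter, but the exact output differs by version):

```shell
# Turn on libfabric's own logging alongside Open MPI's MTL verbosity;
# the combined output usually shows where OFI initialization fails.
export FI_LOG_LEVEL=debug
mpirun --mca pml cm --mca mtl ofi --mca mtl_base_verbose 100 ./mpi_latency
```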

Brian

-----Original Message-----
From: Don Fry <[email protected]>
Date: Wednesday, November 13, 2019 at 2:01 PM
To: "Barrett, Brian" <[email protected]>, "Hefty, Sean" 
<[email protected]>, "Byrne, John (Labs)" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Re: [ofiwg] noob questions

    When I tried --mca pml cm, it complained that "PML cm cannot be selected".  
Maybe I needed to enable cm when I configured openmpi?  I didn't specifically 
enable or disable it.  It could also be that my getinfo routine doesn't have a 
capability set properly. 
    
    my latest command line was:
    mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include 
"lf;ofi_rxm" ./mpi_latency (where lf is my provider)
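One check worth running first (a sketch; "lf" is the provider name from the command above): confirm that layering ofi_rxm over lf actually produces an FI_EP_RDM endpoint, since the OFI MTL requires one before the cm PML can be selected:

```shell
# If this prints nothing (fi_getinfo returns -FI_ENODATA, i.e. -61),
# the OFI MTL cannot initialize and "PML cm cannot be selected" follows.
fi_info -p "lf;ofi_rxm" -t FI_EP_RDM
```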
    
    Thanks for the pointers, I will do some more debugging on my end.
    
    Don
    ________________________________________
    From: Barrett, Brian <[email protected]>
    Sent: Wednesday, November 13, 2019 12:53 PM
    To: Hefty, Sean; Byrne, John (Labs); Don Fry; [email protected]
    Subject: Re: [ofiwg] noob questions
    
    You can force Open MPI to use libfabric as its transport by adding "-mca 
pml cm -mca mtl ofi" to the mpirun command line.
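As a concrete sketch (./mpi_latency stands in for whatever test binary you run), adding the base verbosity parameters reports which components actually get selected, so a silent fallback to TCP becomes visible:

```shell
# Force the cm PML and OFI MTL; the verbose flags print the
# component-selection process for each framework.
mpirun -mca pml cm -mca mtl ofi \
       -mca pml_base_verbose 100 -mca mtl_base_verbose 100 \
       ./mpi_latency
```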
    
    Brian
    
    -----Original Message-----
    From: ofiwg <[email protected]> on behalf of "Hefty, 
Sean" <[email protected]>
    Date: Wednesday, November 13, 2019 at 12:52 PM
    To: "Byrne, John (Labs)" <[email protected]>, Don Fry 
<[email protected]>, "[email protected]" 
<[email protected]>
    Subject: Re: [ofiwg] noob questions
    
        My guess is that OpenMPI has an internal socket transport that it is 
using.  You likely need to force MPI to use libfabric, but I don't know enough 
about OMPI to say how.
    
        Jeff (copied) likely knows the answer here, but you may need to create 
him a new meme for his assistance.
    
        - Sean
    
        > -----Original Message-----
        > From: ofiwg <[email protected]> On Behalf Of Byrne, 
John (Labs)
        > Sent: Wednesday, November 13, 2019 11:26 AM
        > To: Don Fry <[email protected]>; [email protected]
        > Subject: Re: [ofiwg] noob questions
        >
        > You only mention the dgram and msg types and the mtl_ofi component 
wants rdm. If you
        > don’t support rdm, I would have expected your getinfo routine to 
return error -61.  You
        > can try using the ofi_rxm provider with your provider to add rdm 
support, replacing
        > verbs in “--mca mtl_ofi_provider_include verbs;ofi_rxm” with your 
provider.
        >
        >
        >
        > openmpi transport selection is complex. Adding insane levels of 
verbosity can help you
        > understand what is happening. I tend to use: --mca mtl_base_verbose 
100 --mca
        > btl_base_verbose 100 --mca pml_base_verbose 100
        >
        >
        >
        > John Byrne
        >
        >
        >
        > From: ofiwg [mailto:[email protected]] On Behalf Of 
Don Fry
        > Sent: Wednesday, November 13, 2019 10:54 AM
        > To: [email protected]
        > Subject: [ofiwg] noob questions
        >
        >
        >
        > I have written a libfabric provider for our hardware and it passes 
all the fabtests I
        > expect it to (dgram and msg).  I am trying to run some MPI tests 
using libfabric under
        > openmpi (4.0.2).  When I run a simple ping-pong test using mpirun it 
sends and receives
        > the messages using the tcp/ip protocol.  It does call my fi_getinfo 
routine, but
        > doesn't use my provider send/receive routines.  I have rebuilt the 
libfabric library
        > disabling sockets, then again --disable-tcp, then --disable-udp, and 
fi_info reports
        > fewer and fewer providers until it only lists my provider, but each 
time I run the mpi
        > test, it still uses the ip protocol to exchange messages.
        >
        >
        >
        > When I configured openmpi I specified --with-libfabric=/usr/local/ 
and the libfabric
        > library is being loaded and executed.
        >
        >
        >
        > I am probably doing something obviously wrong, but I don't know 
enough about MPI or
        > maybe libfabric, so I need some help. If this is the wrong list, 
redirect me.
        >
        >
        > Any suggestions?
        >
        > Don
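
An aside on the provider-narrowing experiment described above (a sketch; FI_PROVIDER is libfabric's standard runtime filter, and "lf" stands in for the provider name): rebuilding libfabric with --disable-sockets/--disable-tcp/--disable-udp is not necessary to restrict what fi_getinfo returns:

```shell
# Filter libfabric to one provider at runtime instead of recompiling.
export FI_PROVIDER="lf"
fi_info | grep 'provider:'   # fi_getinfo results are now limited to "lf"
```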
    
        _______________________________________________
        ofiwg mailing list
        [email protected]
        https://lists.openfabrics.org/mailman/listinfo/ofiwg
    
    
    
