Just pulling from your debug output here, it looks like Open MPI has some 
requirements that your provider cannot satisfy.

checking info in util_getinfo
lf
libfabric:20561:lf:core:ofi_check_info():998<info> Unsupported capabilities
libfabric:20561:lf:core:ofi_check_info():999<info> Supported: FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:20561:lf:core:ofi_check_info():999<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
checking info in util_getinfo
lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_MSG
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
checking info in util_getinfo
lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_DGRAM
checking info in util_getinfo
lf
libfabric:20561:lf:core:ofi_check_mr_mode():510<info> Invalid memory registration mode
libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Expected: 
libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Given: 
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
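
If it helps, here is a rough standalone probe that replays the first failing 
request in your log, so you can iterate on your getinfo path without going 
through mpirun. This is a sketch only: FI_VERSION(1, 8) and the "lf" provider 
name are taken from the log, not from the OMPI source. Note also that the ofi 
MTL itself asks for FI_EP_RDM; the FI_EP_MSG request you see comes from 
ofi_rxm probing the core providers.

    /* probe.c - replay the fi_getinfo request seen in the log.
     * Build: cc probe.c -lfabric -o probe
     */
    #include <stdio.h>
    #include <string.h>
    #include <rdma/fabric.h>
    #include <rdma/fi_errno.h>

    int main(void)
    {
        struct fi_info *hints, *info = NULL;
        int ret;

        hints = fi_allocinfo();
        if (!hints)
            return 1;

        /* Capabilities the log shows being requested of the core provider */
        hints->caps = FI_MSG | FI_RMA | FI_READ | FI_RECV | FI_SEND |
                      FI_REMOTE_READ;
        hints->ep_attr->type = FI_EP_MSG;  /* what ofi_rxm asks the core for */
        hints->fabric_attr->prov_name = strdup("lf");  /* name from the log */

        ret = fi_getinfo(FI_VERSION(1, 8), NULL, NULL, 0, hints, &info);
        if (ret)
            fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        else
            printf("matched provider: %s\n", info->fabric_attr->prov_name);

        fi_freeinfo(info);
        fi_freeinfo(hints);
        return ret ? 1 : 0;
    }

Running it with FI_LOG_LEVEL=info set should reproduce the same 
ofi_check_info() complaints until your provider advertises the missing 
FI_RMA-related bits.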

-- Jim
 

On 11/13/19, 4:36 PM, "ofiwg on behalf of Don Fry" 
<[email protected] on behalf of [email protected]> wrote:

    Here is another run with the output suggested by James Swaro
    
    Don
    ________________________________________
    From: Don Fry
    Sent: Wednesday, November 13, 2019 2:26 PM
    To: Barrett, Brian; Hefty, Sean; Byrne, John (Labs); [email protected]
    Subject: Re: [ofiwg] noob questions
    
    Attached is the output of mpirun with some of my debugging printfs.
    
    Don
    ________________________________________
    From: Barrett, Brian <[email protected]>
    Sent: Wednesday, November 13, 2019 2:05 PM
    To: Don Fry; Hefty, Sean; Byrne, John (Labs); [email protected]
    Subject: Re: [ofiwg] noob questions
    
    That likely means that something failed in initializing the OFI provider.  Without seeing the debugging output John mentioned, it's really hard to say *why* it failed to initialize.  There are many reasons, including not conforming to a number of assumptions that Open MPI makes about its providers.
    
    Brian
    
    -----Original Message-----
    From: Don Fry <[email protected]>
    Date: Wednesday, November 13, 2019 at 2:01 PM
    To: "Barrett, Brian" <[email protected]>, "Hefty, Sean" 
<[email protected]>, "Byrne, John (Labs)" <[email protected]>, 
"[email protected]" <[email protected]>
    Subject: Re: [ofiwg] noob questions
    
        When I tried --mca pml cm it complains that "PML cm cannot be selected".  Maybe I needed to enable cm when I configured openmpi?  I didn't specifically enable or disable it.  It could also be that my getinfo routine doesn't have a capability set properly.
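    
        One check I can do, assuming a standard Open MPI build, is whether the cm PML and ofi MTL components got compiled in at all:
    
        ompi_info | grep -E "mtl|pml"
    
        If no "mtl: ofi" line shows up, then OMPI's configure didn't actually pick up libfabric.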
    
        my latest command line was:
        mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "lf;ofi_rxm" ./mpi_latency (where lf is my provider)
    
        Thanks for the pointers, I will do some more debugging on my end.
    
        Don
        ________________________________________
        From: Barrett, Brian <[email protected]>
        Sent: Wednesday, November 13, 2019 12:53 PM
        To: Hefty, Sean; Byrne, John (Labs); Don Fry; [email protected]
        Subject: Re: [ofiwg] noob questions
    
        You can force Open MPI to use libfabric as its transport by adding "-mca pml cm -mca mtl ofi" to the mpirun command line.
    
        Brian
    
        -----Original Message-----
        From: ofiwg <[email protected]> on behalf of "Hefty, Sean" <[email protected]>
        Date: Wednesday, November 13, 2019 at 12:52 PM
        To: "Byrne, John (Labs)" <[email protected]>, Don Fry 
<[email protected]>, "[email protected]" 
<[email protected]>
        Subject: Re: [ofiwg] noob questions
    
            My guess is that OpenMPI has an internal socket transport that it is using.  You likely need to force MPI to use libfabric, but I don't know enough about OMPI to do that.
    
            Jeff (copied) likely knows the answer here, but you may need to create him a new meme for his assistance.
    
            - Sean
    
            > -----Original Message-----
            > From: ofiwg <[email protected]> On Behalf Of Byrne, John (Labs)
            > Sent: Wednesday, November 13, 2019 11:26 AM
            > To: Don Fry <[email protected]>; [email protected]
            > Subject: Re: [ofiwg] noob questions
            >
            > You only mention the dgram and msg types and the mtl_ofi component wants rdm.  If you
            > don’t support rdm, I would have expected your getinfo routine to return error -61.  You
            > can try using the ofi_rxm provider with your provider to add rdm support, replacing
            > verbs in “--mca mtl_ofi_provider_include verbs;ofi_rxm” with your provider.
            >
            >
            >
            > openmpi transport selection is complex.  Adding insane levels of verbosity can help you
            > understand what is happening.  I tend to use: --mca mtl_base_verbose 100 --mca
            > btl_base_verbose 100 --mca pml_base_verbose 100
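            >
            > Putting that together with the provider include above, a full command line would look
            > something like this ("yourprov" and ./your_test here are placeholders for your provider
            > name and test binary):
            >
            > mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "yourprov;ofi_rxm" --mca mtl_base_verbose 100 --mca btl_base_verbose 100 --mca pml_base_verbose 100 ./your_test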
            >
            >
            >
            > John Byrne
            >
            >
            >
            > From: ofiwg [mailto:[email protected]] On Behalf Of Don Fry
            > Sent: Wednesday, November 13, 2019 10:54 AM
            > To: [email protected]
            > Subject: [ofiwg] noob questions
            >
            >
            >
            > I have written a libfabric provider for our hardware and it passes all the fabtests I
            > expect it to (dgram and msg).  I am trying to run some MPI tests using libfabric under
            > openmpi (4.0.2).  When I run a simple ping-pong test using mpirun it sends and receives
            > the messages using the tcp/ip protocol.  It does call my fi_getinfo routine, but
            > doesn't use my provider send/receive routines.  I have rebuilt the libfabric library
            > disabling sockets, then again --disable-tcp, then --disable-udp, and fi_info reports
            > fewer and fewer providers until it only lists my provider, but each time I run the mpi
            > test, it still uses the ip protocol to exchange messages.
            >
            >
            >
            > When I configured openmpi I specified --with-libfabric=/usr/local/ and the libfabric
            > library is being loaded and executed.
            >
            >
            >
            > I am probably doing something obviously wrong, but I don't know enough about MPI or
            > maybe libfabric, so I need some help.  If this is the wrong list, redirect me.
            >
            >
            >
            > Any suggestions?
            >
            > Don
    
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
