It will need ofi_rxd and/or ofi_rxm, since it supports both DGRAM and MSG endpoints.

Attached is the fi_info output with and without debugging.

fi_info shows:
lf;ofi_rxm, 
lf;ofi_rxd, 
lf (msg) and 
lf (dgram)

I have not yet implemented RMA or FI_REMOTE_READ, and I haven't looked at the 
difference between FI_READ and FI_RECV.
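The gap between what the OpenMPI OFI MTL requests and what the provider advertises can be read straight off the ofi_check_info() lines in the debug log quoted below. As a sanity check, here is a minimal sketch (plain Python, not part of libfabric; the two strings are copied verbatim from the log excerpt in this thread):

```python
# Diff the Supported/Requested capability lists printed by libfabric's
# ofi_check_info() at <info> log level.  The two strings are taken
# verbatim from the debug output quoted in this thread.
supported = "FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND"
requested = "FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ"

def caps(s):
    """Split a comma-separated capability list into a set of flag names."""
    return {c.strip() for c in s.split(",")}

# Capabilities the MTL asked for that the provider does not advertise.
missing = sorted(caps(requested) - caps(supported))
print(missing)  # ['FI_READ', 'FI_REMOTE_READ', 'FI_RMA']
```

The result is exactly the RMA-related set (FI_RMA, FI_READ, FI_REMOTE_READ), i.e. everything the MTL wants beyond plain message passing.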

Don
________________________________________
From: Hefty, Sean <[email protected]>
Sent: Wednesday, November 13, 2019 2:42 PM
To: James Swaro; Don Fry; Barrett, Brian; Byrne, John (Labs); 
[email protected]
Subject: RE: [ofiwg] noob questions

Can you provide the output of fi_info -v for your provider?

From the output below, it looks like your provider will rely on the ofi_rxd 
utility provider for its functionality.  I.e. your provider supports DGRAM 
endpoints.  Can you confirm that?

- Sean

> -----Original Message-----
> From: James Swaro <[email protected]>
> Sent: Wednesday, November 13, 2019 2:39 PM
> To: Don Fry <[email protected]>; Barrett, Brian <[email protected]>; 
> Hefty, Sean
> <[email protected]>; Byrne, John (Labs) <[email protected]>;
> [email protected]
> Subject: Re: [ofiwg] noob questions
>
> Just pulling from your debug here, it looks like you have some requirements 
> that your
> provider cannot satisfy for OpenMPI.
>
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_info():998<info> Unsupported capabilities
> libfabric:20561:lf:core:ofi_check_info():999<info> Supported: FI_MSG, 
> FI_MULTICAST,
> FI_RECV, FI_SEND
> libfabric:20561:lf:core:ofi_check_info():999<info> Requested: FI_MSG, FI_RMA, 
> FI_READ,
> FI_RECV, FI_SEND, FI_REMOTE_READ
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint 
> type
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_MSG
> libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf 
> returned -61
> (No data available)
> libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider 
> ofi_rxm returned
> -61 (No data available)
> libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, 
> skipping
> ofi_rxm
> libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, 
> skipping
> ofi_rxd
> libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, 
> skipping
> ofi_mrail
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint 
> type
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_DGRAM
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_mr_mode():510<info> Invalid memory 
> registration mode
> libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Expected:
> libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Given:
> libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf 
> returned -61
> (No data available)
>
> -- Jim
>
>
> On 11/13/19, 4:36 PM, "ofiwg on behalf of Don Fry" 
> <[email protected]
> on behalf of [email protected]> wrote:
>
>     Here is another run with the output suggested by James Swaro
>
>     Don
>     ________________________________________
>     From: Don Fry
>     Sent: Wednesday, November 13, 2019 2:26 PM
>     To: Barrett, Brian; Hefty, Sean; Byrne, John (Labs); 
> [email protected]
>     Subject: Re: [ofiwg] noob questions
>
>     attached is the output of mpirun with some of my debugging printf's
>
>     Don
>     ________________________________________
>     From: Barrett, Brian <[email protected]>
>     Sent: Wednesday, November 13, 2019 2:05 PM
>     To: Don Fry; Hefty, Sean; Byrne, John (Labs); [email protected]
>     Subject: Re: [ofiwg] noob questions
>
>     That likely means that something failed in initializing the OFI provider. 
>  Without
> seeing the debugging output John mentioned, it's really hard to say *why* it 
> failed to
> initialize.  There are many reasons, including not conforming to a number of
> assumptions that Open MPI makes about its providers.
>
>     Brian
>
>     -----Original Message-----
>     From: Don Fry <[email protected]>
>     Date: Wednesday, November 13, 2019 at 2:01 PM
>     To: "Barrett, Brian" <[email protected]>, "Hefty, Sean" 
> <[email protected]>,
> "Byrne, John (Labs)" <[email protected]>, "[email protected]"
> <[email protected]>
>     Subject: Re: [ofiwg] noob questions
>
>         When I tried --mca pml cm it complains that "PML cm cannot be 
> selected".  Maybe
> I needed to enable cm when I configured openmpi?  I didn't specifically 
> enable or
> disable it.  It could also be that my getinfo routine doesn't have a 
> capability set
> properly.
>
>         my latest command line was:
>         mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include 
> "lf;ofi_rxm"
> ./mpi_latency (where lf is my provider)
>
>         Thanks for the pointers, I will do some more debugging on my end.
>
>         Don
>         ________________________________________
>         From: Barrett, Brian <[email protected]>
>         Sent: Wednesday, November 13, 2019 12:53 PM
>         To: Hefty, Sean; Byrne, John (Labs); Don Fry; 
> [email protected]
>         Subject: Re: [ofiwg] noob questions
>
>         You can force Open MPI to use libfabric as its transport by adding 
> "-mca pml cm
> -mca mtl ofi" to the mpirun command line.
>
>         Brian
>
>         -----Original Message-----
>         From: ofiwg <[email protected]> on behalf of 
> "Hefty, Sean"
> <[email protected]>
>         Date: Wednesday, November 13, 2019 at 12:52 PM
>         To: "Byrne, John (Labs)" <[email protected]>, Don Fry 
> <[email protected]>,
> "[email protected]" <[email protected]>
>         Subject: Re: [ofiwg] noob questions
>
>             My guess is that OpenMPI has an internal socket transport that it 
> is using.
> You likely need to force MPI to use libfabric, but I don't know enough about 
> OMPI to do
> that.
>
>             Jeff (copied) likely knows the answer here, but you may need to 
> create him
> a new meme for his assistance.
>
>             - Sean
>
>             > -----Original Message-----
>             > From: ofiwg <[email protected]> On Behalf Of 
> Byrne,
> John (Labs)
>             > Sent: Wednesday, November 13, 2019 11:26 AM
>             > To: Don Fry <[email protected]>; [email protected]
>             > Subject: Re: [ofiwg] noob questions
>             >
>             > You only mention the dgram and msg types and the mtl_ofi 
> component wants
> rdm. If you
>             > don’t support rdm, I would have expected your getinfo routine 
> to return
> error -61.  You
>             > can try using the ofi_rxm provider with your provider to add 
> rdm support,
> replacing
>             > verbs in “--mca mtl_ofi_provider_include verbs;ofi_rxm” with 
> your
> provider.
>             >
>             >
>             >
>             > openmpi transport selection is complex. Adding insane levels of 
> verbosity
> can help you
>             > understand what is happening. I tend to use: --mca 
> mtl_base_verbose 100 -
> -mca
>             > btl_base_verbose 100 --mca pml_base_verbose 100
>             >
>             >
>             >
>             > John Byrne
>             >
>             >
>             >
>             > From: ofiwg [mailto:[email protected]] On 
> Behalf Of Don
> Fry
>             > Sent: Wednesday, November 13, 2019 10:54 AM
>             > To: [email protected]
>             > Subject: [ofiwg] noob questions
>             >
>             >
>             >
>             > I have written a libfabric provider for our hardware and it 
> passes all
> the fabtests I
>             > expect it to (dgram and msg).  I am trying to run some MPI 
> tests using
> libfabric under
>             > openmpi (4.0.2).  When I run a simple ping-pong test using 
> mpirun it
> sends and receives
>             > the messages using the tcp/ip protocol.  It does call my 
> fi_getinfo
> routine, but
>             > doesn't use my provider send/receive routines.  I have rebuilt 
> the
> libfabric library
>             > disabling sockets, then again --disable-tcp, then 
> --disable-udp, and
> fi_info reports
>             > fewer and fewer providers until it only lists my provider, but 
> each time
> I run the MPI
>             > test, it still uses the IP protocol to exchange messages.
>             >
>             >
>             >
>             > When I configured openmpi I specified 
> --with-libfabric=/usr/local/ and
> the libfabric
>             > library is being loaded and executed.
>             >
>             >
>             >
>             > I am probably doing something obviously wrong, but I don't know 
> enough
> about MPI or
>             > maybe libfabric, so I need some help. If this is the wrong list,
>             > please redirect me.
>             >
>             >
>             >
>             > Any suggestions?
>             >
>             > Don
>
>             _______________________________________________
>             ofiwg mailing list
>             [email protected]
>             https://lists.openfabrics.org/mailman/listinfo/ofiwg
>
>
>
>
>

Attachment: info.dbg
Description: info.dbg

---
fi_info:
    caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_READ, FI_WRITE, FI_RECV, FI_SEND, 
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_LOCAL_COMM, FI_REMOTE_COMM ]
    mode: [  ]
    addr_format: FI_SOCKADDR_IN
    src_addrlen: 16
    dest_addrlen: 0
    src_addr: fi_sockaddr_in://192.168.1.35:0
    dest_addr: (null)
    handle: (nil)
    fi_tx_attr:
        caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_READ, FI_WRITE, FI_RECV, FI_SEND, 
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_SOURCE, FI_DIRECTED_RECV ]
        mode: [  ]
        op_flags: [  ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, 
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_NONE ]
        inject_size: 16320
        size: 1024
        iov_limit: 4
        rma_iov_limit: 4
    fi_rx_attr:
        caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_READ, FI_WRITE, FI_RECV, FI_SEND, 
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_SOURCE, FI_DIRECTED_RECV ]
        mode: [  ]
        op_flags: [  ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, 
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_NONE ]
        total_buffered_recv: 0
        size: 1024
        iov_limit: 4
    fi_ep_attr:
        type: FI_EP_RDM
        protocol: FI_PROTO_RXM
        protocol_version: 1
        max_msg_size: 6552800
        msg_prefix_size: 0
        max_order_raw_size: 65528
        max_order_war_size: 65528
        max_order_waw_size: 65528
        mem_tag_format: 0xaaaaaaaaaaaaaaaa
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        auth_key_size: 0
    fi_domain_attr:
        domain: 0x0
        name: lf
        threading: FI_THREAD_SAFE
        control_progress: FI_PROGRESS_AUTO
        data_progress: FI_PROGRESS_MANUAL
        resource_mgmt: FI_RM_ENABLED
        av_type: FI_AV_UNSPEC
        mr_mode: [ FI_MR_BASIC, FI_MR_SCALABLE ]
        mr_key_size: 0
        cq_data_size: 0
        cq_cnt: 65536
        ep_cnt: 32768
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        max_ep_tx_ctx: 1
        max_ep_rx_ctx: 1
        max_ep_stx_ctx: 0
        max_ep_srx_ctx: 0
        cntr_cnt: 0
        mr_iov_limit: 1
    caps: [ FI_LOCAL_COMM, FI_REMOTE_COMM ]
    mode: [  ]
        auth_key_size: 0
        max_err_data: 0
        mr_cnt: 0
    fi_fabric_attr:
        name: lf
        prov_name: lf;ofi_rxm
        prov_version: 1.0
        api_version: 1.8
    nic_fid: (nil)
---
fi_info:
    caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, FI_RECV, 
FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_LOCAL_COMM, 
FI_REMOTE_COMM, FI_RMA_EVENT, FI_SOURCE, FI_DIRECTED_RECV ]
    mode: [  ]
    addr_format: FI_SOCKADDR_IN
    src_addrlen: 16
    dest_addrlen: 0
    src_addr: fi_sockaddr_in://192.168.1.35:0
    dest_addr: (null)
    handle: (nil)
    fi_tx_attr:
        caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, 
FI_SEND, FI_MULTI_RECV, FI_RMA_EVENT, FI_SOURCE, FI_DIRECTED_RECV ]
        mode: [  ]
        op_flags: [ FI_COMPLETION, FI_INJECT, FI_INJECT_COMPLETE, 
FI_TRANSMIT_COMPLETE, FI_DELIVERY_COMPLETE ]
        msg_order: [ FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_NONE ]
        inject_size: 3880
        size: 1024
        iov_limit: 4
        rma_iov_limit: 4
    fi_rx_attr:
        caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_RECV, FI_REMOTE_READ, 
FI_REMOTE_WRITE, FI_MULTI_RECV, FI_RMA_EVENT, FI_SOURCE, FI_DIRECTED_RECV ]
        mode: [  ]
        op_flags: [ FI_MULTI_RECV, FI_COMPLETION ]
        msg_order: [ FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_NONE ]
        total_buffered_recv: 0
        size: 1024
        iov_limit: 4
    fi_ep_attr:
        type: FI_EP_RDM
        protocol: FI_PROTO_RXD
        protocol_version: 1
        max_msg_size: 18446744073709551615
        msg_prefix_size: 0
        max_order_raw_size: 18446744073709551615
        max_order_war_size: 0
        max_order_waw_size: 18446744073709551615
        mem_tag_format: 0xaaaaaaaaaaaaaaaa
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        auth_key_size: 0
    fi_domain_attr:
        domain: 0x0
        name: lf
        threading: FI_THREAD_SAFE
        control_progress: FI_PROGRESS_MANUAL
        data_progress: FI_PROGRESS_MANUAL
        resource_mgmt: FI_RM_ENABLED
        av_type: FI_AV_UNSPEC
        mr_mode: [ FI_MR_BASIC, FI_MR_SCALABLE ]
        mr_key_size: 8
        cq_data_size: 8
        cq_cnt: 128
        ep_cnt: 128
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        max_ep_tx_ctx: 1
        max_ep_rx_ctx: 1
        max_ep_stx_ctx: 0
        max_ep_srx_ctx: 0
        cntr_cnt: 0
        mr_iov_limit: 1
    caps: [ FI_LOCAL_COMM, FI_REMOTE_COMM ]
    mode: [  ]
        auth_key_size: 0
        max_err_data: 0
        mr_cnt: 0
    fi_fabric_attr:
        name: lf
        prov_name: lf;ofi_rxd
        prov_version: 1.0
        api_version: 1.8
    nic_fid: (nil)
---
fi_info:
    caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND ]
    mode: [  ]
    addr_format: FI_SOCKADDR_IN
    src_addrlen: 16
    dest_addrlen: 0
    src_addr: fi_sockaddr_in://192.168.1.35:0
    dest_addr: (null)
    handle: (nil)
    fi_tx_attr:
        caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, 
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_SHARED_AV ]
        mode: [  ]
        op_flags: [  ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, 
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_STRICT ]
        inject_size: 0
        size: 1024
        iov_limit: 4
        rma_iov_limit: 4
    fi_rx_attr:
        caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, 
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_SHARED_AV ]
        mode: [  ]
        op_flags: [  ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, 
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_STRICT ]
        total_buffered_recv: 0
        size: 1024
        iov_limit: 4
    fi_ep_attr:
        type: FI_EP_MSG
        protocol: FI_PROTO_LF
        protocol_version: 0
        max_msg_size: 6552800
        msg_prefix_size: 0
        max_order_raw_size: 65528
        max_order_war_size: 65528
        max_order_waw_size: 65528
        mem_tag_format: 0x0000000000000000
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        auth_key_size: 0
    fi_domain_attr:
        domain: 0x0
        name: lf
        threading: FI_THREAD_SAFE
        control_progress: FI_PROGRESS_AUTO
        data_progress: FI_PROGRESS_AUTO
        resource_mgmt: FI_RM_ENABLED
        av_type: FI_AV_UNSPEC
        mr_mode: [  ]
        mr_key_size: 0
        cq_data_size: 0
        cq_cnt: 256
        ep_cnt: 256
        tx_ctx_cnt: 256
        rx_ctx_cnt: 256
        max_ep_tx_ctx: 1
        max_ep_rx_ctx: 1
        max_ep_stx_ctx: 0
        max_ep_srx_ctx: 0
        cntr_cnt: 0
        mr_iov_limit: 0
    caps: [  ]
    mode: [  ]
        auth_key_size: 0
        max_err_data: 0
        mr_cnt: 0
    fi_fabric_attr:
        name: lf
        prov_name: lf
        prov_version: 0.1
        api_version: 1.8
    nic_fid: (nil)
---
fi_info:
    caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND ]
    mode: [  ]
    addr_format: FI_SOCKADDR_IN
    src_addrlen: 16
    dest_addrlen: 0
    src_addr: fi_sockaddr_in://192.168.1.35:0
    dest_addr: (null)
    handle: (nil)
    fi_tx_attr:
        caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND, FI_MULTI_RECV, 
FI_SHARED_AV ]
        mode: [  ]
        op_flags: [  ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, 
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_STRICT ]
        inject_size: 0
        size: 1024
        iov_limit: 4
        rma_iov_limit: 0
    fi_rx_attr:
        caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND, FI_MULTI_RECV, 
FI_SHARED_AV ]
        mode: [  ]
        op_flags: [  ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, 
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
        comp_order: [ FI_ORDER_STRICT ]
        total_buffered_recv: 65536
        size: 1024
        iov_limit: 4
    fi_ep_attr:
        type: FI_EP_DGRAM
        protocol: FI_PROTO_LF
        protocol_version: 0
        max_msg_size: 6552800
        msg_prefix_size: 0
        max_order_raw_size: 65528
        max_order_war_size: 65528
        max_order_waw_size: 65528
        mem_tag_format: 0x0000000000000000
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        auth_key_size: 0
    fi_domain_attr:
        domain: 0x0
        name: lf
        threading: FI_THREAD_SAFE
        control_progress: FI_PROGRESS_AUTO
        data_progress: FI_PROGRESS_AUTO
        resource_mgmt: FI_RM_ENABLED
        av_type: FI_AV_UNSPEC
        mr_mode: [  ]
        mr_key_size: 0
        cq_data_size: 0
        cq_cnt: 256
        ep_cnt: 256
        tx_ctx_cnt: 256
        rx_ctx_cnt: 256
        max_ep_tx_ctx: 1
        max_ep_rx_ctx: 1
        max_ep_stx_ctx: 0
        max_ep_srx_ctx: 0
        cntr_cnt: 0
        mr_iov_limit: 0
    caps: [  ]
    mode: [  ]
        auth_key_size: 0
        max_err_data: 0
        mr_cnt: 0
    fi_fabric_attr:
        name: lf
        prov_name: lf
        prov_version: 0.1
        api_version: 1.8
    nic_fid: (nil)
