It will need ofi_rxd and/or ofi_rxm since it supports both DGRAM and MSG. Attached is the info output with and without the debugging.

fi_info shows: lf;ofi_rxm, lf;ofi_rxd, lf (msg) and lf (dgram). I have not yet implemented the RMA or Remote_Read, and haven't looked at the difference between FI_READ and FI_RECV.

Don
________________________________________
From: Hefty, Sean <[email protected]>
Sent: Wednesday, November 13, 2019 2:42 PM
To: James Swaro; Don Fry; Barrett, Brian; Byrne, John (Labs); [email protected]
Subject: RE: [ofiwg] noob questions

Can you provide the output of fi_info -v for your provider? From the output below, it looks like your provider will rely on the ofi_rxd utility provider for its functionality. I.e. your provider supports DGRAM endpoints. Can you confirm that?

- Sean

> -----Original Message-----
> From: James Swaro <[email protected]>
> Sent: Wednesday, November 13, 2019 2:39 PM
> To: Don Fry <[email protected]>; Barrett, Brian <[email protected]>; Hefty, Sean <[email protected]>; Byrne, John (Labs) <[email protected]>; [email protected]
> Subject: Re: [ofiwg] noob questions
>
> Just pulling from your debug here, it looks like you have some requirements that your provider cannot satisfy for OpenMPI.
>
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_info():998<info> Unsupported capabilities
> libfabric:20561:lf:core:ofi_check_info():999<info> Supported: FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
> libfabric:20561:lf:core:ofi_check_info():999<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_MSG
> libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
> libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
> libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
> libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
> libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
> libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_DGRAM
> checking info in util_getinfo
> lf
> libfabric:20561:lf:core:ofi_check_mr_mode():510<info> Invalid memory registration mode
> libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Expected:
> libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Given:
> libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
>
> -- Jim
>
> On 11/13/19, 4:36 PM, "ofiwg on behalf of Don Fry" <[email protected] on behalf of [email protected]> wrote:
>
> Here is another run with the output suggested by James Swaro
>
> Don
> ________________________________________
> From: Don Fry
> Sent: Wednesday, November 13, 2019 2:26 PM
> To: Barrett, Brian; Hefty, Sean; Byrne, John (Labs); [email protected]
> Subject: Re: [ofiwg] noob questions
>
> attached is the output of mpirun with some of my debugging printf's
>
> Don
> ________________________________________
> From: Barrett, Brian <[email protected]>
> Sent: Wednesday, November 13, 2019 2:05 PM
> To: Don Fry; Hefty, Sean; Byrne, John (Labs); [email protected]
> Subject: Re: [ofiwg] noob questions
>
> That likely means that something failed in initializing the OFI provider. Without seeing the debugging output John mentioned, it's really hard to say *why* it failed to initialize. There are many reasons, including not being able to conform to a bunch of provider assumptions that Open MPI has on its providers.
>
> Brian
>
> -----Original Message-----
> From: Don Fry <[email protected]>
> Date: Wednesday, November 13, 2019 at 2:01 PM
> To: "Barrett, Brian" <[email protected]>, "Hefty, Sean" <[email protected]>, "Byrne, John (Labs)" <[email protected]>, "[email protected]" <[email protected]>
> Subject: Re: [ofiwg] noob questions
>
> When I tried --mca pml cm it complains that "PML cm cannot be selected". Maybe I needed to enable cm when I configured openmpi? I didn't specifically enable or disable it. It could also be that my getinfo routine doesn't have a capability set properly.
>
> My latest command line was:
> mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "lf;ofi_rxm" ./mpi_latency (where lf is my provider)
>
> Thanks for the pointers, I will do some more debugging on my end.
>
> Don
> ________________________________________
> From: Barrett, Brian <[email protected]>
> Sent: Wednesday, November 13, 2019 12:53 PM
> To: Hefty, Sean; Byrne, John (Labs); Don Fry; [email protected]
> Subject: Re: [ofiwg] noob questions
>
> You can force Open MPI to use libfabric as its transport by adding "-mca pml cm -mca mtl ofi" to the mpirun command line.
>
> Brian
>
> -----Original Message-----
> From: ofiwg <[email protected]> on behalf of "Hefty, Sean" <[email protected]>
> Date: Wednesday, November 13, 2019 at 12:52 PM
> To: "Byrne, John (Labs)" <[email protected]>, Don Fry <[email protected]>, "[email protected]" <[email protected]>
> Subject: Re: [ofiwg] noob questions
>
> My guess is that OpenMPI has an internal socket transport that it is using. You likely need to force MPI to use libfabric, but I don't know enough about OMPI to do that.
>
> Jeff (copied) likely knows the answer here, but you may need to create him a new meme for his assistance.
>
> - Sean
>
> > -----Original Message-----
> > From: ofiwg <[email protected]> On Behalf Of Byrne, John (Labs)
> > Sent: Wednesday, November 13, 2019 11:26 AM
> > To: Don Fry <[email protected]>; [email protected]
> > Subject: Re: [ofiwg] noob questions
> >
> > You only mention the dgram and msg types and the mtl_ofi component wants rdm. If you don’t support rdm, I would have expected your getinfo routine to return error -61. You can try using the ofi_rxm provider with your provider to add rdm support, replacing verbs in “--mca mtl_ofi_provider_include verbs;ofi_rxm” with your provider.
> >
> > openmpi transport selection is complex. Adding insane levels of verbosity can help you understand what is happening. I tend to use: --mca mtl_base_verbose 100 --mca btl_base_verbose 100 --mca pml_base_verbose 100
> >
> > John Byrne
> >
> > From: ofiwg [mailto:[email protected]] On Behalf Of Don Fry
> > Sent: Wednesday, November 13, 2019 10:54 AM
> > To: [email protected]
> > Subject: [ofiwg] noob questions
> >
> > I have written a libfabric provider for our hardware and it passes all the fabtests I expect it to (dgram and msg). I am trying to run some MPI tests using libfabric under openmpi (4.0.2). When I run a simple ping-pong test using mpirun it sends and receives the messages using the tcp/ip protocol. It does call my fi_getinfo routine, but doesn't use my provider send/receive routines. I have rebuilt the libfabric library disabling sockets, then again --disable-tcp, then --disable-udp, and fi_info reports fewer and fewer providers until it only lists my provider, but each time I run the mpi test, it still uses the ip protocol to exchange messages.
> >
> > When I configured openmpi I specified --with-libfabric=/usr/local/ and the libfabric library is being loaded and executed.
> >
> > I am probably doing something obviously wrong, but I don't know enough about MPI or maybe libfabric, so need some help. If this is the wrong list, redirect me.
> >
> > Any suggestions?
> >
> > Don
>
> _______________________________________________
> ofiwg mailing list
> [email protected]
> https://lists.openfabrics.org/mailman/listinfo/ofiwg
info.dbg
---
fi_info:
caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_READ, FI_WRITE, FI_RECV, FI_SEND,
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_LOCAL_COMM, FI_REMOTE_COMM ]
mode: [ ]
addr_format: FI_SOCKADDR_IN
src_addrlen: 16
dest_addrlen: 0
src_addr: fi_sockaddr_in://192.168.1.35:0
dest_addr: (null)
handle: (nil)
fi_tx_attr:
caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_READ, FI_WRITE, FI_RECV, FI_SEND,
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_SOURCE, FI_DIRECTED_RECV ]
mode: [ ]
op_flags: [ ]
msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR,
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
comp_order: [ FI_ORDER_NONE ]
inject_size: 16320
size: 1024
iov_limit: 4
rma_iov_limit: 4
fi_rx_attr:
caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_READ, FI_WRITE, FI_RECV, FI_SEND,
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_SOURCE, FI_DIRECTED_RECV ]
mode: [ ]
op_flags: [ ]
msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR,
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
comp_order: [ FI_ORDER_NONE ]
total_buffered_recv: 0
size: 1024
iov_limit: 4
fi_ep_attr:
type: FI_EP_RDM
protocol: FI_PROTO_RXM
protocol_version: 1
max_msg_size: 6552800
msg_prefix_size: 0
max_order_raw_size: 65528
max_order_war_size: 65528
max_order_waw_size: 65528
mem_tag_format: 0xaaaaaaaaaaaaaaaa
tx_ctx_cnt: 1
rx_ctx_cnt: 1
auth_key_size: 0
fi_domain_attr:
domain: 0x0
name: lf
threading: FI_THREAD_SAFE
control_progress: FI_PROGRESS_AUTO
data_progress: FI_PROGRESS_MANUAL
resource_mgmt: FI_RM_ENABLED
av_type: FI_AV_UNSPEC
mr_mode: [ FI_MR_BASIC, FI_MR_SCALABLE ]
mr_key_size: 0
cq_data_size: 0
cq_cnt: 65536
ep_cnt: 32768
tx_ctx_cnt: 1
rx_ctx_cnt: 1
max_ep_tx_ctx: 1
max_ep_rx_ctx: 1
max_ep_stx_ctx: 0
max_ep_srx_ctx: 0
cntr_cnt: 0
mr_iov_limit: 1
caps: [ FI_LOCAL_COMM, FI_REMOTE_COMM ]
mode: [ ]
auth_key_size: 0
max_err_data: 0
mr_cnt: 0
fi_fabric_attr:
name: lf
prov_name: lf;ofi_rxm
prov_version: 1.0
api_version: 1.8
nic_fid: (nil)
---
fi_info:
caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE, FI_RECV,
FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_LOCAL_COMM,
FI_REMOTE_COMM, FI_RMA_EVENT, FI_SOURCE, FI_DIRECTED_RECV ]
mode: [ ]
addr_format: FI_SOCKADDR_IN
src_addrlen: 16
dest_addrlen: 0
src_addr: fi_sockaddr_in://192.168.1.35:0
dest_addr: (null)
handle: (nil)
fi_tx_attr:
caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_READ, FI_WRITE,
FI_SEND, FI_MULTI_RECV, FI_RMA_EVENT, FI_SOURCE, FI_DIRECTED_RECV ]
mode: [ ]
op_flags: [ FI_COMPLETION, FI_INJECT, FI_INJECT_COMPLETE,
FI_TRANSMIT_COMPLETE, FI_DELIVERY_COMPLETE ]
msg_order: [ FI_ORDER_SAS ]
comp_order: [ FI_ORDER_NONE ]
inject_size: 3880
size: 1024
iov_limit: 4
rma_iov_limit: 4
fi_rx_attr:
caps: [ FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC, FI_RECV, FI_REMOTE_READ,
FI_REMOTE_WRITE, FI_MULTI_RECV, FI_RMA_EVENT, FI_SOURCE, FI_DIRECTED_RECV ]
mode: [ ]
op_flags: [ FI_MULTI_RECV, FI_COMPLETION ]
msg_order: [ FI_ORDER_SAS ]
comp_order: [ FI_ORDER_NONE ]
total_buffered_recv: 0
size: 1024
iov_limit: 4
fi_ep_attr:
type: FI_EP_RDM
protocol: FI_PROTO_RXD
protocol_version: 1
max_msg_size: 18446744073709551615
msg_prefix_size: 0
max_order_raw_size: 18446744073709551615
max_order_war_size: 0
max_order_waw_size: 18446744073709551615
mem_tag_format: 0xaaaaaaaaaaaaaaaa
tx_ctx_cnt: 1
rx_ctx_cnt: 1
auth_key_size: 0
fi_domain_attr:
domain: 0x0
name: lf
threading: FI_THREAD_SAFE
control_progress: FI_PROGRESS_MANUAL
data_progress: FI_PROGRESS_MANUAL
resource_mgmt: FI_RM_ENABLED
av_type: FI_AV_UNSPEC
mr_mode: [ FI_MR_BASIC, FI_MR_SCALABLE ]
mr_key_size: 8
cq_data_size: 8
cq_cnt: 128
ep_cnt: 128
tx_ctx_cnt: 1
rx_ctx_cnt: 1
max_ep_tx_ctx: 1
max_ep_rx_ctx: 1
max_ep_stx_ctx: 0
max_ep_srx_ctx: 0
cntr_cnt: 0
mr_iov_limit: 1
caps: [ FI_LOCAL_COMM, FI_REMOTE_COMM ]
mode: [ ]
auth_key_size: 0
max_err_data: 0
mr_cnt: 0
fi_fabric_attr:
name: lf
prov_name: lf;ofi_rxd
prov_version: 1.0
api_version: 1.8
nic_fid: (nil)
---
fi_info:
caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND ]
mode: [ ]
addr_format: FI_SOCKADDR_IN
src_addrlen: 16
dest_addrlen: 0
src_addr: fi_sockaddr_in://192.168.1.35:0
dest_addr: (null)
handle: (nil)
fi_tx_attr:
caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND,
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_SHARED_AV ]
mode: [ ]
op_flags: [ ]
msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR,
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
comp_order: [ FI_ORDER_STRICT ]
inject_size: 0
size: 1024
iov_limit: 4
rma_iov_limit: 4
fi_rx_attr:
caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND,
FI_REMOTE_READ, FI_REMOTE_WRITE, FI_MULTI_RECV, FI_SHARED_AV ]
mode: [ ]
op_flags: [ ]
msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR,
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
comp_order: [ FI_ORDER_STRICT ]
total_buffered_recv: 0
size: 1024
iov_limit: 4
fi_ep_attr:
type: FI_EP_MSG
protocol: FI_PROTO_LF
protocol_version: 0
max_msg_size: 6552800
msg_prefix_size: 0
max_order_raw_size: 65528
max_order_war_size: 65528
max_order_waw_size: 65528
mem_tag_format: 0x0000000000000000
tx_ctx_cnt: 1
rx_ctx_cnt: 1
auth_key_size: 0
fi_domain_attr:
domain: 0x0
name: lf
threading: FI_THREAD_SAFE
control_progress: FI_PROGRESS_AUTO
data_progress: FI_PROGRESS_AUTO
resource_mgmt: FI_RM_ENABLED
av_type: FI_AV_UNSPEC
mr_mode: [ ]
mr_key_size: 0
cq_data_size: 0
cq_cnt: 256
ep_cnt: 256
tx_ctx_cnt: 256
rx_ctx_cnt: 256
max_ep_tx_ctx: 1
max_ep_rx_ctx: 1
max_ep_stx_ctx: 0
max_ep_srx_ctx: 0
cntr_cnt: 0
mr_iov_limit: 0
caps: [ ]
mode: [ ]
auth_key_size: 0
max_err_data: 0
mr_cnt: 0
fi_fabric_attr:
name: lf
prov_name: lf
prov_version: 0.1
api_version: 1.8
nic_fid: (nil)
---
fi_info:
caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND ]
mode: [ ]
addr_format: FI_SOCKADDR_IN
src_addrlen: 16
dest_addrlen: 0
src_addr: fi_sockaddr_in://192.168.1.35:0
dest_addr: (null)
handle: (nil)
fi_tx_attr:
caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND, FI_MULTI_RECV,
FI_SHARED_AV ]
mode: [ ]
op_flags: [ ]
msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR,
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
comp_order: [ FI_ORDER_STRICT ]
inject_size: 0
size: 1024
iov_limit: 4
rma_iov_limit: 0
fi_rx_attr:
caps: [ FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND, FI_MULTI_RECV,
FI_SHARED_AV ]
mode: [ ]
op_flags: [ ]
msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR,
FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS ]
comp_order: [ FI_ORDER_STRICT ]
total_buffered_recv: 65536
size: 1024
iov_limit: 4
fi_ep_attr:
type: FI_EP_DGRAM
protocol: FI_PROTO_LF
protocol_version: 0
max_msg_size: 6552800
msg_prefix_size: 0
max_order_raw_size: 65528
max_order_war_size: 65528
max_order_waw_size: 65528
mem_tag_format: 0x0000000000000000
tx_ctx_cnt: 1
rx_ctx_cnt: 1
auth_key_size: 0
fi_domain_attr:
domain: 0x0
name: lf
threading: FI_THREAD_SAFE
control_progress: FI_PROGRESS_AUTO
data_progress: FI_PROGRESS_AUTO
resource_mgmt: FI_RM_ENABLED
av_type: FI_AV_UNSPEC
mr_mode: [ ]
mr_key_size: 0
cq_data_size: 0
cq_cnt: 256
ep_cnt: 256
tx_ctx_cnt: 256
rx_ctx_cnt: 256
max_ep_tx_ctx: 1
max_ep_rx_ctx: 1
max_ep_stx_ctx: 0
max_ep_srx_ctx: 0
cntr_cnt: 0
mr_iov_limit: 0
caps: [ ]
mode: [ ]
auth_key_size: 0
max_err_data: 0
mr_cnt: 0
fi_fabric_attr:
name: lf
prov_name: lf
prov_version: 0.1
api_version: 1.8
nic_fid: (nil)
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
