Just pulling from your debug output here, it looks like Open MPI is requesting some capabilities and endpoint types that your provider cannot satisfy.
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_info():998<info> Unsupported capabilities
libfabric:20561:lf:core:ofi_check_info():999<info> Supported: FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:20561:lf:core:ofi_check_info():999<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_MSG
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_DGRAM
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_mr_mode():510<info> Invalid memory registration mode
libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Expected:
libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Given:
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)

-- Jim

On 11/13/19, 4:36 PM, "ofiwg on behalf of Don Fry" <[email protected] on behalf of [email protected]> wrote:

Here is another run with the output suggested by James Swaro

Don
________________________________________
From: Don Fry
Sent: Wednesday, November 13, 2019 2:26 PM
To: Barrett,
Brian; Hefty, Sean; Byrne, John (Labs); [email protected]
Subject: Re: [ofiwg] noob questions

attached is the output of mpirun with some of my debugging printf's

Don
________________________________________
From: Barrett, Brian <[email protected]>
Sent: Wednesday, November 13, 2019 2:05 PM
To: Don Fry; Hefty, Sean; Byrne, John (Labs); [email protected]
Subject: Re: [ofiwg] noob questions

That likely means that something failed in initializing the OFI provider. Without seeing the debugging output John mentioned, it's really hard to say *why* it failed to initialize. There are many reasons, including not being able to conform to a bunch of provider assumptions that Open MPI has on its providers.

Brian

-----Original Message-----
From: Don Fry <[email protected]>
Date: Wednesday, November 13, 2019 at 2:01 PM
To: "Barrett, Brian" <[email protected]>, "Hefty, Sean" <[email protected]>, "Byrne, John (Labs)" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [ofiwg] noob questions

When I tried --mca pml cm it complains that "PML cm cannot be selected". Maybe I needed to enable cm when I configured openmpi? I didn't specifically enable or disable it. It could also be that my getinfo routine doesn't have a capability set properly.

my latest command line was:

    mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "lf;ofi_rxm" ./mpi_latency

(where lf is my provider)

Thanks for the pointers, I will do some more debugging on my end.

Don
________________________________________
From: Barrett, Brian <[email protected]>
Sent: Wednesday, November 13, 2019 12:53 PM
To: Hefty, Sean; Byrne, John (Labs); Don Fry; [email protected]
Subject: Re: [ofiwg] noob questions

You can force Open MPI to use libfabric as its transport by adding "-mca pml cm -mca mtl ofi" to the mpirun command line.
Brian

-----Original Message-----
From: ofiwg <[email protected]> on behalf of "Hefty, Sean" <[email protected]>
Date: Wednesday, November 13, 2019 at 12:52 PM
To: "Byrne, John (Labs)" <[email protected]>, Don Fry <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [ofiwg] noob questions

My guess is that OpenMPI has an internal socket transport that it is using. You likely need to force MPI to use libfabric, but I don't know enough about OMPI to do that. Jeff (copied) likely knows the answer here, but you may need to create him a new meme for his assistance.

- Sean

> -----Original Message-----
> From: ofiwg <[email protected]> On Behalf Of Byrne, John (Labs)
> Sent: Wednesday, November 13, 2019 11:26 AM
> To: Don Fry <[email protected]>; [email protected]
> Subject: Re: [ofiwg] noob questions
>
> You only mention the dgram and msg types and the mtl_ofi component wants rdm. If you don't support rdm, I would have expected your getinfo routine to return error -61. You can try using the ofi_rxm provider with your provider to add rdm support, replacing verbs in "--mca mtl_ofi_provider_include verbs;ofi_rxm" with your provider.
>
> openmpi transport selection is complex. Adding insane levels of verbosity can help you understand what is happening. I tend to use: --mca mtl_base_verbose 100 --mca btl_base_verbose 100 --mca pml_base_verbose 100
>
> John Byrne
>
> From: ofiwg [mailto:[email protected]] On Behalf Of Don Fry
> Sent: Wednesday, November 13, 2019 10:54 AM
> To: [email protected]
> Subject: [ofiwg] noob questions
>
> I have written a libfabric provider for our hardware and it passes all the fabtests I expect it to (dgram and msg). I am trying to run some MPI tests using libfabric under openmpi (4.0.2). When I run a simple ping-pong test using mpirun it sends and receives the messages using the tcp/ip protocol. It does call my fi_getinfo routine, but doesn't use my provider send/receive routines.
> I have rebuilt the libfabric library disabling sockets, then again --disable-tcp, then --disable-udp, and fi_info reports fewer and fewer providers until it only lists my provider, but each time I run the mpi test, it still uses the ip protocol to exchange messages.
>
> When I configured openmpi I specified --with-libfabric=/usr/local/ and the libfabric library is being loaded and executed.
>
> I am probably doing something obviously wrong, but I don't know enough about MPI or maybe libfabric, so need some help. If this is the wrong list, redirect me.
>
> Any suggestions?
>
> Don
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
