Here is another run with the output suggested by James Swaro

Don
________________________________________
From: Don Fry
Sent: Wednesday, November 13, 2019 2:26 PM
To: Barrett, Brian; Hefty, Sean; Byrne, John (Labs); [email protected]
Subject: Re: [ofiwg] noob questions

attached is the output of mpirun with some of my debugging printf's

Don
________________________________________
From: Barrett, Brian <[email protected]>
Sent: Wednesday, November 13, 2019 2:05 PM
To: Don Fry; Hefty, Sean; Byrne, John (Labs); [email protected]
Subject: Re: [ofiwg] noob questions

That likely means that something failed in initializing the OFI provider. Without seeing the debugging output John mentioned, it's really hard to say *why* it failed to initialize. There are many reasons, including not being able to conform to a bunch of provider assumptions that Open MPI has on its providers.

Brian

-----Original Message-----
From: Don Fry <[email protected]>
Date: Wednesday, November 13, 2019 at 2:01 PM
To: "Barrett, Brian" <[email protected]>, "Hefty, Sean" <[email protected]>, "Byrne, John (Labs)" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [ofiwg] noob questions

When I tried --mca pml cm it complains that "PML cm cannot be selected". Maybe I needed to enable cm when I configured openmpi? I didn't specifically enable or disable it. It could also be that my getinfo routine doesn't have a capability set properly.

my latest command line was:
mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "lf;ofi_rxm" ./mpi_latency
(where lf is my provider)

Thanks for the pointers, I will do some more debugging on my end.

Don
________________________________________
From: Barrett, Brian <[email protected]>
Sent: Wednesday, November 13, 2019 12:53 PM
To: Hefty, Sean; Byrne, John (Labs); Don Fry; [email protected]
Subject: Re: [ofiwg] noob questions

You can force Open MPI to use libfabric as its transport by adding "-mca pml cm -mca mtl ofi" to the mpirun command line.

Brian

-----Original Message-----
From: ofiwg <[email protected]> on behalf of "Hefty, Sean" <[email protected]>
Date: Wednesday, November 13, 2019 at 12:52 PM
To: "Byrne, John (Labs)" <[email protected]>, Don Fry <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [ofiwg] noob questions

My guess is that OpenMPI has an internal socket transport that it is using. You likely need to force MPI to use libfabric, but I don't know enough about OMPI to do that.

Jeff (copied) likely knows the answer here, but you may need to create him a new meme for his assistance.

- Sean

> -----Original Message-----
> From: ofiwg <[email protected]> On Behalf Of Byrne, John (Labs)
> Sent: Wednesday, November 13, 2019 11:26 AM
> To: Don Fry <[email protected]>; [email protected]
> Subject: Re: [ofiwg] noob questions
>
> You only mention the dgram and msg types and the mtl_ofi component wants rdm. If you
> don’t support rdm, I would have expected your getinfo routine to return error -61. You
> can try using the ofi_rxm provider with your provider to add rdm support, replacing
> verbs in “--mca mtl_ofi_provider_include verbs;ofi_rxm” with your provider.
>
> openmpi transport selection is complex. Adding insane levels of verbosity can help you
> understand what is happening. I tend to use: --mca mtl_base_verbose 100 --mca
> btl_base_verbose 100 --mca pml_base_verbose 100
>
> John Byrne
>
> From: ofiwg [mailto:[email protected]] On Behalf Of Don Fry
> Sent: Wednesday, November 13, 2019 10:54 AM
> To: [email protected]
> Subject: [ofiwg] noob questions
>
> I have written a libfabric provider for our hardware and it passes all the fabtests I
> expect it to (dgram and msg). I am trying to run some MPI tests using libfabrics under
> openmpi (4.0.2). When I run a simple ping-pong test using mpirun it sends and receives
> the messages using the tcp/ip protocol. It does call my fi_getinfo routine, but
> doesn't use my provider send/receive routines. I have rebuilt the libfabric library
> disabling sockets, then again --disable-tcp, then --disable-udp, and fi_info reports
> fewer and fewer providers until it only lists my provider, but each time I run the mpi
> test, it still uses the ip protocol to exchange messages.
>
> When I configured openmpi I specified --with-libfabric=/usr/local/ and the libfabric
> library is being loaded and executed.
>
> I am probably doing something obviously wrong, but I don't know enough about MPI or
> maybe libfabric, so need some help. If this is the wrong list, redirect me.
>
> Any suggestions?
>
> Don

_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg

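The debug output attached to this thread prints a "Supported:" entry followed by a "Requested:" entry whenever a capability or endpoint-type check fails for the lf provider. As a rough aid for reading logs like this one, here is a minimal Python sketch that pairs those entries and lists what was requested but not supported. It is not part of libfabric or Open MPI; the name find_mismatches is made up for illustration, and it assumes each "Supported:" entry is followed by its matching "Requested:" entry, as in the output here.

```python
import re

# Matches a "Supported: ..." entry and the next "Requested: ..." entry.
# Both are comma-separated lists of FI_* names in the log in this thread.
PAIR_RE = re.compile(
    r"Supported: (?P<supported>FI_\w+(?:, FI_\w+)*)"
    r".*?"
    r"Requested: (?P<requested>FI_\w+(?:, FI_\w+)*)",
    re.DOTALL,
)


def find_mismatches(log_text):
    """Return (missing, supported) tuples, one per failed check.

    'missing' holds the names that were requested but are not in the
    supported list; checks with nothing missing are skipped.
    """
    mismatches = []
    for match in PAIR_RE.finditer(log_text):
        supported = set(match.group("supported").split(", "))
        requested = set(match.group("requested").split(", "))
        missing = sorted(requested - supported)
        if missing:
            mismatches.append((missing, sorted(supported)))
    return mismatches


# Small demo using two entries copied from the attached output.
demo = (
    "libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM "
    "libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_RDM"
)
for missing, supported in find_mismatches(demo):
    print("requested but not supported:", ", ".join(missing))
    print("  provider offers:", ", ".join(supported))
```

Run over the full output, this would flag FI_READ, FI_REMOTE_READ and FI_RMA for the capability check, and endpoint types such as FI_EP_MSG and FI_EP_RDM for the endpoint checks. The ofi_check_mr_mode entries use "Expected:" and "Given:" with empty values in this log, so the sketch ignores them.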
mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include lf;ofi_rxm --mca mtl_base_verbose 100 --mca btl_base_verbose 100 --mca pml_base_verbose 100 ./mpi_latency [rh3:20561] mca: base: components_register: registering framework btl components [rh3:20561] mca: base: components_register: found loaded component self [rh3:20561] mca: base: components_register: component self register function successful [rh3:20561] mca: base: components_register: found loaded component sm [rh3:20561] mca: base: components_register: found loaded component tcp [rh3:20561] mca: base: components_register: component tcp register function successful [rh3:20561] mca: base: components_register: found loaded component vader [rh3:20561] mca: base: components_register: component vader register function successful [rh3:20561] mca: base: components_open: opening btl components [rh3:20561] mca: base: components_open: found loaded component self [rh3:20561] mca: base: components_open: component self open function successful [rh3:20561] mca: base: components_open: found loaded component tcp [rh3:20561] mca: base: components_open: component tcp open function successful [rh3:20561] mca: base: components_open: found loaded component vader [rh3:20561] mca: base: components_open: component vader open function successful [rh3:20561] select: initializing btl component self [rh3:20561] select: init of component self returned success [rh3:20561] select: initializing btl component tcp [rh3:20561] btl: tcp: Searching for exclude address+prefix: 127.0.0.1 / 8 [rh3:20561] btl: tcp: Found match: 127.0.0.1 (lo) [rh3:20561] btl:tcp: Attempting to bind to AF_INET port 1024 [rh3:20561] btl:tcp: Successfully bound to AF_INET port 1024 [rh3:20561] btl:tcp: my listening v4 socket is 0.0.0.0:1024 [rh3:20561] btl:tcp: examining interface me1 [rh3:20561] btl:tcp: using ipv6 interface me1 [rh3:20561] btl:tcp: examining interface lf0 [rh3:20561] btl:tcp: using ipv6 interface lf0 [rh3:20561] select: init of component 
tcp returned success [rh3:20561] select: initializing btl component vader [rh3:20561] select: init of component vader returned failure [rh3:20561] mca: base: close: component vader closed [rh3:20561] mca: base: close: unloading component vader [rh3:20561] mca: base: components_register: registering framework pml components [rh3:20561] mca: base: components_register: found loaded component cm [rh3:20561] mca: base: components_register: component cm register function successful [rh3:20561] mca: base: components_open: opening pml components [rh3:20561] mca: base: components_open: found loaded component cm [rh3:20561] mca: base: components_register: registering framework mtl components [rh3:20561] mca: base: components_register: found loaded component ofi [rh3:20561] mca: base: components_register: component ofi register function successful [rh3:20561] mca: base: components_open: opening mtl components [rh3:20561] mca: base: components_open: found loaded component ofi [rh3:20561] mca: base: components_open: component ofi open function successful [rh3:20561] mca: base: components_open: component cm open function successful [rh3:20561] select: initializing pml component cm [rh3:20561] mca:base:select: Auto-selecting mtl components [rh3:20561] mca:base:select:( mtl) Querying component [ofi] [rh3:20561] mca:base:select:( mtl) Query of component [ofi] set priority to 25 [rh3:20561] mca:base:select:( mtl) Selected component [ofi] [rh3:20561] select: initializing mtl component ofi libfabric:20561:core:core:fi_param_define_():231<info> registered var perf_cntr libfabric:20561:core:core:fi_param_get_():280<info> variable perf_cntr=<not set> libfabric:20561:core:core:fi_param_define_():231<info> registered var hook libfabric:20561:core:core:fi_param_get_():280<info> variable hook=<not set> libfabric:20561:core:core:fi_param_define_():231<info> registered var mr_cache_max_size libfabric:20561:core:core:fi_param_define_():231<info> registered var mr_cache_max_count 
libfabric:20561:core:core:fi_param_define_():231<info> registered var mr_cache_merge_regions libfabric:20561:core:core:fi_param_define_():231<info> registered var mr_cache_monitor libfabric:20561:core:core:fi_param_get_():280<info> variable mr_cache_max_size=<not set> libfabric:20561:core:core:fi_param_get_():280<info> variable mr_cache_max_count=<not set> libfabric:20561:core:core:fi_param_get_():280<info> variable mr_cache_merge_regions=<not set> libfabric:20561:core:core:fi_param_get_():280<info> variable mr_cache_monitor=<not set> libfabric:20561:core:core:fi_param_define_():231<info> registered var provider libfabric:20561:core:core:fi_param_define_():231<info> registered var fork_unsafe libfabric:20561:core:core:fi_param_define_():231<info> registered var universe_size libfabric:20561:core:core:fi_param_get_():289<info> read string var provider=lf libfabric:20561:core:core:fi_param_define_():231<info> registered var provider_path libfabric:20561:core:core:fi_param_get_():280<info> variable provider_path=<not set> libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: shm (1.1) libfabric:20561:core:core:ofi_register_provider():405<info> "shm" filtered by provider include/exclude list, skipping libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var buffer_size 
libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var comp_per_progress libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var sar_limit libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var use_srx libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var tx_size libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var rx_size libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var msg_tx_size libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var msg_rx_size libfabric:20561:ofi_rxm:core:fi_param_define_():231<info> registered var cm_progress_interval libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable tx_size=<not set> libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable rx_size=<not set> libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable msg_tx_size=<not set> libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable msg_rx_size=<not set> libfabric:20561:core:core:fi_param_get_():280<info> variable universe_size=<not set> libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable cm_progress_interval=<not set> libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable buffer_size=<not set> libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: ofi_rxm (1.0) libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:ofi_mrail:core:fi_param_define_():231<info> registered var config libfabric:20561:ofi_mrail:core:fi_param_get_():280<info> variable config=<not set> libfabric:20561:ofi_mrail:core:fi_param_define_():231<info> registered var addr_strc libfabric:20561:ofi_mrail:core:fi_param_get_():280<info> variable addr_strc=<not set> libfabric:20561:ofi_mrail:core:mrail_parse_env_vars():109<warn> Unable to read OFI_MRAIL_ADDR_STRC env variable libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: 
ofi_mrail (1.0) libfabric:20561:ofi_rxd:core:fi_param_define_():231<info> registered var spin_count libfabric:20561:ofi_rxd:core:fi_param_define_():231<info> registered var retry libfabric:20561:ofi_rxd:core:fi_param_define_():231<info> registered var max_peers libfabric:20561:ofi_rxd:core:fi_param_define_():231<info> registered var max_unacked libfabric:20561:ofi_rxd:core:fi_param_get_():280<info> variable spin_count=<not set> libfabric:20561:ofi_rxd:core:fi_param_get_():280<info> variable retry=<not set> libfabric:20561:ofi_rxd:core:fi_param_get_():280<info> variable max_peers=<not set> libfabric:20561:ofi_rxd:core:fi_param_get_():280<info> variable max_unacked=<not set> libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: ofi_rxd (1.0) libfabric:20561:efa:core:fi_param_define_():231<info> registered var rx_window_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var tx_max_credits libfabric:20561:efa:core:fi_param_define_():231<info> registered var tx_min_credits libfabric:20561:efa:core:fi_param_define_():231<info> registered var tx_queue_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var enable_sas_ordering libfabric:20561:efa:core:fi_param_define_():231<info> registered var recvwin_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var cq_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var mr_cache_enable libfabric:20561:efa:core:fi_param_define_():231<info> registered var mr_cache_merge_regions libfabric:20561:efa:core:fi_param_define_():231<info> registered var mr_max_cached_count libfabric:20561:efa:core:fi_param_define_():231<info> registered var mr_max_cached_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var max_memcpy_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var mtu_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var tx_size 
libfabric:20561:efa:core:fi_param_define_():231<info> registered var rx_size libfabric:20561:efa:core:fi_param_define_():231<info> registered var tx_iov_limit libfabric:20561:efa:core:fi_param_define_():231<info> registered var rx_iov_limit libfabric:20561:efa:core:fi_param_define_():231<info> registered var rx_copy_unexp libfabric:20561:efa:core:fi_param_define_():231<info> registered var rx_copy_ooo libfabric:20561:efa:core:fi_param_define_():231<info> registered var max_timeout libfabric:20561:efa:core:fi_param_define_():231<info> registered var timeout_interval libfabric:20561:efa:core:fi_param_get_():280<info> variable rx_window_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable tx_max_credits=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable tx_min_credits=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable tx_queue_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable enable_sas_ordering=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable recvwin_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable cq_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable max_memcpy_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable mr_cache_enable=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable mr_cache_merge_regions=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable mr_max_cached_count=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable mr_max_cached_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable mtu_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable tx_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable rx_size=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable tx_iov_limit=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable 
rx_iov_limit=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable rx_copy_unexp=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable rx_copy_ooo=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable max_timeout=<not set> libfabric:20561:efa:core:fi_param_get_():280<info> variable timeout_interval=<not set> libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:20561:lf:core:fi_param_define_():231<info> registered var group_name libfabric:20561:lf:core:fi_param_define_():231<info> registered var hba_number libfabric:20561:lf:core:fi_param_define_():231<info> registered var use_pio libfabric:20561:lf:core:fi_param_define_():231<info> registered var priority libfabric:20561:lf:core:fi_param_define_():231<info> registered var group_size libfabric:20561:lf:core:fi_param_get_():280<info> variable group_name=<not set> libfabric:20561:lf:core:fi_param_get_():280<info> variable hba_number=<not set> libfabric:20561:lf:core:fi_param_get_():280<info> variable use_pio=<not set> libfabric:20561:lf:core:fi_param_get_():280<info> variable priority=<not set> libfabric:20561:lf:core:fi_param_get_():280<info> variable group_size=<not set> libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: lf (0.1) libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: ofi_hook_perf (1.0) libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: ofi_hook_debug (1.0) libfabric:20561:core:core:ofi_register_provider():374<info> registering provider: ofi_hook_noop (1.0) libfabric:20561:ofi_rxm:core:fi_param_get_():280<info> variable use_srx=<not set> 
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_info():998<info> Unsupported capabilities
libfabric:20561:lf:core:ofi_check_info():999<info> Supported: FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:20561:lf:core:ofi_check_info():999<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_MSG
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
libfabric:20561:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_DGRAM
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_mr_mode():510<info> Invalid memory registration mode
libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Expected:
libfabric:20561:lf:core:ofi_check_mr_mode():511<info> Given:
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxd returned -61 (No data available)
libfabric:20561:ofi_mrail:fabric:mrail_get_core_info():277<warn> OFI_MRAIL_ADDR_STRC env variable not set!
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_mrail returned -61 (No data available)
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_RDM
checking info in util_getinfo lf
libfabric:20561:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
libfabric:20561:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_RDM
libfabric:20561:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
[rh3:20561] select: init returned failure for component ofi
[rh3:20561] select: no component selected
[rh3:20561] select: init returned failure for component cm
--------------------------------------------------------------------------
No components were able to be opened in the pml framework. This typically means that either no components of this type were installed, or none of the installed components can be loaded. Sometimes this means that shared libraries required by these components are unable to be found/loaded.
Host: rh3 Framework: pml -------------------------------------------------------------------------- [rh3:20561] PML cm cannot be selected [rh4:31074] mca: base: components_register: registering framework btl components [rh4:31074] mca: base: components_register: found loaded component self [rh4:31074] mca: base: components_register: component self register function successful [rh4:31074] mca: base: components_register: found loaded component sm [rh4:31074] mca: base: components_register: found loaded component tcp [rh4:31074] mca: base: components_register: component tcp register function successful [rh4:31074] mca: base: components_register: found loaded component vader [rh4:31074] mca: base: components_register: component vader register function successful [rh4:31074] mca: base: components_open: opening btl components [rh4:31074] mca: base: components_open: found loaded component self [rh4:31074] mca: base: components_open: component self open function successful [rh4:31074] mca: base: components_open: found loaded component tcp [rh4:31074] mca: base: components_open: component tcp open function successful [rh4:31074] mca: base: components_open: found loaded component vader [rh4:31074] mca: base: components_open: component vader open function successful [rh4:31074] select: initializing btl component self [rh4:31074] select: init of component self returned success [rh4:31074] select: initializing btl component tcp [rh4:31074] btl: tcp: Searching for exclude address+prefix: 127.0.0.1 / 8 [rh4:31074] btl: tcp: Found match: 127.0.0.1 (lo) [rh4:31074] btl:tcp: Attempting to bind to AF_INET port 1024 [rh4:31074] btl:tcp: Successfully bound to AF_INET port 1024 [rh4:31074] btl:tcp: my listening v4 socket is 0.0.0.0:1024 [rh4:31074] btl:tcp: examining interface me1 [rh4:31074] btl:tcp: using ipv6 interface me1 [rh4:31074] btl:tcp: examining interface lf0 [rh4:31074] btl:tcp: using ipv6 interface lf0 [rh4:31074] select: init of component tcp returned success [rh4:31074] 
select: initializing btl component vader [rh4:31074] select: init of component vader returned failure [rh4:31074] mca: base: close: component vader closed [rh4:31074] mca: base: close: unloading component vader [rh4:31074] mca: base: components_register: registering framework pml components [rh4:31074] mca: base: components_register: found loaded component cm [rh4:31074] mca: base: components_register: component cm register function successful [rh4:31074] mca: base: components_open: opening pml components [rh4:31074] mca: base: components_open: found loaded component cm [rh4:31074] mca: base: components_register: registering framework mtl components [rh4:31074] mca: base: components_register: found loaded component ofi [rh4:31074] mca: base: components_register: component ofi register function successful [rh4:31074] mca: base: components_open: opening mtl components [rh4:31074] mca: base: components_open: found loaded component ofi [rh4:31074] mca: base: components_open: component ofi open function successful [rh4:31074] mca: base: components_open: component cm open function successful [rh4:31074] select: initializing pml component cm [rh4:31074] mca:base:select: Auto-selecting mtl components [rh4:31074] mca:base:select:( mtl) Querying component [ofi] [rh4:31074] mca:base:select:( mtl) Query of component [ofi] set priority to 25 [rh4:31074] mca:base:select:( mtl) Selected component [ofi] [rh4:31074] select: initializing mtl component ofi libfabric:31074:core:core:fi_param_define_():231<info> registered var perf_cntr libfabric:31074:core:core:fi_param_get_():280<info> variable perf_cntr=<not set> libfabric:31074:core:core:fi_param_define_():231<info> registered var hook libfabric:31074:core:core:fi_param_get_():280<info> variable hook=<not set> libfabric:31074:core:core:fi_param_define_():231<info> registered var mr_cache_max_size libfabric:31074:core:core:fi_param_define_():231<info> registered var mr_cache_max_count 
libfabric:31074:core:core:fi_param_define_():231<info> registered var mr_cache_merge_regions libfabric:31074:core:core:fi_param_define_():231<info> registered var mr_cache_monitor libfabric:31074:core:core:fi_param_get_():280<info> variable mr_cache_max_size=<not set> libfabric:31074:core:core:fi_param_get_():280<info> variable mr_cache_max_count=<not set> libfabric:31074:core:core:fi_param_get_():280<info> variable mr_cache_merge_regions=<not set> libfabric:31074:core:core:fi_param_get_():280<info> variable mr_cache_monitor=<not set> libfabric:31074:core:core:fi_param_define_():231<info> registered var provider libfabric:31074:core:core:fi_param_define_():231<info> registered var fork_unsafe libfabric:31074:core:core:fi_param_define_():231<info> registered var universe_size libfabric:31074:core:core:fi_param_get_():289<info> read string var provider=lf libfabric:31074:core:core:fi_param_define_():231<info> registered var provider_path libfabric:31074:core:core:fi_param_get_():280<info> variable provider_path=<not set> libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: shm (1.1) libfabric:31074:core:core:ofi_register_provider():405<info> "shm" filtered by provider include/exclude list, skipping libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var buffer_size 
libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var comp_per_progress libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var sar_limit libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var use_srx libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var tx_size libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var rx_size libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var msg_tx_size libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var msg_rx_size libfabric:31074:ofi_rxm:core:fi_param_define_():231<info> registered var cm_progress_interval libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable tx_size=<not set> libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable rx_size=<not set> libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable msg_tx_size=<not set> libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable msg_rx_size=<not set> libfabric:31074:core:core:fi_param_get_():280<info> variable universe_size=<not set> libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable cm_progress_interval=<not set> libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable buffer_size=<not set> libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: ofi_rxm (1.0) libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name libfabric:31074:ofi_mrail:core:fi_param_define_():231<info> registered var config libfabric:31074:ofi_mrail:core:fi_param_get_():280<info> variable config=<not set> libfabric:31074:ofi_mrail:core:fi_param_define_():231<info> registered var addr_strc libfabric:31074:ofi_mrail:core:fi_param_get_():280<info> variable addr_strc=<not set> libfabric:31074:ofi_mrail:core:mrail_parse_env_vars():109<warn> Unable to read OFI_MRAIL_ADDR_STRC env variable libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: 
ofi_mrail (1.0) libfabric:31074:ofi_rxd:core:fi_param_define_():231<info> registered var spin_count libfabric:31074:ofi_rxd:core:fi_param_define_():231<info> registered var retry libfabric:31074:ofi_rxd:core:fi_param_define_():231<info> registered var max_peers libfabric:31074:ofi_rxd:core:fi_param_define_():231<info> registered var max_unacked libfabric:31074:ofi_rxd:core:fi_param_get_():280<info> variable spin_count=<not set> libfabric:31074:ofi_rxd:core:fi_param_get_():280<info> variable retry=<not set> libfabric:31074:ofi_rxd:core:fi_param_get_():280<info> variable max_peers=<not set> libfabric:31074:ofi_rxd:core:fi_param_get_():280<info> variable max_unacked=<not set> libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: ofi_rxd (1.0) libfabric:31074:efa:core:fi_param_define_():231<info> registered var rx_window_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var tx_max_credits libfabric:31074:efa:core:fi_param_define_():231<info> registered var tx_min_credits libfabric:31074:efa:core:fi_param_define_():231<info> registered var tx_queue_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var enable_sas_ordering libfabric:31074:efa:core:fi_param_define_():231<info> registered var recvwin_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var cq_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var mr_cache_enable libfabric:31074:efa:core:fi_param_define_():231<info> registered var mr_cache_merge_regions libfabric:31074:efa:core:fi_param_define_():231<info> registered var mr_max_cached_count libfabric:31074:efa:core:fi_param_define_():231<info> registered var mr_max_cached_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var max_memcpy_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var mtu_size libfabric:31074:efa:core:fi_param_define_():231<info> registered var tx_size 
libfabric:31074:efa:core:fi_param_define_():231<info> registered var rx_size
libfabric:31074:efa:core:fi_param_define_():231<info> registered var tx_iov_limit
libfabric:31074:efa:core:fi_param_define_():231<info> registered var rx_iov_limit
libfabric:31074:efa:core:fi_param_define_():231<info> registered var rx_copy_unexp
libfabric:31074:efa:core:fi_param_define_():231<info> registered var rx_copy_ooo
libfabric:31074:efa:core:fi_param_define_():231<info> registered var max_timeout
libfabric:31074:efa:core:fi_param_define_():231<info> registered var timeout_interval
libfabric:31074:efa:core:fi_param_get_():280<info> variable rx_window_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable tx_max_credits=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable tx_min_credits=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable tx_queue_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable enable_sas_ordering=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable recvwin_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable cq_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable max_memcpy_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable mr_cache_enable=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable mr_cache_merge_regions=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable mr_max_cached_count=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable mr_max_cached_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable mtu_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable tx_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable rx_size=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable tx_iov_limit=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable rx_iov_limit=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable rx_copy_unexp=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable rx_copy_ooo=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable max_timeout=<not set>
libfabric:31074:efa:core:fi_param_get_():280<info> variable timeout_interval=<not set>
libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name
libfabric:31074:UDP:core:fi_param_define_():231<info> registered var iface
libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: UDP (1.1)
libfabric:31074:core:core:ofi_register_provider():405<info> "UDP" filtered by provider include/exclude list, skipping
libfabric:31074:core:core:ofi_register_provider():367<warn> no provider structure or name
libfabric:31074:tcp:core:fi_param_define_():231<info> registered var iface
libfabric:31074:tcp:core:fi_param_define_():231<info> registered var port_low_range
libfabric:31074:tcp:core:fi_param_define_():231<info> registered var port_high_range
libfabric:31074:tcp:core:fi_param_get_():280<info> variable port_high_range=<not set>
libfabric:31074:tcp:core:fi_param_get_():280<info> variable port_low_range=<not set>
libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: tcp (1.0)
libfabric:31074:core:core:ofi_register_provider():405<info> "tcp" filtered by provider include/exclude list, skipping
libfabric:31074:lf:core:fi_param_define_():231<info> registered var group_name
libfabric:31074:lf:core:fi_param_define_():231<info> registered var hba_number
libfabric:31074:lf:core:fi_param_define_():231<info> registered var use_pio
libfabric:31074:lf:core:fi_param_define_():231<info> registered var priority
libfabric:31074:lf:core:fi_param_define_():231<info> registered var group_size
libfabric:31074:lf:core:fi_param_get_():280<info> variable group_name=<not set>
libfabric:31074:lf:core:fi_param_get_():280<info> variable hba_number=<not set>
libfabric:31074:lf:core:fi_param_get_():280<info> variable use_pio=<not set>
libfabric:31074:lf:core:fi_param_get_():280<info> variable priority=<not set>
libfabric:31074:lf:core:fi_param_get_():280<info> variable group_size=<not set>
libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: lf (0.1)
libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: ofi_hook_perf (1.0)
libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: ofi_hook_debug (1.0)
libfabric:31074:core:core:ofi_register_provider():374<info> registering provider: ofi_hook_noop (1.0)
libfabric:31074:ofi_rxm:core:fi_param_get_():280<info> variable use_srx=<not set>
libfabric:31074:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
libfabric:31074:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
libfabric:31074:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
libfabric:31074:lf:core:ofi_check_info():998<info> Unsupported capabilities
libfabric:31074:lf:core:ofi_check_info():999<info> Supported: FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:31074:lf:core:ofi_check_info():999<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
libfabric:31074:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_MSG
libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:31074:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxm
libfabric:31074:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_rxd
libfabric:31074:core:core:ofi_layering_ok():796<info> Need core provider, skipping ofi_mrail
libfabric:31074:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_DGRAM
libfabric:31074:lf:core:ofi_check_mr_mode():510<info> Invalid memory registration mode
libfabric:31074:lf:core:ofi_check_mr_mode():511<info> Expected:
libfabric:31074:lf:core:ofi_check_mr_mode():511<info> Given:
libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_rxd returned -61 (No data available)
libfabric:31074:ofi_mrail:fabric:mrail_get_core_info():277<warn> OFI_MRAIL_ADDR_STRC env variable not set!
libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider ofi_mrail returned -61 (No data available)
libfabric:31074:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_MSG
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_RDM
libfabric:31074:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM
libfabric:31074:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_RDM
libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf returned -61 (No data available)
[rh4:31074] select: init returned failure for component ofi
[rh4:31074] select: no component selected
[rh4:31074] select: init returned failure for component cm
[rh4:31074] PML cm cannot be selected
[rh3:20550] 1 more process has sent help message help-mca-base.txt / find-available:none found
[rh3:20550] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
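
In output like this, the entries that explain a failed fi_getinfo are the ofi_check_* ones, where libfabric reports what the provider supports next to what the caller requested. Below is a minimal sketch in Python for pulling those pairs out of a captured log. It assumes the entry layout seen above (libfabric:<pid>:<provider>:<subsystem>:<function>():<line><level> message); ENTRY, SAMPLE and find_mismatches are hypothetical names made up for this illustration and are not part of libfabric or Open MPI.

```python
import re

# One libfabric log entry, as it appears in the output above (assumed layout):
#   libfabric:<pid>:<provider>:<subsystem>:<function>():<line><level> <message>
# The message runs until the next libfabric entry, an Open MPI "[host:pid]"
# line, or the end of the text, so flattened logs are handled as well.
ENTRY = re.compile(
    r"libfabric:(?P<pid>\d+):(?P<prov>[^:]+):(?P<subsys>[^:]+):"
    r"(?P<func>\w+)\(\):(?P<line>\d+)<(?P<level>\w+)>\s*"
    r"(?P<msg>.*?)(?=\s*libfabric:\d+:|\s*\[\w+:\d+\]|\s*$)",
    re.S,
)

# A few entries copied from the log above, used only as demo input.
SAMPLE = (
    "libfabric:31074:lf:core:ofi_check_ep_type():629<info> Unsupported endpoint type "
    "libfabric:31074:lf:core:ofi_check_ep_type():630<info> Supported: FI_EP_DGRAM "
    "libfabric:31074:lf:core:ofi_check_ep_type():630<info> Requested: FI_EP_RDM "
    "libfabric:31074:core:core:fi_getinfo_():891<warn> fi_getinfo: provider lf "
    "returned -61 (No data available)"
)


def find_mismatches(log_text):
    """Collect 'Unsupported ...' / 'Invalid ...' entries with their detail lines."""
    mismatches = []
    current = None
    for match in ENTRY.finditer(log_text):
        entry = match.groupdict()
        msg = entry["msg"].strip()
        if msg.startswith(("Unsupported", "Invalid")):
            # Start of a new mismatch report from one of the check functions.
            current = {
                "provider": entry["prov"],
                "check": entry["func"],
                "problem": msg,
            }
            mismatches.append(current)
        elif current is not None and entry["func"] == current["check"] and ":" in msg:
            # Detail line such as "Supported: ..." or "Requested: ...".
            key, _, value = msg.partition(":")
            current[key.strip().lower()] = value.strip()
        else:
            # Any unrelated entry ends the current report.
            current = None
    return mismatches


if __name__ == "__main__":
    for item in find_mismatches(SAMPLE):
        print(item)
```

Run against the full output above, a summary like this should surface the entries where FI_EP_RDM was requested while lf reported only FI_EP_DGRAM and FI_EP_MSG, along with the earlier capability check that requested FI_RMA.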
