Howard,
I don't have much time now to try with --enable-debug.
The RoCE device we have is FastLinQ QL41000 Series 10/25/40/50GbE Controller
The output of ibv_devinfo is:
hca_id: qedr0
        transport:              InfiniBand (0)
        fw_ver:                 8.20.0.0
        node_guid:              2267:7cff:fe11:4a50
        sys_image_guid:         2267:7cff:fe11:4a50
        vendor_id:              0x1077
        vendor_part_id:         32880
        hw_ver:                 0x0
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     1024 (3)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

hca_id: qedr1
        transport:              InfiniBand (0)
        fw_ver:                 8.20.0.0
        node_guid:              2267:7cff:fe11:4a51
        sys_image_guid:         2267:7cff:fe11:4a51
        vendor_id:              0x1077
        vendor_part_id:         32880
        hw_ver:                 0x0
        phys_port_cnt:          1
                port:   1
                        state:          PORT_DOWN (1)
                        max_mtu:        4096 (5)
                        active_mtu:     1024 (3)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet
Regarding UCX, we have tried the latest version. Compilation goes through,
but the ucx_info -d command reports errors:
# Memory domain: qedr0
# Component: ib
# register: unlimited, cost: 180 nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
#
# Transport: rc_verbs
# Device: qedr0:1
# Type: network
# System device: qedr0 (0)
[1643982133.674556] [kahan01:8217 :0] rc_iface.c:505 UCX ERROR ibv_create_srq() failed: Function not implemented
# < failed to open interface >
#
# Transport: ud_verbs
# Device: qedr0:1
# Type: network
# System device: qedr0 (0)
[qelr_create_qp:545]create qp: failed on ibv_cmd_create_qp with 22
[1643982133.681169] [kahan01:8217 :0] ib_iface.c:994 UCX ERROR iface=0x56074944bf10: failed to create UD QP TX wr:256 sge:6 inl:64 resp:0 RX wr:4096 sge:1 resp:0: Invalid argument
# < failed to open interface >
#
# Memory domain: qedr1
# Component: ib
# register: unlimited, cost: 180 nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# < no supported devices found >
Any idea what the error in ibv_create_srq() means?
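In case it is useful, here is a minimal standalone sketch that isolates the
failing call outside of UCX (the device choice and queue sizes are arbitrary;
compile with: gcc srq_probe.c -libverbs):

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        fprintf(stderr, "no verbs devices found\n");
        return 1;
    }
    /* open the first device (qedr0 on this machine) */
    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) {
        fprintf(stderr, "failed to open device / allocate PD\n");
        return 1;
    }
    /* try to create a small shared receive queue */
    struct ibv_srq_init_attr attr = {
        .attr = { .max_wr = 16, .max_sge = 1 }
    };
    struct ibv_srq *srq = ibv_create_srq(pd, &attr);
    if (!srq)
        fprintf(stderr, "ibv_create_srq on %s: %s\n",
                ibv_get_device_name(list[0]), strerror(errno));
    else {
        printf("SRQ supported on %s\n", ibv_get_device_name(list[0]));
        ibv_destroy_srq(srq);
    }
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}

"Function not implemented" is strerror(ENOSYS), so the failure appears to come
straight from the qedr provider/kernel driver rather than from UCX itself.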
Thanks for your help.
Jose
> On 3 Feb 2022, at 17:52, Pritchard Jr., Howard <[email protected]> wrote:
>
> Hi Jose,
>
> A number of things.
>
> First, for recent versions of Open MPI, including the 4.1.x release stream,
> MPI_THREAD_MULTIPLE is supported by default. However, some transport options
> available when using MPI_Init may not be available when requesting
> MPI_THREAD_MULTIPLE.
> You may want to let Open MPI trundle along with tcp for inter-node
> messaging and see whether your application performs well enough. For a small
> system, tcp may well suffice.
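>
> For example, using your hellow test program, inter-node traffic can be
> pinned to the tcp BTL explicitly (vader is the shared-memory BTL in 4.1.x):
>
>    mpirun --mca btl tcp,vader,self ./hellow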
>
> Second, if you want to pursue this further, you'll want to rebuild Open MPI
> with --enable-debug. The debug output will be considerably more verbose and
> will provide more info. I think you will get a message saying the rdmacm CPC
> is excluded owing to the requested thread support level. There may be info
> about why udcm is not selected as well.
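>
> A typical invocation would be something like (the prefix is just an
> example):
>
>    ./configure --enable-debug --prefix=$HOME/ompi-debug
>    make -j install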
>
> Third, what sort of RoCE devices are available on your system? The output
> from ibv_devinfo may be useful.
>
> As for UCX, if it's the version that came with your Ubuntu 18.04 release, it
> may be pretty old. It's likely that UCX has not been tested on the RoCE
> devices on your system.
>
> You can run
>
> ucx_info -v
>
> to check the version number of UCX that you are picking up.
>
> You can download the latest release of UCX at
>
> https://github.com/openucx/ucx/releases/tag/v1.12.0
>
> Instructions for how to build are in the README.md at
> https://github.com/openucx/ucx.
> You will want to configure with
>
> contrib/configure-release-mt --enable-gtest
>
> You want to add --enable-gtest to the configure options so that you can
> run the UCX sanity checks. Note that these take quite a while to run but are
> pretty thorough at validating your UCX build.
> You'll want to run this test on one of the nodes with a RoCE device -
>
> ucx_info -d
>
> This will show which UCX transports/devices are available.
>
> See the Running internal unit tests section of the README.md
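>
> (From memory, the target there is
>
>    make -C test/gtest test
>
> but double-check the README in case it has changed.)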
>
> Hope this helps,
>
> Howard
>
>
> On 2/3/22, 8:46 AM, "Jose E. Roman" <[email protected]> wrote:
>
> Thanks. The verbose output is:
>
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: registering
> framework btl components
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded
> component self
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: component
> self register function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded
> component sm
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded
> component openib
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: component
> openib register function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded
> component vader
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: component
> vader register function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded
> component tcp
> [kahan01.upvnet.upv.es:29732] mca: base: components_register: component
> tcp register function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: opening btl
> components
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded
> component self
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: component self
> open function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded
> component openib
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: component openib
> open function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded
> component vader
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: component vader
> open function successful
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded
> component tcp
> [kahan01.upvnet.upv.es:29732] mca: base: components_open: component tcp
> open function successful
> [kahan01.upvnet.upv.es:29732] select: initializing btl component self
> [kahan01.upvnet.upv.es:29732] select: init of component self returned
> success
> [kahan01.upvnet.upv.es:29732] select: initializing btl component openib
> [kahan01.upvnet.upv.es:29732] Checking distance from this process to
> device=qedr0
> [kahan01.upvnet.upv.es:29732] hwloc_distances->nbobjs=4
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[0]=10
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[1]=16
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[2]=16
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[3]=16
> [kahan01.upvnet.upv.es:29732] ibv_obj->type set to NULL
> [kahan01.upvnet.upv.es:29732] Process is bound: distance to device is
> 0.000000
> [kahan01.upvnet.upv.es:29732] Checking distance from this process to
> device=qedr1
> [kahan01.upvnet.upv.es:29732] hwloc_distances->nbobjs=4
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[0]=10
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[1]=16
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[2]=16
> [kahan01.upvnet.upv.es:29732] hwloc_distances->values[3]=16
> [kahan01.upvnet.upv.es:29732] ibv_obj->type set to NULL
> [kahan01.upvnet.upv.es:29732] Process is bound: distance to device is
> 0.000000
> [kahan01.upvnet.upv.es:29732] openib BTL: rdmacm CPC unavailable for use
> on qedr0:1; skipped
> --------------------------------------------------------------------------
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port. As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
> Local host: kahan01
> Local device: qedr0
> Local port: 1
> CPCs attempted: rdmacm, udcm
> --------------------------------------------------------------------------
> [kahan01.upvnet.upv.es:29732] select: init of component openib returned
> failure
> [kahan01.upvnet.upv.es:29732] mca: base: close: component openib closed
> [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component openib
> [kahan01.upvnet.upv.es:29732] select: initializing btl component vader
> [kahan01.upvnet.upv.es:29732] select: init of component vader returned
> failure
> [kahan01.upvnet.upv.es:29732] mca: base: close: component vader closed
> [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component vader
> [kahan01.upvnet.upv.es:29732] select: initializing btl component tcp
> [kahan01.upvnet.upv.es:29732] btl: tcp: Searching for exclude
> address+prefix: 127.0.0.1 / 8
> [kahan01.upvnet.upv.es:29732] btl: tcp: Found match: 127.0.0.1 (lo)
> [kahan01.upvnet.upv.es:29732] btl:tcp: Attempting to bind to AF_INET port
> 1024
> [kahan01.upvnet.upv.es:29732] btl:tcp: Successfully bound to AF_INET port
> 1024
> [kahan01.upvnet.upv.es:29732] btl:tcp: my listening v4 socket is
> 0.0.0.0:1024
> [kahan01.upvnet.upv.es:29732] btl:tcp: examining interface eno1
> [kahan01.upvnet.upv.es:29732] btl:tcp: using ipv6 interface eno1
> [kahan01.upvnet.upv.es:29732] btl:tcp: examining interface eno5
> [kahan01.upvnet.upv.es:29732] btl:tcp: using ipv6 interface eno5
> [kahan01.upvnet.upv.es:29732] select: init of component tcp returned
> success
> [kahan01.upvnet.upv.es:29732] mca: bml: Using self btl for send to
> [[45435,1],0] on node kahan01
> Hello world from process 0 of 1, provided=1
> [kahan01.upvnet.upv.es:29732] mca: base: close: component self closed
> [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component self
> [kahan01.upvnet.upv.es:29732] mca: base: close: component tcp closed
> [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component tcp
>
>
> Regarding UCX, at some point I tried it, but IIRC the installation of UCX on
> this machine does not work for some reason. Is there an easy way to check
> whether UCX works well before installing Open MPI?
>
> Jose
>
>
>
>> On 3 Feb 2022, at 16:38, Pritchard Jr., Howard <[email protected]> wrote:
>>
>> Hello Jose,
>>
>> I suspect the issue here is that the openib BTL isn't finding a connection
>> module when you are requesting MPI_THREAD_MULTIPLE.
>> The rdmacm connection module is deselected if the MPI_THREAD_MULTIPLE thread
>> support level is requested.
>>
>> If you run the test in a shell with
>>
>> export OMPI_MCA_btl_base_verbose=100
>>
>> there may be some more info to help diagnose what's going on.
>>
>> Another option would be to build Open MPI with UCX support. That's the
>> better way to use Open MPI over IB/RoCE.
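>>
>> For example, pointing Open MPI's configure at the UCX installation (the
>> path is just a placeholder):
>>
>>    ./configure --with-ucx=/path/to/ucx ...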
>>
>> Howard
>>
>> On 2/2/22, 10:52 AM, "users on behalf of Jose E. Roman via users"
>> <[email protected] on behalf of [email protected]>
>> wrote:
>>
>> Hi.
>>
>> I am using Open MPI 4.1.1 with the openib BTL on a 4-node cluster with
>> Ethernet 10/25Gb (RoCE). It is using libibverbs from Ubuntu 18.04 (kernel
>> 4.15.0-166-generic).
>>
>> With this hello world example:
>>
>> #include <stdio.h>
>> #include <mpi.h>
>> int main (int argc, char *argv[])
>> {
>>   int rank, size, provided;
>>   MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>   printf("Hello world from process %d of %d, provided=%d\n", rank, size, provided);
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>> I get the following output when run on one node:
>>
>> $ ./hellow
>> --------------------------------------------------------------------------
>> No OpenFabrics connection schemes reported that they were able to be
>> used on a specific port. As such, the openib BTL (OpenFabrics
>> support) will be disabled for this port.
>>
>> Local host: kahan01
>> Local device: qedr0
>> Local port: 1
>> CPCs attempted: rdmacm, udcm
>> --------------------------------------------------------------------------
>> Hello world from process 0 of 1, provided=1
>>
>>
>> The message does not appear if I run on the front-end (which does not have a
>> RoCE network), or if I run on the node using either MPI_Init() instead of
>> MPI_Init_thread(), or MPI_THREAD_SINGLE instead of MPI_THREAD_FUNNELED.
>>
>> Is there any reason why MPI_Init_thread() behaves differently from
>> MPI_Init()? Note that I am not using threads, and I am running just one MPI
>> process.
>>
>>
>> The question has a second part: is there a way to determine (without
>> running an MPI program) that MPI_Init_thread() won't work but MPI_Init()
>> will? I am asking because PETSc programs default to using
>> MPI_Init_thread() when PETSc's configure script finds the MPI_Init_thread()
>> symbol in the MPI library. But in situations like the one reported here, it
>> would be better to fall back to MPI_Init(), since MPI_Init_thread() will not
>> work as expected. [The configure script cannot run an MPI program because of
>> batch systems.]
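>>
>> At run time one can at least check the granted level, since
>> MPI_Init_thread() reports it (a minimal sketch, not PETSc's actual logic;
>> note that in the run above provided=1 is MPI_THREAD_FUNNELED in Open MPI,
>> so the request was in fact granted and only the openib BTL was lost):
>>
>>   #include <stdio.h>
>>   #include <mpi.h>
>>   int main (int argc, char *argv[])
>>   {
>>     int provided;
>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>>     if (provided < MPI_THREAD_FUNNELED)   /* granted less than requested */
>>       fprintf(stderr, "granted thread level %d < requested\n", provided);
>>     MPI_Finalize();
>>     return 0;
>>   }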
>>
>> Thanks for your help.
>> Jose
>>
>
>