Howard,

I don't have much time now to try with --enable-debug.

The RoCE device we have is FastLinQ QL41000 Series 10/25/40/50GbE Controller
The output of ibv_devinfo is:
hca_id: qedr0
        transport:                      InfiniBand (0)
        fw_ver:                         8.20.0.0
        node_guid:                      2267:7cff:fe11:4a50
        sys_image_guid:                 2267:7cff:fe11:4a50
        vendor_id:                      0x1077
        vendor_part_id:                 32880
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: qedr1
        transport:                      InfiniBand (0)
        fw_ver:                         8.20.0.0
        node_guid:                      2267:7cff:fe11:4a51
        sys_image_guid:                 2267:7cff:fe11:4a51
        vendor_id:                      0x1077
        vendor_part_id:                 32880
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

Regarding UCX, we have tried the latest version. Compilation goes through,
but the ucx_info command gives an error:

# Memory domain: qedr0
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#
#      Transport: rc_verbs
#         Device: qedr0:1
#           Type: network
#  System device: qedr0 (0)
[1643982133.674556] [kahan01:8217 :0]        rc_iface.c:505  UCX ERROR ibv_create_srq() failed: Function not implemented
#   < failed to open interface >
#
#      Transport: ud_verbs
#         Device: qedr0:1
#           Type: network
#  System device: qedr0 (0)
[qelr_create_qp:545]create qp: failed on ibv_cmd_create_qp with 22
[1643982133.681169] [kahan01:8217 :0]        ib_iface.c:994  UCX ERROR iface=0x56074944bf10: failed to create UD QP TX wr:256 sge:6 inl:64 resp:0 RX wr:4096 sge:1 resp:0: Invalid argument
#   < failed to open interface >
#
# Memory domain: qedr1
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#   < no supported devices found >


Any idea what the error in ibv_create_srq() means?
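
In case it helps to narrow this down: my guess is that UCX's rc transport needs
a shared receive queue, and "Function not implemented" may simply mean that the
qedr provider does not implement ibv_create_srq(). A minimal standalone probe
along these lines (just a sketch with arbitrary queue sizes; build with
-libverbs) should tell whether SRQ creation also fails outside of UCX:

    /* srq_probe.c - check whether the first verbs device can create an SRQ */
    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num = 0;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) {
            fprintf(stderr, "no verbs devices found\n");
            return 1;
        }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
        if (!pd) {
            perror("open device / alloc PD");
            return 1;
        }

        /* arbitrary small sizes, just to exercise the call */
        struct ibv_srq_init_attr attr = { .attr = { .max_wr = 16, .max_sge = 1 } };
        struct ibv_srq *srq = ibv_create_srq(pd, &attr);
        if (!srq)
            perror("ibv_create_srq");   /* presumably the same error as in ucx_info */
        else {
            printf("SRQ created OK on %s\n", ibv_get_device_name(devs[0]));
            ibv_destroy_srq(srq);
        }

        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return srq ? 0 : 1;
    }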

Thanks for your help.
Jose



> On 3 Feb 2022, at 17:52, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> 
> Hi Jose,
> 
> A number of things.  
> 
> First, for recent versions of Open MPI, including the 4.1.x release stream,
> MPI_THREAD_MULTIPLE is supported by default.  However, some transport options
> available when using MPI_Init may not be available when requesting
> MPI_THREAD_MULTIPLE.
> You may want to let Open MPI trundle along with tcp for inter-node messaging
> and see if your application performs well enough; for a small system tcp may
> well suffice.
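> 
> If you want to be explicit about that (and it should also get rid of the
> warning), you can exclude the openib BTL on the command line, for example
> 
> mpirun --mca btl ^openib -np 4 ./your_app
> 
> where ./your_app and -np 4 are just placeholders for your own job.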
> 
> Second, if you want to pursue this further, you will want to rebuild Open MPI
> with --enable-debug.  The debug output will be considerably more verbose and
> will provide more information.  I think you will get a message saying the
> rdmacm CPC is excluded owing to the requested thread support level.  There
> may be info about why udcm is not selected as well.
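> 
> Something along these lines should do it (the install prefix is just a
> placeholder):
> 
> ./configure --prefix=$HOME/ompi-debug --enable-debug
> make -j 8 && make install
> 
> Then rerun your test with the mpirun from that installation.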
> 
> Third, what sort of RoCE devices are available on your system?  The output 
> from ibv_devinfo may be useful. 
> 
> As for UCX, if it's the version that came with your Ubuntu 18.04 release it
> may be pretty old.  It's likely that UCX has not been tested on the RoCE
> devices on your system.
> 
> You can run 
> 
> ucx_info -v
> 
> to check the version number of UCX that you are picking up.
> 
> You can download the latest release of UCX at
> 
> https://github.com/openucx/ucx/releases/tag/v1.12.0
> 
> Instructions for how to build are in the README.md at 
> https://github.com/openucx/ucx.
> You will want to configure with 
> 
> contrib/configure-release-mt --enable-gtest
> 
> You want to add --enable-gtest to the configure options so that you can run
> the UCX sanity checks.  Note these take quite a while to run but are pretty
> thorough at validating your UCX build.
> You'll want to run this test on one of the nodes with a RoCE device.  Also run
> 
> ucx_info -d
> 
> This will show which UCX transports/devices are available.
> 
> See the "Running internal unit tests" section of the README.md.
> 
> Hope this helps,
> 
> Howard
> 
> 
> On 2/3/22, 8:46 AM, "Jose E. Roman" <jro...@dsic.upv.es> wrote:
> 
>    Thanks. The verbose output is:
> 
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: registering 
> framework btl components
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded 
> component self
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: component 
> self register function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded 
> component sm
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded 
> component openib
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: component 
> openib register function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded 
> component vader
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: component 
> vader register function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded 
> component tcp
>    [kahan01.upvnet.upv.es:29732] mca: base: components_register: component 
> tcp register function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: opening btl 
> components
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
> component self
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component self 
> open function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
> component openib
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component openib 
> open function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
> component vader
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component vader 
> open function successful
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
> component tcp
>    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component tcp 
> open function successful
>    [kahan01.upvnet.upv.es:29732] select: initializing btl component self
>    [kahan01.upvnet.upv.es:29732] select: init of component self returned 
> success
>    [kahan01.upvnet.upv.es:29732] select: initializing btl component openib
>    [kahan01.upvnet.upv.es:29732] Checking distance from this process to 
> device=qedr0
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->nbobjs=4
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[0]=10
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[1]=16
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[2]=16
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[3]=16
>    [kahan01.upvnet.upv.es:29732] ibv_obj->type set to NULL
>    [kahan01.upvnet.upv.es:29732] Process is bound: distance to device is 
> 0.000000
>    [kahan01.upvnet.upv.es:29732] Checking distance from this process to 
> device=qedr1
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->nbobjs=4
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[0]=10
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[1]=16
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[2]=16
>    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[3]=16
>    [kahan01.upvnet.upv.es:29732] ibv_obj->type set to NULL
>    [kahan01.upvnet.upv.es:29732] Process is bound: distance to device is 
> 0.000000
>    [kahan01.upvnet.upv.es:29732] openib BTL: rdmacm CPC unavailable for use 
> on qedr0:1; skipped
>    --------------------------------------------------------------------------
>    No OpenFabrics connection schemes reported that they were able to be
>    used on a specific port.  As such, the openib BTL (OpenFabrics
>    support) will be disabled for this port.
> 
>      Local host:           kahan01
>      Local device:         qedr0
>      Local port:           1
>      CPCs attempted:       rdmacm, udcm
>    --------------------------------------------------------------------------
>    [kahan01.upvnet.upv.es:29732] select: init of component openib returned 
> failure
>    [kahan01.upvnet.upv.es:29732] mca: base: close: component openib closed
>    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component openib
>    [kahan01.upvnet.upv.es:29732] select: initializing btl component vader
>    [kahan01.upvnet.upv.es:29732] select: init of component vader returned 
> failure
>    [kahan01.upvnet.upv.es:29732] mca: base: close: component vader closed
>    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component vader
>    [kahan01.upvnet.upv.es:29732] select: initializing btl component tcp
>    [kahan01.upvnet.upv.es:29732] btl: tcp: Searching for exclude 
> address+prefix: 127.0.0.1 / 8
>    [kahan01.upvnet.upv.es:29732] btl: tcp: Found match: 127.0.0.1 (lo)
>    [kahan01.upvnet.upv.es:29732] btl:tcp: Attempting to bind to AF_INET port 
> 1024
>    [kahan01.upvnet.upv.es:29732] btl:tcp: Successfully bound to AF_INET port 
> 1024
>    [kahan01.upvnet.upv.es:29732] btl:tcp: my listening v4 socket is 
> 0.0.0.0:1024
>    [kahan01.upvnet.upv.es:29732] btl:tcp: examining interface eno1
>    [kahan01.upvnet.upv.es:29732] btl:tcp: using ipv6 interface eno1
>    [kahan01.upvnet.upv.es:29732] btl:tcp: examining interface eno5
>    [kahan01.upvnet.upv.es:29732] btl:tcp: using ipv6 interface eno5
>    [kahan01.upvnet.upv.es:29732] select: init of component tcp returned 
> success
>    [kahan01.upvnet.upv.es:29732] mca: bml: Using self btl for send to 
> [[45435,1],0] on node kahan01
>    Hello world from process 0 of 1, provided=1
>    [kahan01.upvnet.upv.es:29732] mca: base: close: component self closed
>    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component self
>    [kahan01.upvnet.upv.es:29732] mca: base: close: component tcp closed
>    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component tcp
> 
> 
>    Regarding UCX, at some point I tried it, but IIRC the installation of UCX
> on this machine does not work for some reason. Is there an easy way to check
> whether UCX works correctly before installing Open MPI?
> 
>    Jose
> 
> 
> 
>> On 3 Feb 2022, at 16:38, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
>> 
>> Hello Jose,
>> 
>> I suspect the issue here is that the openib BTL isn't finding a connection
>> module when you are requesting MPI_THREAD_MULTIPLE.
>> The rdmacm connection module is deselected if the MPI_THREAD_MULTIPLE thread
>> support level is being requested.
>> 
>> If you run the test in a shell with
>> 
>> export OMPI_MCA_btl_base_verbose=100
>> 
>> there may be some more info to help diagnose what's going on.
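>> 
>> Equivalently, you can pass the setting on the mpirun command line, e.g.
>> 
>> mpirun --mca btl_base_verbose 100 -np 1 ./hellow
>> 
>> using your hello world binary from below.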
>> 
>> Another option would be to build Open MPI with UCX support.  That's the 
>> better way to use Open MPI over IB/RoCE.
>> 
>> Howard
>> 
>> On 2/2/22, 10:52 AM, "users on behalf of Jose E. Roman via users" 
>> <users-boun...@lists.open-mpi.org on behalf of users@lists.open-mpi.org> 
>> wrote:
>> 
>>   Hi.
>> 
>>   I am using Open MPI 4.1.1 with the openib BTL on a 4-node cluster with
>> 10/25Gb Ethernet (RoCE). It is using libibverbs from Ubuntu 18.04 (kernel
>> 4.15.0-166-generic).
>> 
>>   With this hello world example:
>> 
>>   #include <stdio.h>
>>   #include <mpi.h>
>>   int main (int argc, char *argv[])
>>   {
>>    int rank, size, provided;
>>    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>>    printf("Hello world from process %d of %d, provided=%d\n", rank, size, provided);
>>    MPI_Finalize();
>>    return 0;
>>   }
>> 
>>   I get the following output when run on one node:
>> 
>>   $ ./hellow
>>   --------------------------------------------------------------------------
>>   No OpenFabrics connection schemes reported that they were able to be
>>   used on a specific port.  As such, the openib BTL (OpenFabrics
>>   support) will be disabled for this port.
>> 
>>    Local host:           kahan01
>>    Local device:         qedr0
>>    Local port:           1
>>    CPCs attempted:       rdmacm, udcm
>>   --------------------------------------------------------------------------
>>   Hello world from process 0 of 1, provided=1
>> 
>> 
>>   The message does not appear if I run on the front-end (which does not have
>> the RoCE network), or if I run on the node using either MPI_Init() instead of
>> MPI_Init_thread(), or MPI_THREAD_SINGLE instead of MPI_THREAD_FUNNELED.
>> 
>>   Is there any reason why MPI_Init_thread() behaves differently from
>> MPI_Init()? Note that I am not using threads, and just one MPI process.
>> 
>> 
>>   The question has a second part: is there a way to determine (without
>> running an MPI program) that MPI_Init_thread() won't work but MPI_Init()
>> will? I am asking because PETSc programs default to using MPI_Init_thread()
>> when PETSc's configure script finds the MPI_Init_thread() symbol in the MPI
>> library. But in situations like the one reported here, it would be better to
>> fall back to MPI_Init(), since MPI_Init_thread() will not work as expected.
>> [The configure script cannot run an MPI program due to batch systems.]
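>> 
>> As far as I understand, what configure effectively does is just a link test,
>> roughly along these lines (a sketch, not PETSc's actual check):
>> 
>>   #include <mpi.h>
>>   /* does the MPI library provide the MPI_Init_thread symbol? */
>>   int main(void) { return MPI_Init_thread == 0; }  /* never actually called */
>> 
>> which of course says nothing about whether the network transports will still
>> work once MPI_Init_thread() is called at run time.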
>> 
>>   Thanks for your help.
>>   Jose
>> 
> 
> 
