Hi Jose,

I bet this device has not been tested with UCX.

You may want to join the UCX users mailing list at

https://elist.ornl.gov/mailman/listinfo/ucx-group

and ask whether this Marvell device has been tested, and whether there are workarounds for 
disabling features that this device doesn't support.
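
One workaround worth asking about (just a sketch; whether it helps on this particular device is an assumption on my part) is restricting which transports UCX will try via the UCX_TLS environment variable, for example to keep it away from the RC/UD verbs paths that fail in your ucx_info output:

# transport list here is an assumption; tcp,sm,self avoids the verbs transports entirely
export UCX_TLS=tcp,sm,self
mpirun -x UCX_TLS -np 2 ./hellow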

Again, though, you may really want to first see whether the TCP BTL will be good 
enough for your cluster.
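
For example, something like this runs the hello world test over TCP (plus shared memory and self); the exact BTL list is an assumption based on a stock 4.1.x build:

mpirun --mca pml ob1 --mca btl self,vader,tcp -np 2 ./hellow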

Howard

On 2/4/22, 8:03 AM, "Jose E. Roman" <jro...@dsic.upv.es> wrote:

    Howard,

    I don't have much time now to try with --enable-debug.

    The RoCE device we have is a FastLinQ QL41000 Series 10/25/40/50GbE Controller.
    The output of ibv_devinfo is:
    hca_id:     qedr0
        transport:                      InfiniBand (0)
        fw_ver:                         8.20.0.0
        node_guid:                      2267:7cff:fe11:4a50
        sys_image_guid:                 2267:7cff:fe11:4a50
        vendor_id:                      0x1077
        vendor_part_id:                 32880
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

    hca_id:     qedr1
        transport:                      InfiniBand (0)
        fw_ver:                         8.20.0.0
        node_guid:                      2267:7cff:fe11:4a51
        sys_image_guid:                 2267:7cff:fe11:4a51
        vendor_id:                      0x1077
        vendor_part_id:                 32880
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

    Regarding UCX, we have tried the latest version. Compilation goes through, but the ucx_info command gives an error:

    # Memory domain: qedr0
    #     Component: ib
    #             register: unlimited, cost: 180 nsec
    #           remote key: 8 bytes
    #           local memory handle is required for zcopy
    #
    #      Transport: rc_verbs
    #         Device: qedr0:1
    #           Type: network
    #  System device: qedr0 (0)
    [1643982133.674556] [kahan01:8217 :0]        rc_iface.c:505  UCX ERROR 
ibv_create_srq() failed: Function not implemented
    #   < failed to open interface >
    #
    #      Transport: ud_verbs
    #         Device: qedr0:1
    #           Type: network
    #  System device: qedr0 (0)
    [qelr_create_qp:545]create qp: failed on ibv_cmd_create_qp with 22
    [1643982133.681169] [kahan01:8217 :0]        ib_iface.c:994  UCX ERROR 
iface=0x56074944bf10: failed to create UD QP TX wr:256 sge:6 inl:64 resp:0 RX 
wr:4096 sge:1 resp:0: Invalid argument
    #   < failed to open interface >
    #
    # Memory domain: qedr1
    #     Component: ib
    #             register: unlimited, cost: 180 nsec
    #           remote key: 8 bytes
    #           local memory handle is required for zcopy
    #   < no supported devices found >


    Any idea what the error in ibv_create_srq() means?

    Thanks for your help.
    Jose



    > On 3 Feb 2022, at 17:52, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
    > 
    > Hi Jose,
    > 
    > A number of things.  
    > 
    > First, for recent versions of Open MPI, including the 4.1.x release stream, MPI_THREAD_MULTIPLE is supported by default.  However, some transport options available when using MPI_Init may not be available when requesting MPI_THREAD_MULTIPLE.
    > You may want to let Open MPI trundle along with TCP used for inter-node messaging and see if your application performs well enough. For a small system TCP may well suffice.
    > 
    > Second, if you want to pursue this further, you will want to rebuild Open MPI with --enable-debug.  The debug output will be considerably more verbose and provide more info.  I think you will get a message saying the rdmacm CPC is excluded owing to the requested thread support level.  There may be info about why udcm is not selected as well.
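    > 
    > A minimal sketch of such a rebuild (the install prefix is an assumption; keep whatever other configure options you used for your original build):
    > 
    > ./configure --prefix=$HOME/ompi-4.1-debug --enable-debug
    > make -j 8 install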
    > 
    > Third, what sort of RoCE devices are available on your system?  The 
output from ibv_devinfo may be useful. 
    > 
    > As for UCX, if it's the version that came with your Ubuntu 18.04 release, it may be pretty old.  It's likely that UCX has not been tested on the RoCE devices on your system.
    > 
    > You can run 
    > 
    > ucx_info -v
    > 
    > to check the version number of UCX that you are picking up.
    > 
    > You can download the latest release of UCX at
    > 
    > https://github.com/openucx/ucx/releases/tag/v1.12.0
    > 
    > Instructions for how to build are in the README.md at 
https://github.com/openucx/ucx.
    > You will want to configure with 
    > 
    > contrib/configure-release-mt --enable-gtest
    > 
    > You want to add --enable-gtest to the configure options so that you can run the UCX sanity checks.  Note this takes quite a while to run, but is pretty thorough at validating your UCX build.
    > You'll want to run this on one of the nodes with a RoCE device:
    > 
    > ucx_info -d
    > 
    > This will show which UCX transports/devices are available.
    > 
    > See the "Running internal unit tests" section of the README.md.
    > 
    > Hope this helps,
    > 
    > Howard
    > 
    > 
    > On 2/3/22, 8:46 AM, "Jose E. Roman" <jro...@dsic.upv.es> wrote:
    > 
    >    Thanks. The verbose output is:
    > 
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: 
registering framework btl components
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found 
loaded component self
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: 
component self register function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found 
loaded component sm
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found 
loaded component openib
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: 
component openib register function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found 
loaded component vader
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: 
component vader register function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: found 
loaded component tcp
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_register: 
component tcp register function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: opening btl 
components
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
component self
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component 
self open function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
component openib
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component 
openib open function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
component vader
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component 
vader open function successful
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: found loaded 
component tcp
    >    [kahan01.upvnet.upv.es:29732] mca: base: components_open: component 
tcp open function successful
    >    [kahan01.upvnet.upv.es:29732] select: initializing btl component self
    >    [kahan01.upvnet.upv.es:29732] select: init of component self returned 
success
    >    [kahan01.upvnet.upv.es:29732] select: initializing btl component openib
    >    [kahan01.upvnet.upv.es:29732] Checking distance from this process to 
device=qedr0
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->nbobjs=4
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[0]=10
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[1]=16
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[2]=16
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[3]=16
    >    [kahan01.upvnet.upv.es:29732] ibv_obj->type set to NULL
    >    [kahan01.upvnet.upv.es:29732] Process is bound: distance to device is 
0.000000
    >    [kahan01.upvnet.upv.es:29732] Checking distance from this process to 
device=qedr1
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->nbobjs=4
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[0]=10
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[1]=16
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[2]=16
    >    [kahan01.upvnet.upv.es:29732] hwloc_distances->values[3]=16
    >    [kahan01.upvnet.upv.es:29732] ibv_obj->type set to NULL
    >    [kahan01.upvnet.upv.es:29732] Process is bound: distance to device is 
0.000000
    >    [kahan01.upvnet.upv.es:29732] openib BTL: rdmacm CPC unavailable for 
use on qedr0:1; skipped
    >    
--------------------------------------------------------------------------
    >    No OpenFabrics connection schemes reported that they were able to be
    >    used on a specific port.  As such, the openib BTL (OpenFabrics
    >    support) will be disabled for this port.
    > 
    >      Local host:           kahan01
    >      Local device:         qedr0
    >      Local port:           1
    >      CPCs attempted:       rdmacm, udcm
    >    
--------------------------------------------------------------------------
    >    [kahan01.upvnet.upv.es:29732] select: init of component openib 
returned failure
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: component openib closed
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component 
openib
    >    [kahan01.upvnet.upv.es:29732] select: initializing btl component vader
    >    [kahan01.upvnet.upv.es:29732] select: init of component vader returned 
failure
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: component vader closed
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component 
vader
    >    [kahan01.upvnet.upv.es:29732] select: initializing btl component tcp
    >    [kahan01.upvnet.upv.es:29732] btl: tcp: Searching for exclude 
address+prefix: 127.0.0.1 / 8
    >    [kahan01.upvnet.upv.es:29732] btl: tcp: Found match: 127.0.0.1 (lo)
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: Attempting to bind to AF_INET 
port 1024
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: Successfully bound to AF_INET 
port 1024
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: my listening v4 socket is 
0.0.0.0:1024
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: examining interface eno1
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: using ipv6 interface eno1
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: examining interface eno5
    >    [kahan01.upvnet.upv.es:29732] btl:tcp: using ipv6 interface eno5
    >    [kahan01.upvnet.upv.es:29732] select: init of component tcp returned 
success
    >    [kahan01.upvnet.upv.es:29732] mca: bml: Using self btl for send to 
[[45435,1],0] on node kahan01
    >    Hello world from process 0 of 1, provided=1
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: component self closed
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component 
self
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: component tcp closed
    >    [kahan01.upvnet.upv.es:29732] mca: base: close: unloading component tcp
    > 
    > 
    >    Regarding UCX, at some point I tried it, but IIRC the installation of UCX on this machine does not work for some reason. Is there an easy way to check whether UCX works well before installing Open MPI?
    > 
    >    Jose
    > 
    > 
    > 
    >> On 3 Feb 2022, at 16:38, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
    >> 
    >> Hello Jose,
    >> 
    >> I suspect the issue here is that the openib BTL isn't finding a connection module when you are requesting MPI_THREAD_MULTIPLE.
    >> The rdmacm connection module is deselected if the MPI_THREAD_MULTIPLE thread support level is requested.
    >> 
    >> If you run the test in a shell with
    >> 
    >> export OMPI_MCA_btl_base_verbose=100
    >> 
    >> there may be some more info to help diagnose what's going on.
    >> 
    >> Another option would be to build Open MPI with UCX support.  That's the 
better way to use Open MPI over IB/RoCE.
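    >> 
    >> A configure sketch for that route (the UCX install location shown is an assumption):
    >> 
    >> ./configure --prefix=/opt/openmpi-4.1.1 --with-ucx=/path/to/ucx/install
    >> make -j 8 install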
    >> 
    >> Howard
    >> 
    >> On 2/2/22, 10:52 AM, "users on behalf of Jose E. Roman via users" 
<users-boun...@lists.open-mpi.org on behalf of users@lists.open-mpi.org> wrote:
    >> 
    >>   Hi.
    >> 
    >>   I am using Open MPI 4.1.1 with the openib BTL on a 4-node cluster with 
Ethernet 10/25Gb (RoCE). It is using libibverbs from Ubuntu 18.04 (kernel 
4.15.0-166-generic).
    >> 
    >>   With this hello world example:
    >> 
    >>   #include <stdio.h>
    >>   #include <mpi.h>
    >>   int main (int argc, char *argv[])
    >>   {
    >>    int rank, size, provided;
    >>    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    >>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    >>    MPI_Comm_size(MPI_COMM_WORLD, &size);
    >>    printf("Hello world from process %d of %d, provided=%d\n", rank, 
size, provided);
    >>    MPI_Finalize();
    >>    return 0;
    >>   }
    >> 
    >>   I get the following output when run on one node:
    >> 
    >>   $ ./hellow
    >>   
--------------------------------------------------------------------------
    >>   No OpenFabrics connection schemes reported that they were able to be
    >>   used on a specific port.  As such, the openib BTL (OpenFabrics
    >>   support) will be disabled for this port.
    >> 
    >>    Local host:           kahan01
    >>    Local device:         qedr0
    >>    Local port:           1
    >>    CPCs attempted:       rdmacm, udcm
    >>   
--------------------------------------------------------------------------
    >>   Hello world from process 0 of 1, provided=1
    >> 
    >> 
    >>   The message does not appear if I run on the front-end (which does not have a RoCE network), or if I run it on the node using either MPI_Init() instead of MPI_Init_thread(), or MPI_THREAD_SINGLE instead of MPI_THREAD_FUNNELED.
    >> 
    >>   Is there any reason why MPI_Init_thread() behaves differently from MPI_Init()? Note that I am not using threads, and this is just one MPI process.
    >> 
    >> 
    >>   The question has a second part: is there a way to determine (without running an MPI program) that MPI_Init_thread() won't work but MPI_Init() will? I am asking this because PETSc programs default to using MPI_Init_thread() when PETSc's configure script finds the MPI_Init_thread() symbol in the MPI library. But in situations like the one reported here, it would be better to revert to MPI_Init(), since MPI_Init_thread() will not work as expected. [The configure script cannot run an MPI program due to batch systems.]
    >> 
    >>   Thanks for your help.
    >>   Jose
    >> 
    > 
    > 

