I came across the following issue with openmpi-4.0.1, compiled with:

../openmpi-4.0.1/configure --disable-mpi-fortran --without-cuda \
    --disable-opencl --with-ucx=/path/to/ucx-1.5.1

Running the program at the end of this message (a simple MPI_Send / MPI_Recv
pair) segfaults when the message size exceeds 2^30 bytes. I see the failure
on Debian 10 nodes connected with 1 Gb Ethernet and Mellanox InfiniBand FDR
(ConnectX-3). On another cluster with an Omni-Path interconnect, the test
passes. Both clusters have IPoIB configured.

node0 ~ $ mpiexec -machinefile /tmp/hosts -n 2 --mca btl tcp,self \
    --mca mtl ofi --mca pml ^ucx ./a.out 1200000000

Maybe this btl/pml/mtl combination is nonsensical; I don't know. What
bothers me is that the failure below:
 1 - occurs only for large messages, not for smaller test runs;
 2 - cannot be recovered from via MPI_ERRORS_RETURN.


[node0:9791 :0:9791] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
    0  /path/to/ucx-1.5.1/lib/libucs.so.0(+0x1dee0) [0x7f21e2b01ee0]
    1  /path/to/ucx-1.5.1/lib/libucs.so.0(+0x1e188) [0x7f21e2b02188]
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpiexec noticed that process rank 1 with PID 0 on node node0 exited on
signal 11 (Segmentation fault).

When I run this under gdb, it appears that the backtrace above only shows
the UCS signal handler, and that the segfault actually originates here
(ompi/mca/mtl/ofi/mtl_ofi.h:107):
        } else if (OPAL_UNLIKELY(ret == -FI_EAVAIL)) {
            /*
             * An error occured and is being reported via the CQ.
             * Read the error and forward it to the upper layer.
             */
            ret = ofi_req->error_callback(&error, ofi_req);

Here, ofi_req->error_callback is unfortunately NULL.

Am I just doing something silly, or is this something that ought to be
fixed?


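#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char * argv[])
{
    /* Message size in bytes: 3 * 2^29 by default, or the first argument. */
    size_t chunk = (size_t) 3 << 29;
    if (argc > 1)
        chunk = (size_t) atol(argv[1]);

    int rank;
    int size;

    MPI_Init(&argc, &argv);
    /* Have MPI calls on this communicator return error codes instead of aborting. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The count argument of MPI_Send/MPI_Recv is an int. */
    if (chunk > INT_MAX)
        MPI_Abort(MPI_COMM_WORLD, 1);
    void * data = malloc(chunk);
    if (data == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);
    memset(data, 0x42, chunk);

    /* Rank 0 sends one message of chunk bytes to rank 1. */
    int rc = MPI_SUCCESS;
    if (rank == 0) {
        rc = MPI_Send(data, (int) chunk, MPI_BYTE, 1, 0xbeef, MPI_COMM_WORLD);
    } else if (rank == 1) {
        rc = MPI_Recv(data, (int) chunk, MPI_BYTE, 0, 0xbeef, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "rank %d: MPI call failed with error code %d\n", rank, rc);

    free(data);
    MPI_Finalize();
    return 0;
}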
