Hi.
I am using Open MPI 4.1.1 with the openib BTL on a 4-node cluster with
10/25 Gb Ethernet (RoCE). It uses the libibverbs shipped with Ubuntu 18.04
(kernel 4.15.0-166-generic).
With this hello world example:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, provided;

    /* Request MPI_THREAD_FUNNELED; 'provided' reports the level granted */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from process %d of %d, provided=%d\n",
           rank, size, provided);
    MPI_Finalize();
    return 0;
}
I get the following output when run on one node:
$ ./hellow
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: kahan01
Local device: qedr0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Hello world from process 0 of 1, provided=1
The message does not appear if I run on the front-end (which has no RoCE
network), nor if I run on a node using either MPI_Init() instead of
MPI_Init_thread(), or MPI_THREAD_SINGLE instead of MPI_THREAD_FUNNELED.
Is there any reason why MPI_Init_thread() behaves differently from
MPI_Init()? Note that I am not using threads, and this is a single MPI
process.
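For reference, this is the variant that does not trigger the warning (the
same program with only the initialization call changed); a minimal sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* With plain MPI_Init() the openib warning above is not printed */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}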
The question has a second part: is there a way to determine, without running
an MPI program, that MPI_Init_thread() will not work but MPI_Init() will? I
ask because PETSc programs default to MPI_Init_thread() whenever PETSc's
configure script finds the MPI_Init_thread() symbol in the MPI library. In
situations like the one reported here, it would be better to fall back to
MPI_Init(), since MPI_Init_thread() does not work as expected. [The configure
script cannot run an MPI program due to batch systems.]
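For context, the configure test is essentially just a link check along these
lines (a sketch of my understanding, not the actual PETSc test), so it cannot
anticipate the run-time behavior shown above:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided;
    /* Configure only verifies that this symbol resolves at link time;
       it cannot detect that a BTL's connection setup will fail at run time */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Finalize();
    return 0;
}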
Thanks for your help.
Jose