Alois,

Thanks for the report.

FWIW, I am not seeing any errors on my Mac with Open MPI from brew (4.1.3).

How many MPI tasks are you running?
Can you please confirm you can reproduce the error with

mpirun -np <number_of_processes> ./mpi_test_suite -d MPI_TYPE_MIX_ARRAY -c 0 -t collective
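To compare the reduced run with a full run, it may help to capture each run's output (for example by piping it through tee into a file) and tally the entries listed under "Summary of failed tests". A minimal sketch, assuming each ERROR entry is printed on a single line; the script and log file names are hypothetical:

```shell
#!/bin/sh
# Minimal sketch (not part of mpi-test-suite): summarize a captured log, e.g.
#   mpirun -np 4 ./mpi_test_suite -d MPI_TYPE_MIX_ARRAY -c 0 -t collective 2>&1 | tee run.log
#   sh summarize_failures.sh run.log
# Assumption: every "ERROR class:... test:... (n), ..." entry is on one line.

summarize_failures() {
    log="$1"
    # Total reported by the suite itself
    grep 'Number of failed tests:' "$log"
    # Reduce each ERROR entry to "class:<class> test:<name>" and count
    # how often each combination failed, most frequent first
    grep '^ERROR class:' "$log" \
        | sed 's/^ERROR \(class:[^ ]* test:[^(]*\) (.*$/\1/' \
        | sort | uniq -c | sort -rn
}

# Only run when a log file is given on the command line
if [ -n "${1:-}" ]; then
    summarize_failures "$1"
fi
```

Running it on logs from different process counts or modules shows at a glance whether the same tests keep failing first.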


Also, can you try the same command with
mpirun --mca pml ob1 --mca btl tcp,self ...
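Running both variants back to back makes the comparison easy to repeat for each installed module. A minimal sketch; the default of 4 processes, the binary location and the MPIRUN variable are assumptions, and MPIRUN can point at a specific installation's mpirun:

```shell
#!/bin/sh
# Minimal sketch: run the reduced reproducer with the default transport
# selection and again forced onto the ob1 PML with the tcp and self BTLs.
# Assumptions (adjust as needed): NP defaults to 4, mpi_test_suite is in
# the current directory, MPIRUN may be set to a specific mpirun path.
NP="${NP:-4}"
MPIRUN="${MPIRUN:-mpirun}"
TEST_ARGS="./mpi_test_suite -d MPI_TYPE_MIX_ARRAY -c 0 -t collective"

run_variant() {
    label="$1"
    shift
    echo "=== $label ==="
    # Extra mpirun options come from "$@"; TEST_ARGS is left unquoted on
    # purpose so it splits into the program name and its arguments.
    "$MPIRUN" -np "$NP" "$@" $TEST_ARGS
}

if command -v "$MPIRUN" >/dev/null 2>&1; then
    run_variant "default transports"
    run_variant "ob1 with tcp,self" --mca pml ob1 --mca btl tcp,self
else
    echo "error: $MPIRUN not found in PATH" >&2
fi
```

If only the default run fails, that would suggest looking at the transport selected by default on the cluster (presumably UCX over InfiniBand, given the configure options) rather than at the datatype handling itself.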

Cheers,

Gilles

On Tue, May 3, 2022 at 7:08 PM Alois Schlögl via users <users@lists.open-mpi.org> wrote:

>
> Within our cluster (debian10/slurm16, debian11/slurm20) with
> InfiniBand, we have several instances of Open MPI installed through
> the Lmod module system. When testing the Open MPI installations with
> mpi-test-suite 1.1 [1], it shows errors like these:
>
> ...
> (Rank:0) tst_test_array[45]:Allreduce Min/Max with MPI_IN_PLACE
> (Rank:0) tst_test_array[46]:Allreduce Sum
> (Rank:0) tst_test_array[47]:Alltoall
> Number of failed tests: 130
> Summary of failed tests:
> ERROR class:P2P test:Ring Send Pack (7), comm Duplicated MPI_COMM_WORLD (4), type MPI_TYPE_MIX (27) number of values:1000
> ERROR class:P2P test:Ring Send Pack (7), comm Duplicated MPI_COMM_WORLD (4), type MPI_TYPE_MIX_ARRAY (28) number of values:1000
> ...
>
> when using openmpi/4.1.x (I tested with 4.1.1 and 4.1.3). The number of
> errors may vary, but the first errors are always about
>     ERROR class:P2P test:Ring Send Pack (7), comm Duplicated MPI_COMM_WORLD
>
> When testing on openmpi/3.1.3, the tests run successfully and there
> are no failed tests.
>
> Typically, the openmpi/4.1.x installation is configured with
>          ./configure --prefix=${PREFIX} \
>                  --with-ucx=$UCX_HOME \
>                  --enable-orterun-prefix-by-default  \
>                  --enable-mpi-cxx \
>                  --with-hwloc \
>                  --with-pmi \
>                  --with-pmix \
>                  --with-cuda=$CUDA_HOME \
>                  --with-slurm
>
> but I've also tried different compilation options, including with and
> without --enable-mpi1-compatibility, with and without UCX, and with hwloc
> either from the OS or compiled from source, but I could not identify any
> pattern.
>
> Therefore, I'd like to ask what the issue might be. Specifically,
> I would like to know:
>
> - Am I right in assuming that mpi-test-suite [1] is suitable for testing
> Open MPI?
> - What are possible causes for this type of error?
> - How would you recommend debugging these issues?
>
> Kind regards,
>    Alois
>
>
> [1] https://github.com/open-mpi/mpi-test-suite/t
>
>
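
Because the failures show up with the 4.1.x modules but not with 3.1.3, one difference worth ruling out is which point-to-point (pml) and byte-transfer (btl) components each installation was built with. A minimal sketch that extracts them from ompi_info; it assumes ompi_info prints one "MCA <framework>: <component> (...)" line per component, and the script name is hypothetical:

```shell
#!/bin/sh
# Minimal sketch: print the pml and btl components of the Open MPI
# installation whose ompi_info is first in PATH.
# Assumption: ompi_info lists components as lines of the form
#   "MCA <framework>: <component> (MCA v..., API v..., Component v...)"

list_components() {
    # Reads ompi_info output on stdin; field 1 is "MCA", field 2 is
    # "<framework>:", field 3 is the component name.
    awk -v fw="$1:" '$1 == "MCA" && $2 == fw { print $3 }'
}

if command -v ompi_info >/dev/null 2>&1; then
    info="$(ompi_info)"
    for fw in pml btl; do
        # One line per framework: its name followed by the components found
        printf '%s:' "$fw"
        printf '%s\n' "$info" | list_components "$fw" | while read -r c; do
            printf ' %s' "$c"
        done
        echo
    done
else
    echo "error: ompi_info not found in PATH" >&2
fi
```

Loading each module in turn (the names openmpi/4.1.3 and openmpi/3.1.3 are guesses based on the versions mentioned above) and running the script once per module gives a side-by-side view of the available transports.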
