Hello all,

The very simple code below returns mpiRC = 15.

    const std::array< double, 2 > rangeMin { minX, minY };
    std::array< double, 2 > rangeTempRecv { 0.0, 0.0 };

    int mpiRC = MPI_Allreduce( rangeMin.data(),
                               rangeTempRecv.data(),
                               rangeMin.size(),
                               MPI_DOUBLE,
                               MPI_MIN,
                               PETSC_COMM_WORLD );

Some information before my questions:

1. The environment in which I am running this code has hundreds of compute nodes, each node with 4 MPI ranks.
2. It is running in the cloud, so it is tricky to get extra information "on the fly".
3. I am using OpenMPI 4.1.2 + PETSc 3.16.5 + GNU compilers.
4. The error happens consistently at the same point in the execution, at ranks 1 and 2 only (out of hundreds of MPI ranks).
5. By the time the execution gets to the code above, it has already called PetscInitialize() and many MPI routines successfully.
6. Before the call to MPI_Allreduce() above, the code calls MPI_Barrier(), so all nodes call MPI_Allreduce().
7. At https://www.open-mpi.org/doc/current/man3/OpenMPI.3.php it is written: "MPI_ERR_TRUNCATE 15 Message truncated on receive."
8. At https://www.open-mpi.org/doc/v4.1/man3/MPI_Allreduce.3.php it is written: "The reduction functions ( MPI_Op ) do not return an error value. As a result, if the functions detect an error, all they can do is either call MPI_Abort<https://www.open-mpi.org/doc/v4.1/man3/MPI_Abort.3.php> or silently skip the problem. Thus, if you change the error handler from MPI_ERRORS_ARE_FATAL to something else, for example, MPI_ERRORS_RETURN , then no error may be indicated."

Questions:

1. Any ideas about what could cause the return code 15? The code is pretty simple and the buffers have a fixed size of 2.
2. In view of item (8), does it mean that the return code 15 in item (7) might not be informative?
3. Once I get a return code != MPI_SUCCESS, is there any routine I can call, in the application code, to get extra information from MPI?
4. Once the application aborts (I throw an exception once a return code is != MPI_SUCCESS), is there some command line I can run on all nodes in order to get extra info?

Thank you in advance,

Ernesto.