Ernesto, the coll/tuned module (which handles collective subroutines by default) has a known issue when matching but non-identical signatures are used: for example, one rank uses one vector of n bytes, and another rank uses n bytes. Is there a chance your application might use this pattern?
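(For illustration only, not code from the application discussed in this thread: a minimal sketch of the pattern Gilles describes, in which every rank contributes the same n bytes to a collective, but rank 0 describes them as a single derived datatype while the other ranks pass n MPI_BYTE elements. The type signatures match, so the program is legal MPI, yet the descriptions are not identical.)

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const int n = 1 << 20;               // same total payload (n bytes) on every rank
      std::vector<char> buffer(n, 0);

      if (rank == 0) {
        // Rank 0: one element of a derived datatype covering n bytes.
        MPI_Datatype blob;
        MPI_Type_contiguous(n, MPI_BYTE, &blob);
        MPI_Type_commit(&blob);
        MPI_Bcast(buffer.data(), 1, blob, 0, MPI_COMM_WORLD);
        MPI_Type_free(&blob);
      } else {
        // Other ranks: n elements of MPI_BYTE, a matching signature
        // but not an identical description of the buffer.
        MPI_Bcast(buffer.data(), n, MPI_BYTE, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
    }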
You can try disabling this component with mpirun --mca coll ^tuned ...

I noted that between the successful a) case and the unsuccessful b) case, you changed 3 parameters:
- compiler vendor
- Open MPI version
- PETSc version

so at this stage, it is not obvious which should be blamed for the failure. In order to get a better picture, I would first try
- Intel compilers
- Open MPI 4.1.2
- PETSc 3.10.4
=> a failure would suggest a regression in Open MPI

And then
- Intel compilers
- Open MPI 4.0.3
- PETSc 3.16.5
=> a failure would either suggest a regression in PETSc, or PETSc doing something different but legit that evidences a bug in Open MPI.

If you have time, you can also try
- Intel compilers
- MPICH (or a derivative such as Intel MPI)
- PETSc 3.16.5
=> a success would strongly point to Open MPI

Cheers,

Gilles

On Mon, Mar 14, 2022 at 2:56 PM Ernesto Prudencio via users <users@lists.open-mpi.org> wrote:

> Forgot to mention that in all 3 situations, mpirun is called as follows (35 nodes, 4 MPI ranks per node):
>
> mpirun -x LD_LIBRARY_PATH=:<PATH1>:<PATH2>:… -hostfile /tmp/hostfile.txt -np 140 -npernode 4 --mca btl_tcp_if_include eth0 <APPLICATION_PATH> <APPLICATION OPTIONS>
>
> So I have a question 3) Should I add some extra option in the mpirun command line in order to make situation 2 successful?
>
> Thanks,
>
> Ernesto.
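(For reference, and with the paths elided exactly as in the original message: applying the "--mca coll ^tuned" suggestion above to this command line would look roughly like the following. The option value is quoted here only in case the shell treats the ^ character specially.)

    mpirun -x LD_LIBRARY_PATH=:<PATH1>:<PATH2>:… -hostfile /tmp/hostfile.txt \
        -np 140 -npernode 4 --mca coll '^tuned' --mca btl_tcp_if_include eth0 \
        <APPLICATION_PATH> <APPLICATION OPTIONS>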
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ernesto Prudencio via users
> Sent: Monday, March 14, 2022 12:39 AM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Ernesto Prudencio <epruden...@slb.com>
> Subject: Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15
>
> Thank you for the quick answer, George. I wanted to investigate the problem further before replying.
>
> Below I show 3 situations of my C++ (and Fortran) application, which runs on top of PETSc, OpenMPI, and MKL. All 3 situations use MKL 2019.0.5 compiled with INTEL.
>
> At the end, I have 2 questions.
>
> Note: all codes are compiled in a certain set of nodes, and the execution happens at another set of nodes.
>
> + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Situation 1) It has been successful for months now:
>
> a) Use INTEL compilers for OpenMPI 4.0.3, PETSc 3.10.4, and application. The configuration options for OpenMPI are:
>
> '--with-flux-pmi=no' '--enable-orterun-prefix-by-default' '--prefix=/mnt/disks/intel-2018-3-222-blade-runtime-env-2018-1-07-08-2018-132838/openmpi_4.0.3_intel2019.5_gcc7.3.1' 'FC=ifort' 'CC=gcc'
>
> b) At run time, each MPI rank prints this info:
>
> PATH = /opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
>
> LD_LIBRARY_PATH = /opt/openmpi_4.0.3/lib::/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7:/opt/petsc/lib:/opt/2019.5/compilers_and_libraries/linux/mkl/lib/intel64:/opt/openmpi_4.0.3/lib:/lib64:/lib:/usr/lib64:/usr/lib
>
> MPI version (compile time) = 4.0.3
> MPI_Get_library_version() = Open MPI v4.0.3, package: Open MPI root@<STRING1> Distribution, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
> PETSc version (compile time) = 3.10.4
>
> c) A test of 20 minutes with 14 nodes, 4 MPI ranks per node, runs ok.
>
> d) A test of 2 hours with 35 nodes, 4 MPI ranks per node, runs ok.
>
> + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Situation 2) This situation is the one failing during execution.
>
> a) Use GNU compilers for OpenMPI 4.1.2, PETSc 3.16.5, and application. The configuration options for OpenMPI are:
>
> '--with-flux-pmi=no' '--prefix=/appl-third-parties/openmpi-4.1.2' '--enable-orterun-prefix-by-default'
>
> b) At run time, each MPI rank prints this info:
>
> PATH = /appl-third-parties/openmpi-4.1.2/bin:/appl-third-parties/openmpi-4.1.2/bin:/appl-third-parties/openmpi-4.1.2/bin:/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
>
> LD_LIBRARY_PATH = /appl-third-parties/openmpi-4.1.2/lib::/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7:/appl-third-parties/petsc-3.16.5/lib:/opt/2019.5/compilers_and_libraries/linux/mkl/lib/intel64:/appl-third-parties/openmpi-4.1.2/lib:/lib64:/lib:/usr/lib64:/usr/lib
>
> MPI version (compile time) = 4.1.2
> MPI_Get_library_version() = Open MPI v4.1.2, package: Open MPI root@<STRING2> Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021
> PETSc version (compile time) = 3.16.5
> PetscGetVersion() = Petsc Release Version 3.16.5, Mar 04, 2022
> PetscGetVersionNumber() = 3.16.5
>
> c) Same as (1.c)
>
> d) Test with 35 nodes fails:
>
> d.1) The very first MPI call is a MPI_Allreduce() with MPI_MAX op: it returns the right values only to rank 0, while all other ranks get value 0. The routine returns MPI_SUCCESS, though.
>
> d.2) The second MPI call is a MPI_Allreduce() with MPI_SUM op: again, it returns the right values only to rank 0, while all other ranks get wrong values (mostly 0). The routine also returns MPI_SUCCESS, though.
>
> d.3) The third MPI call is a MPI_Allreduce() with MPI_MIN op: it returns 15 = MPI_ERR_TRUNCATE. This is the error reported in my first e-mail of March 9.
>
> + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Situation 3) Runs ok!!!
>
> a) Same as (2.a), that is, I continue to compile everything with GNU.
> b) At run time, I only change the path of MPI to point to the "old" /opt/openmpi_4.0.3 compiled with INTEL. Each MPI rank prints this info:
>
> PATH = /opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
>
> LD_LIBRARY_PATH = /opt/openmpi_4.0.3/lib::/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7:/appl-third-parties/petsc-3.16.5/lib:/opt/2019.5/compilers_and_libraries/linux/mkl/lib/intel64:/opt/openmpi_4.0.3/lib:/lib64:/lib:/lib64:/lib:/usr/lib64:/usr/lib
>
> MPI version (compile time) = 4.1.2
> MPI_Get_library_version() = Open MPI v4.0.3, package: Open MPI root@<STRING1> Distribution, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
> PETSc version (compile time) = 3.16.5 (my observation here: this PETSc was compiled using OpenMPI 4.1.2)
> PetscGetVersion() = Petsc Release Version 3.16.5, Mar 04, 2022
> PetscGetVersionNumber() = 3.16.5
>
> c) Same as (1.c)
>
> d) Same as (1.d)
>
> + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Note: at run time, the nodes have both OpenMPI versions available (4.0.3 compiled with INTEL, and 4.1.2 compiled with GNU). That is why I can apply the “trick” of situation 3 above.
>
> Question 1) Am I missing some configuration option on OpenMPI? I have been using the same OpenMPI configuration options as in the stable situation 1.
>
> Question 2) In the failing situation 2, does OpenMPI expect to use some /opt path, even though there is no PATH variable mentioning the “old” /opt/openmpi_4.0.3? I mean, could the problem be that I am providing the “new” OpenMPI 4.1.2 in a path (/appl-third-parties/…) that is NOT /opt?
>
> Thank you,
>
> Ernesto.
>
> From: George Bosilca <bosi...@icl.utk.edu>
> Sent: Wednesday, March 9, 2022 1:46 PM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Ernesto Prudencio <epruden...@slb.com>
> Subject: [Ext] Re: [OMPI users] Call to MPI_Allreduce() returning value 15
>
> There are two ways the MPI_Allreduce returns MPI_ERR_TRUNCATE:
>
> 1. It is propagated from one of the underlying point-to-point communications, which means that at least one of the participants has an input buffer with a larger size. I know you said the size is fixed, but it only matters if all processes are in the same blocking MPI_Allreduce.
>
> 2. The code is not SPMD, and one of your processes calls a different MPI_Allreduce on the same communicator.
>
> There is no simple way to get more information about this issue. If you have a version of OMPI compiled in debug mode, you can increase the verbosity of the collective framework to see if you get more interesting information.
>
> George.
>
> On Wed, Mar 9, 2022 at 2:23 PM Ernesto Prudencio via users <users@lists.open-mpi.org> wrote:
>
> Hello all,
>
> The very simple code below returns mpiRC = 15.
>
> const std::array< double, 2 > rangeMin { minX, minY };
> std::array< double, 2 > rangeTempRecv { 0.0, 0.0 };
> int mpiRC = MPI_Allreduce( rangeMin.data(), rangeTempRecv.data(), rangeMin.size(), MPI_DOUBLE, MPI_MIN, PETSC_COMM_WORLD );
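(A self-contained sketch of the same call, offered only as a possible way to reproduce the behavior outside the full application: minX and minY are placeholder values here, and MPI_COMM_WORLD stands in for PETSC_COMM_WORLD, so this is an approximation rather than the actual application code.)

    #include <mpi.h>
    #include <array>
    #include <cstdio>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      // Make errors return instead of aborting, so the return code is observable
      // (the default handler on MPI_COMM_WORLD is MPI_ERRORS_ARE_FATAL).
      MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

      // Placeholder inputs; in the real application these come from the domain data.
      const double minX = static_cast<double>(rank);
      const double minY = static_cast<double>(-rank);

      const std::array<double, 2> rangeMin{ minX, minY };
      std::array<double, 2> rangeTempRecv{ 0.0, 0.0 };

      const int mpiRC = MPI_Allreduce(rangeMin.data(), rangeTempRecv.data(),
                                      rangeMin.size(), MPI_DOUBLE, MPI_MIN,
                                      MPI_COMM_WORLD);

      std::printf("rank %d: rc = %d, min = (%g, %g)\n",
                  rank, mpiRC, rangeTempRecv[0], rangeTempRecv[1]);

      MPI_Finalize();
      return 0;
    }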
> Some information before my questions:
>
> 1. The environment I am running this code in has hundreds of compute nodes, each node with 4 MPI ranks.
> 2. It is running in the cloud, so it is tricky to get extra information “on the fly”.
> 3. I am using OpenMPI 4.1.2 + PETSc 3.16.5 + GNU compilers.
> 4. The error happens consistently at the same point in the execution, at ranks 1 and 2 only (out of hundreds of MPI ranks).
> 5. By the time the execution gets to the code above, it has already called PetscInitialize() and many MPI routines successfully.
> 6. Before the call to MPI_Allreduce() above, the code calls MPI_Barrier(). So, all nodes call MPI_Allreduce().
> 7. At https://www.open-mpi.org/doc/current/man3/OpenMPI.3.php it is written “MPI_ERR_TRUNCATE 15 Message truncated on receive.”
> 8. At https://www.open-mpi.org/doc/v4.1/man3/MPI_Allreduce.3.php, it is written “The reduction functions (MPI_Op) do not return an error value. As a result, if the functions detect an error, all they can do is either call MPI_Abort or silently skip the problem. Thus, if you change the error handler from MPI_ERRORS_ARE_FATAL to something else, for example, MPI_ERRORS_RETURN, then no error may be indicated.”
>
> Questions:
>
> 1. Any ideas for what could be the cause of the return code 15? The code is pretty simple and the buffers have fixed size = 2.
> 2. In view of item (8), does it mean that the return code 15 in item (7) might not be informative?
> 3. Once I get a return code != MPI_SUCCESS, is there any routine I can call, in the application code, to get extra information on MPI?
> 4. Once the application aborts (I throw an exception once a return code is != MPI_SUCCESS), is there some command line I can run on all nodes in order to get extra info?
>
> Thank you in advance,
>
> Ernesto.
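(Regarding question 3 above, a small sketch of something standard MPI does provide: MPI_Error_string() turns a return code such as 15 into a human-readable message, and MPI_Error_class() maps it to its error class. The helper name throwOnMpiError and the exception style are illustrative choices only, not anything from the thread.)

    #include <mpi.h>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: turn a failed MPI return code into an exception
    // that carries the textual MPI error message.
    void throwOnMpiError(int mpiRC, const std::string& where) {
      if (mpiRC == MPI_SUCCESS) return;
      char msg[MPI_MAX_ERROR_STRING];
      int len = 0;
      MPI_Error_string(mpiRC, msg, &len);   // e.g. "MPI_ERR_TRUNCATE: message truncated"
      int errClass = 0;
      MPI_Error_class(mpiRC, &errClass);    // the standard error class of the code
      throw std::runtime_error(where + ": rc = " + std::to_string(mpiRC) +
                               ", class = " + std::to_string(errClass) +
                               " (" + std::string(msg, len) + ")");
    }

For example, the MPI_Allreduce() call quoted earlier could be followed by throwOnMpiError(mpiRC, "MPI_Allreduce(rangeMin)") instead of throwing a bare exception.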