Hi Pierre!
On 2021-03-13 3:17 a.m., Pierre Jolivet wrote:
Hello Eric,
I’ve made an “interesting” discovery, so I’ll put the list back in Cc.
It appears that the following snippet of code, which uses Allreduce()
+ a lambda function + MPI_IN_PLACE, is:
- Valgrind-clean with MPICH;
- Valgrind-clean with OpenMPI 4.0.5;
- not Valgrind-clean with OpenMPI 4.1.0.
I’m not sure who is to blame here; I’ll need to look at the MPI
specification for what is required of implementors and of users in
that case.
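For reference, here is a minimal sketch of the kind of code in
question (my reconstruction, not the exact snippet; the integer
buffer, the element-wise sum, and MPI_COMM_WORLD are just
placeholders): an in-place Allreduce over a user-defined MPI_Op built
from a non-capturing lambda.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  // A non-capturing lambda converts to the plain C function pointer
  // type (MPI_User_function*) that MPI_Op_create expects.
  MPI_Op op;
  MPI_Op_create(
      [](void* in, void* inout, int* len, MPI_Datatype*) {
        const int* a = static_cast<const int*>(in);
        int*       b = static_cast<int*>(inout);
        for (int i = 0; i < *len; ++i) b[i] += a[i]; // element-wise sum
      },
      1 /* commutative */, &op);

  std::vector<int> buf(4, 1);
  // MPI_IN_PLACE: the receive buffer doubles as the send buffer on each rank.
  MPI_Allreduce(MPI_IN_PLACE, buf.data(), (int)buf.size(), MPI_INT, op,
                MPI_COMM_WORLD);

  MPI_Op_free(&op);
  MPI_Finalize();
  return 0;
}

(A capturing lambda would not work here, since MPI_Op_create takes a
plain function pointer; that is why the non-capturing form shows up in
this pattern.)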
In the meantime, I’ll do the following:
- update config/BuildSystem/config/packages/OpenMPI.py to use OpenMPI
4.1.0 and see if any other errors appear;
OK, I think that is a good idea, since 4.1 is the "stable" version...
- provide a hotfix to bypass the segfaults;
If the OpenMPI developers fix the issue, maybe it will be included in
the 4.1.1 release (4.1.1rc1 is still open for modifications, I think)...
An idea: in our code, we "block" compilation/use of buggy MPI
versions...
- look at the hypre issue and whether it should be deferred to the
hypre team.
Oh thanks for this! :)
Thank you for the Docker files; they were really useful.
If you want to avoid oversubscription failures, you can edit the file
/opt/openmpi-4.1.0/etc/openmpi-default-hostfile and append the line:
localhost slots=12
If you want to increase the per-test timeout of the PETSc test suite,
you can add the extra option TIMEOUT=180 to your command line (the
default is 60; units are seconds).
Noted, I will add this to my scripts and the Dockerfiles too...
Thanks, I’ll ping you on GitLab when I’ve got something ready for you
to try,
Thank you for all your work!
Eric
Pierre
--
Eric Chamberland, ing., M. Ing
Research professional
GIREF/Université Laval
(418) 656-2131, ext. 41 22 42