On 28/02/2022 at 17:56, Patrick Begou via users wrote:
Hi,
I am running into a performance problem with OpenMPI on my cluster. In some
situations my parallel code is really slow (the same binary running on a
different mesh).
To investigate, the Fortran code is built with the profiling option
(mpifort -p -O3 ...) and launched on 91 cores.
There is one mon.out file per process; they show a maximum CPU time of
20.4 seconds per process (32.7 seconds on my old cluster), which is
OK.
But running on the new cluster takes nearly 3 min of elapsed time instead
of 1 min on the old cluster.
The new cluster is running OpenMPI 4.0.5 with HDR-100 interconnects.
The old cluster is running OpenMPI 3.1 with QDR interconnects.
Running the OSU collective benchmarks on 91 cores shows good latency
values, and point-to-point performance between nodes is correct.
How can I investigate this problem, as it seems related to MPI
communications in some situations that I can reproduce? Using Scalasca?
Other tools? OpenMPI is not built with special profiling options.
Thanks
Patrick
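For questions like this, Scalasca (on top of Score-P instrumentation) is a reasonable first step. A minimal sketch of a typical workflow follows; the source file name, binary name and experiment directory name are placeholders, and the exact experiment directory name Scalasca creates may differ on your installation:

```shell
# Rebuild with Score-P instrumentation (scorep is the compiler wrapper).
scorep mpifort -O3 -o mycode mycode.f90

# Run the instrumented binary under Scalasca's measurement collection.
scalasca -analyze mpirun -np 91 ./mycode

# Examine the resulting experiment directory (name is illustrative)
# to see where time is spent in MPI calls vs. computation.
scalasca -examine -s scorep_mycode_91_sum
```

This gives a per-call-path breakdown of MPI time, which should make it visible whether the slowdown is in collectives, point-to-point traffic, or waiting time.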
Just to provide an answer to this old thread: the problem has been found
(but not solved). The application was rebuilt with the OpenMP flag (hybrid
parallelism is implemented with MPI and OpenMP). Setting this flag, even
when we use only one thread and MPI-only parallelism, changes the MPI
initialisation in our code from MPI_INIT to MPI_INIT_THREAD, and this
causes the big slowdown of the application.
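For context, a minimal sketch of the usual pattern behind this (assuming the common _OPENMP preprocessor guard; the program structure here is illustrative, not our actual code). One thing worth checking is the thread level requested: Open MPI can disable some communication fast paths when MPI_THREAD_MULTIPLE is requested, so requesting only MPI_THREAD_FUNNELED may behave differently:

```fortran
program init_sketch
   use mpi
   implicit none
   integer :: ierr, provided

#ifdef _OPENMP
   ! Hybrid build: request only the thread support level actually needed.
   ! MPI_THREAD_FUNNELED (only the main thread makes MPI calls) is often
   ! sufficient for MPI+OpenMP codes and may avoid overheads that a
   ! MPI_THREAD_MULTIPLE request can introduce.
   call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
#else
   ! Pure MPI build: plain initialisation.
   call MPI_Init(ierr)
#endif

   ! ... application work ...

   call MPI_Finalize(ierr)
end program init_sketch
```

After MPI_Init_thread returns, comparing `provided` against the requested level tells you what the library actually granted.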
We have temporarily removed the OpenMP flag from the application build.
Patrick