Hi,

I am running into a performance problem with OpenMPI on my cluster. In some situations my parallel code is really slow (same binary, just running on a different mesh).

To investigate, the Fortran code is built with the profiling option (mpifort -p -O3 .....) and launched on 91 cores.
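
For reference, the build and launch look roughly like this (the binary name and host file are just placeholders):

  mpifort -p -O3 -o solver *.f90              # profiling build, each process writes a mon.out
  mpirun -np 91 --hostfile hosts ./solver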

There is one mon.out file per process; they show a maximum CPU time of 20.4 seconds per process (32.7 seconds on my old cluster), which is fine.

But the run takes nearly 3 minutes of elapsed time on the new cluster, versus 1 minute on the old one.

The new cluster is running OpenMPI 4.0.5 with HDR-100 connections.

The old cluster is running OpenMPI 3.1 with QDR connections.

Running the OSU collective benchmarks on 91 cores shows good latency values, and the point-to-point numbers between nodes are also correct.
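
For example, I ran things along these lines (host names and the path to the OSU binaries are placeholders):

  mpirun -np 91 --hostfile hosts osu_allreduce     # collective latency over the 91 cores
  mpirun -np 2 --host node1,node2 osu_latency      # point-to-point latency between two nodes
  mpirun -np 2 --host node1,node2 osu_bw           # point-to-point bandwidth between two nodes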

How can I investigate this problem? It seems related to MPI communications in some situations that I can reproduce. Should I use Scalasca? Other tools? OpenMPI itself is not built with any special profiling options.
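
If Scalasca is the right tool, I guess the workflow would be roughly the following (rebuild with Score-P instrumentation, then profile the run; the binary name and the experiment directory name are only examples):

  scorep mpifort -O3 -o solver *.f90          # instrumented rebuild
  scalasca -analyze mpirun -np 91 ./solver    # run with runtime summarization
  scalasca -examine scorep_solver_91_sum      # inspect the report in the experiment directory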

Thanks

Patrick
