Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-04-07 Thread Patrick Begou via users

Thanks Gilles for these details.

I'll check this as soon as possible (as we have a workaround and the 
cluster is heavily loaded) and post on the Open MPI forum.


But yes, the threads do make MPI calls. In this code the OpenMP threads 
work as classical MPI processes: the communication calls are wrapped, 
using a memory copy if the threads are on the same node or Open MPI if 
they are on different nodes. It is an optimization for running several 
thousands of processes, not a classical use of OpenMP.
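
As a purely illustrative sketch of that wrapping idea (hypothetical names, 
not the actual application code): a wrapped send copies memory when the 
peer is a thread on the same node and falls back to a plain MPI call otherwise.

subroutine wrapped_send(buf, count, dest, shared_buf, on_same_node, ierr)
   ! Hypothetical wrapper: not the real code, only the mechanism described above.
   use mpi
   implicit none
   double precision, intent(in)    :: buf(*)
   double precision, intent(inout) :: shared_buf(*)
   integer, intent(in)             :: count, dest
   logical, intent(in)             :: on_same_node
   integer, intent(out)            :: ierr

   if (on_same_node) then
      ! Threads share the address space, so a memory copy replaces the MPI call.
      shared_buf(1:count) = buf(1:count)
      ierr = MPI_SUCCESS
   else
      ! Different nodes: go through Open MPI as usual.
      call MPI_Send(buf, count, MPI_DOUBLE_PRECISION, dest, 0, MPI_COMM_WORLD, ierr)
   end if
end subroutine wrapped_send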


I'm using UCX too, and I must check whether I built it with 
multi-threading support (that was many months ago).
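
For reference, a quick way to check both sides (assuming ucx_info and 
ompi_info are in the PATH; these commands are an editorial addition, not 
part of the original thread):

$ ucx_info -v                           # the configure line shows whether --enable-mt was used
$ ompi_info | grep -i "thread support"  # reports whether MPI_THREAD_MULTIPLE is available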


Patrick

On 24/03/2022 at 10:24, Gilles Gouaillardet via users wrote:

Patrick,

In the worst-case scenario, requiring MPI_THREAD_MULTIPLE support can 
disable some fast interconnects and make your app fall back on IPoIB or 
similar. In that case, Open MPI might also prefer a suboptimal IP 
network, which can hurt the overall performance even more.

Which threading support does your app request?
Many applications do not call MPI inside OpenMP regions at all, or 
only the master thread invokes MPI; in that case, MPI_THREAD_FUNNELED 
is enough.
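
A minimal sketch of that advice (standard MPI Fortran API; the program 
name is illustrative): request only the level the code actually needs and 
check what the library provides.

program init_funneled
   use mpi
   implicit none
   integer :: provided, ierr

   ! MPI_THREAD_FUNNELED: threads may exist, but only the master thread
   ! makes MPI calls. Requesting no more than this usually keeps the fast
   ! transports enabled.
   call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)

   if (provided < MPI_THREAD_FUNNELED) then
      print *, 'Warning: MPI library only provides thread level ', provided
   end if

   call MPI_Finalize(ierr)
end program init_funneled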

Are you using UCX or the legacy openib BTL?
If the former, is it built with multi-threading support?
If the latter, I suggest you give UCX, built with multi-threading 
support, a try and see how it goes.
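
A hedged sketch of that rebuild (paths and prefixes are illustrative; 
--enable-mt and --with-ucx are the relevant configure options):

$ # Rebuild UCX with multi-threading support
$ ./contrib/configure-release --prefix=$HOME/ucx-mt --enable-mt
$ make -j && make install
$ # Rebuild Open MPI against that UCX installation
$ ./configure --prefix=$HOME/openmpi-ucx --with-ucx=$HOME/ucx-mt
$ make -j && make install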



Cheers,

Gilles

On Thu, Mar 24, 2022 at 5:43 PM Patrick Begou via users wrote:


On 28/02/2022 at 17:56, Patrick Begou via users wrote:
> Hi,
>
> I meet a performance problem with Open MPI on my cluster. In some
> situations my parallel code is really slow (same binary running on a
> different mesh).
>
> To investigate, the Fortran code is built with the profiling option
> (mpifort -p -O3) and launched on 91 cores.
>
> With one mon.out file per process, they show a maximum CPU time of
> 20.4 seconds per process (32.7 seconds on my old cluster), which is
> fine.
>
> But running on my new cluster takes nearly 3 minutes of elapsed time
> instead of 1 minute on the old cluster.
>
> The new cluster runs Open MPI 4.0.5 with HDR-100 connections.
>
> The old cluster runs Open MPI 3.1 with QDR connections.
>
> Running the OSU collective tests on 91 cores shows good latency values,
> and point-to-point performance between nodes is correct.
>
> How can I investigate this problem, as it seems related to MPI
> communications in some situations that I can reproduce? Using Scalasca?
> Other tools? Open MPI is not built with special profiling options.
>
> Thanks
>
> Patrick
>
Just to provide an answer to this old thread: the problem has been 
found (but not solved). The application was rebuilt with the OpenMP 
flag (hybrid parallelism is implemented with MPI and OpenMP). Setting 
this flag, even if we use only one thread and MPI-only parallelism, 
changes the Open MPI initialisation in our code from MPI_INIT to 
MPI_INIT_THREAD, and this causes the big slowdown of the application.
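
As a hypothetical illustration of that mechanism (not the actual 
application code): an OpenMP-enabled build typically takes a different 
initialisation path through the preprocessor, as in the sketch below 
(compile as a preprocessed source, e.g. a .F90 file, so _OPENMP is 
defined when the OpenMP flag is set).

program init_example
   use mpi
   implicit none
   integer :: provided, ierr

#ifdef _OPENMP
   ! OpenMP build: the code requests full thread support...
   call MPI_Init_thread(MPI_THREAD_MULTIPLE, provided, ierr)
#else
   ! ...while the non-OpenMP build keeps the plain initialisation.
   call MPI_Init(ierr)
#endif

   call MPI_Finalize(ierr)
end program init_example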

We have temporarily removed the OpenMP flag when building the application.

Patrick




Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-03-24 Thread Gilles Gouaillardet via users
Patrick,

In the worst-case scenario, requiring MPI_THREAD_MULTIPLE support can
disable some fast interconnects and make your app fall back on IPoIB or
similar. In that case, Open MPI might also prefer a suboptimal IP network,
which can hurt the overall performance even more.

Which threading support does your app request?
Many applications do not call MPI inside OpenMP regions at all, or only the
master thread invokes MPI; in that case, MPI_THREAD_FUNNELED is enough.

Are you using UCX or the legacy openib BTL?
If the former, is it built with multi-threading support?
If the latter, I suggest you give UCX, built with multi-threading support,
a try and see how it goes.


Cheers,

Gilles

On Thu, Mar 24, 2022 at 5:43 PM Patrick Begou via users <users@lists.open-mpi.org> wrote:

> On 28/02/2022 at 17:56, Patrick Begou via users wrote:
> > Hi,
> >
> > I meet a performance problem with Open MPI on my cluster. In some
> > situations my parallel code is really slow (same binary running on a
> > different mesh).
> >
> > To investigate, the Fortran code is built with the profiling option
> > (mpifort -p -O3) and launched on 91 cores.
> >
> > With one mon.out file per process, they show a maximum CPU time of
> > 20.4 seconds per process (32.7 seconds on my old cluster), which is
> > fine.
> >
> > But running on my new cluster takes nearly 3 minutes of elapsed time
> > instead of 1 minute on the old cluster.
> >
> > The new cluster runs Open MPI 4.0.5 with HDR-100 connections.
> >
> > The old cluster runs Open MPI 3.1 with QDR connections.
> >
> > Running the OSU collective tests on 91 cores shows good latency values,
> > and point-to-point performance between nodes is correct.
> >
> > How can I investigate this problem, as it seems related to MPI
> > communications in some situations that I can reproduce? Using Scalasca?
> > Other tools? Open MPI is not built with special profiling options.
> >
> > Thanks
> >
> > Patrick
> >
> >
> Just to provide an answer to this old thread: the problem has been found
> (but not solved). The application was rebuilt with the OpenMP flag (hybrid
> parallelism is implemented with MPI and OpenMP). Setting this flag, even
> if we use only one thread and MPI-only parallelism, changes the Open MPI
> initialisation in our code from MPI_INIT to MPI_INIT_THREAD, and this
> causes the big slowdown of the application.
>
> We have temporarily removed the OpenMP flag when building the application.
>
> Patrick
>
>
>


Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-03-24 Thread Patrick Begou via users

On 28/02/2022 at 17:56, Patrick Begou via users wrote:

Hi,

I meet a performance problem with Open MPI on my cluster. In some 
situations my parallel code is really slow (same binary running on a 
different mesh).

To investigate, the Fortran code is built with the profiling option 
(mpifort -p -O3) and launched on 91 cores.

With one mon.out file per process, they show a maximum CPU time of 20.4 
seconds per process (32.7 seconds on my old cluster), which is fine.

But running on my new cluster takes nearly 3 minutes of elapsed time 
instead of 1 minute on the old cluster.

The new cluster runs Open MPI 4.0.5 with HDR-100 connections.

The old cluster runs Open MPI 3.1 with QDR connections.

Running the OSU collective tests on 91 cores shows good latency values, 
and point-to-point performance between nodes is correct.

How can I investigate this problem, as it seems related to MPI 
communications in some situations that I can reproduce? Using Scalasca? 
Other tools? Open MPI is not built with special profiling options.


Thanks

Patrick


Just to provide an answer to this old thread: the problem has been found 
(but not solved). The application was rebuilt with the OpenMP flag (hybrid 
parallelism is implemented with MPI and OpenMP). Setting this flag, even 
if we use only one thread and MPI-only parallelism, changes the Open MPI 
initialisation in our code from MPI_INIT to MPI_INIT_THREAD, and this 
causes the big slowdown of the application.
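
A small runtime check may help confirm this (standard MPI calls; the 
program name is illustrative): after initialisation, MPI_Query_thread 
reports the threading level the library actually granted, so one can see 
whether the OpenMP build really ends up running with MPI_THREAD_MULTIPLE.

program report_thread_level
   use mpi
   implicit none
   integer :: provided, ierr, rank

   call MPI_Init_thread(MPI_THREAD_MULTIPLE, provided, ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! Ask the library which level it actually provides.
   call MPI_Query_thread(provided, ierr)

   if (rank == 0) then
      if (provided == MPI_THREAD_MULTIPLE) then
         print *, 'Provided level: MPI_THREAD_MULTIPLE'
      else if (provided == MPI_THREAD_FUNNELED) then
         print *, 'Provided level: MPI_THREAD_FUNNELED'
      else
         print *, 'Provided level (numeric): ', provided
      end if
   end if

   call MPI_Finalize(ierr)
end program report_thread_level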


We have temporarily removed the OpenMP flag when building the application.

Patrick




[OMPI users] Need help for troubleshooting OpenMPI performances

2022-02-28 Thread Patrick Begou via users

Hi,

I meet a performance problem with Open MPI on my cluster. In some 
situations my parallel code is really slow (same binary running on a 
different mesh).


To investigate, the Fortran code is built with the profiling option 
(mpifort -p -O3) and launched on 91 cores.


With one mon.out file per process, they show a maximum CPU time of 20.4 
seconds per process (32.7 seconds on my old cluster), which is fine.


But running on my new cluster takes nearly 3 minutes of elapsed time 
instead of 1 minute on the old cluster.


The new cluster runs Open MPI 4.0.5 with HDR-100 connections.

The old cluster runs Open MPI 3.1 with QDR connections.

Running the OSU collective tests on 91 cores shows good latency values, 
and point-to-point performance between nodes is correct.
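
For context, a typical way to run such a test (an editorial example; it 
assumes the OSU micro-benchmarks are installed and a host file listing 
the nodes, here called 'hosts'):

$ mpirun -np 91 --hostfile hosts osu_allreduce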


How can I investigate this problem, as it seems related to MPI 
communications in some situations that I can reproduce? Using Scalasca? 
Other tools? Open MPI is not built with special profiling options.
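
One possible Scalasca workflow, sketched as an editorial example (it 
assumes Score-P and Scalasca are installed; file and binary names are 
illustrative):

$ scorep mpifort -O3 -o app_instr main.f90   # rebuild with Score-P instrumentation
$ scalasca -analyze mpirun -np 91 ./app_instr
$ scalasca -examine scorep_app_instr_*       # inspect the generated experiment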


Thanks

Patrick