Re: [gmx-users] Various questions related to Gromacs performance tuning

2020-03-28 Thread Benson Muite


On Sat, Mar 28, 2020, at 9:32 PM, Kutzner, Carsten wrote:
> 
> 
> > Am 26.03.2020 um 17:00 schrieb Tobias Klöffel :
> > 
> > Hi Carsten,
> > 
> > 
> > On 3/24/20 9:02 PM, Kutzner, Carsten wrote:
> >> Hi,
> >> 
> >>> Am 24.03.2020 um 16:28 schrieb Tobias Klöffel :
> >>> 
> >>> Dear all,
> >>> I am very new to Gromacs so maybe some of my problems are very easy to 
> >>> fix:)
> >>> Currently I am trying to compile and benchmark gromacs on AMD rome cpus, 
> >>> the benchmarks are taken from:
> >>> https://www.mpibpc.mpg.de/grubmueller/bench
> >>> 
> >>> 1) OpenMP parallelization: Is it done via OpenMP tasks?
> >> Yes, all over the code loops are parallelized via OpenMP via #pragma omp 
> >> parallel for
> >> and similar directives.
> > Ok but that's not OpenMP tasking:)
> >> 
> >>> If the Intel toolchain is detected and  -DGMX_FFT_LIBRARY=mkl is 
> >>> set,-mkl=serial is used, even though -DGMX_OPENMP=on is set.
> >> GROMACS uses only the serial transposes - allowing mkl to open up its own 
> >> OpenMP threads
> >> would lead to oversubscription of cores and performance degradation.
> > Ah I see. But then it should be noted somewhere in the docu that all 
> > FFTW/MKL calls are inside a parallel region. Is there a specific reason for 
> > this? Normally you can achieve much better performance if you call a 
> > threaded library outside of a parallel region and let the library use its 
> > own threads.

Creating and destroying threads can sometimes be slow, and that is what threaded 
libraries typically do on entry and exit. So if a program is already running its own 
threads, it can be faster to have those threads call thread-safe routines of the 
serial library, provided the library supports that - likely the case for FFTW.
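
As a rough illustration of that pattern (my own sketch assuming FFTW3, not code 
taken from GROMACS; N and NTRANSFORMS are made-up sizes): the plan is created 
once, serially, and the thread-safe new-array execute call is then issued from 
the already-running OpenMP threads:

/* Sketch: one serial FFTW3 plan, executed from many OpenMP threads.
 * Illustrative only - not the GROMACS implementation. */
#include <complex.h>
#include <fftw3.h>

#define N 64            /* transform length (made up) */
#define NTRANSFORMS 16  /* independent 1D transforms (made up) */

int main(void)
{
    fftw_complex *in[NTRANSFORMS], *out[NTRANSFORMS];

    /* fftw_malloc keeps all arrays aligned like the ones the plan was
     * created with, which the new-array execute interface requires. */
    for (int t = 0; t < NTRANSFORMS; t++) {
        in[t]  = fftw_malloc(sizeof(fftw_complex) * N);
        out[t] = fftw_malloc(sizeof(fftw_complex) * N);
        for (int i = 0; i < N; i++) in[t][i] = i + t * I;
    }

    /* Planning is NOT thread safe, so it happens outside the parallel region. */
    fftw_plan plan = fftw_plan_dft_1d(N, in[0], out[0],
                                      FFTW_FORWARD, FFTW_ESTIMATE);

    /* fftw_execute_dft() is thread safe: the existing threads reuse the
     * serial plan, so FFTW never spawns threads of its own. */
    #pragma omp parallel for
    for (int t = 0; t < NTRANSFORMS; t++) {
        fftw_execute_dft(plan, in[t], out[t]);
    }

    fftw_destroy_plan(plan);
    for (int t = 0; t < NTRANSFORMS; t++) {
        fftw_free(in[t]);
        fftw_free(out[t]);
    }
    return 0;
}

Something along these lines should build with e.g. gcc -fopenmp fft_sketch.c -lfftw3 -lm.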

> >>> 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do 
> >>> not really understand what I have to specify for -mdrun. I
> >> Normally you need a serial (read: non-mpi enabled) 'gmx' so that you can 
> >> call
> >> gmx tune_pme. Most queueing systems don't like it if one parallel program 
> >> calls
> >> another parallel program.
> >> 
> >>> tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun -use-hwthread-cpus 
> >>> -np $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings" 
> >>> But it just complains that mdrun is not working.
> >> There should be an output somewhere with the exact command line that
> >> tune_pme invoked to test whether mdrun works. That should shed some light
> >> on the issue.
> >> 
> >> Side note: Tuning is normally only useful on CPU-nodes. If your nodes also
> >> have GPUs, you will probably not want to do this kind of PME tuning.
> > Yes it's CPU only... I will tune pp:ppme procs manually. However, 
> > most of the times it is failing with 'too large prime number' what is 
> > considered to be 'too large'?
> I think 2, 3, 5, 7, 11, and 13 and multiples of these are ok, but not 
> larger prime numbers.
> So for a fixed number of procs only some of the combinations PP:PME 
> will actually work.
> The ones that don't work would not be wise to choose from a performance 
> point of view.
> 
> Best,
>  Carsten
> 

Re: [gmx-users] Various questions related to Gromacs performance tuning

2020-03-28 Thread Kutzner, Carsten


> Am 26.03.2020 um 17:00 schrieb Tobias Klöffel :
> 
> Hi Carsten,
> 
> 
> On 3/24/20 9:02 PM, Kutzner, Carsten wrote:
>> Hi,
>> 
>>> Am 24.03.2020 um 16:28 schrieb Tobias Klöffel :
>>> 
>>> Dear all,
>>> I am very new to Gromacs so maybe some of my problems are very easy to fix:)
>>> Currently I am trying to compile and benchmark gromacs on AMD rome cpus, 
>>> the benchmarks are taken from:
>>> https://www.mpibpc.mpg.de/grubmueller/bench
>>> 
>>> 1) OpenMP parallelization: Is it done via OpenMP tasks?
>> Yes, all over the code loops are parallelized via OpenMP via #pragma omp 
>> parallel for
>> and similar directives.
> Ok but that's not OpenMP tasking:)
>> 
>>> If the Intel toolchain is detected and  -DGMX_FFT_LIBRARY=mkl is 
>>> set,-mkl=serial is used, even though -DGMX_OPENMP=on is set.
>> GROMACS uses only the serial transposes - allowing mkl to open up its own 
>> OpenMP threads
>> would lead to oversubscription of cores and performance degradation.
> Ah I see. But then it should be noted somewhere in the docu that all FFTW/MKL 
> calls are inside a parallel region. Is there a specific reason for this? 
> Normally you can achieve much better performance if you call a threaded 
> library outside of a parallel region and let the library use its own threads.
>>> 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do not 
>>> really understand what I have to specify for -mdrun. I
>> Normally you need a serial (read: non-mpi enabled) 'gmx' so that you can call
>> gmx tune_pme. Most queueing systems don't like it if one parallel program 
>> calls
>> another parallel program.
>> 
>>> tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun -use-hwthread-cpus 
>>> -np $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings" 
>>> But it just complains that mdrun is not working.
>> There should be an output somewhere with the exact command line that
>> tune_pme invoked to test whether mdrun works. That should shed some light
>> on the issue.
>> 
>> Side note: Tuning is normally only useful on CPU-nodes. If your nodes also
>> have GPUs, you will probably not want to do this kind of PME tuning.
> Yes it's CPU only... I will tune pp:ppme procs manually. However, most of the 
> times it is failing with 'too large prime number' what is considered to be 
> 'too large'?
I think 2, 3, 5, 7, 11, and 13, and products of these primes, are OK, but not 
larger prime factors.
So for a fixed number of procs, only some of the PP:PME combinations will 
actually work.
The ones that don't work would not be wise choices from a performance point 
of view.
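
Just to illustrate that rule (my own sketch of the check as stated above, not 
the test mdrun itself performs; the function name has_only_small_prime_factors 
is made up):

/* Sketch of the "only prime factors up to 13" rule mentioned above.
 * Illustrative only - not the actual check inside GROMACS. */
#include <stdbool.h>
#include <stdio.h>

static bool has_only_small_prime_factors(int n)
{
    const int small_primes[] = { 2, 3, 5, 7, 11, 13 };
    for (int i = 0; i < 6; i++) {
        while (n % small_primes[i] == 0) {
            n /= small_primes[i];
        }
    }
    return n == 1; /* anything left over is a prime factor > 13 */
}

int main(void)
{
    /* e.g. 96 passes (2^5 * 3), 68 fails (2^2 * 17) */
    for (int n = 60; n <= 70; n++) {
        printf("%3d -> %s\n", n,
               has_only_small_prime_factors(n) ? "ok" : "prime factor too large");
    }
    return 0;
}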

Best,
 Carsten


Re: [gmx-users] Various questions related to Gromacs performance tuning

2020-03-26 Thread Tobias Klöffel

Hi Carsten,


On 3/24/20 9:02 PM, Kutzner, Carsten wrote:

> Hi,
>
>> Am 24.03.2020 um 16:28 schrieb Tobias Klöffel :
>>
>> Dear all,
>> I am very new to Gromacs so maybe some of my problems are very easy to fix:)
>> Currently I am trying to compile and benchmark gromacs on AMD rome cpus, the 
>> benchmarks are taken from:
>> https://www.mpibpc.mpg.de/grubmueller/bench
>>
>> 1) OpenMP parallelization: Is it done via OpenMP tasks?
> Yes, all over the code loops are parallelized via OpenMP via #pragma omp 
> parallel for
> and similar directives.

Ok but that's not OpenMP tasking:)

>> If the Intel toolchain is detected and  -DGMX_FFT_LIBRARY=mkl is 
>> set,-mkl=serial is used, even though -DGMX_OPENMP=on is set.
> GROMACS uses only the serial transposes - allowing mkl to open up its own 
> OpenMP threads
> would lead to oversubscription of cores and performance degradation.

Ah I see. But then it should be noted somewhere in the documentation that all 
FFTW/MKL calls are inside a parallel region. Is there a specific reason 
for this? Normally you can achieve much better performance if you call a 
threaded library outside of a parallel region and let the library use 
its own threads.

>> 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do not 
>> really understand what I have to specify for -mdrun. I
> Normally you need a serial (read: non-mpi enabled) 'gmx' so that you can call
> gmx tune_pme. Most queueing systems don't like it if one parallel program calls
> another parallel program.
>
>> tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun -use-hwthread-cpus -np $tmpi 
>> -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings" But it just complains 
>> that mdrun is not working.
> There should be an output somewhere with the exact command line that
> tune_pme invoked to test whether mdrun works. That should shed some light
> on the issue.
>
> Side note: Tuning is normally only useful on CPU-nodes. If your nodes also
> have GPUs, you will probably not want to do this kind of PME tuning.

Yes, it's CPU only... I will tune the PP:PME process split manually. However, 
most of the time it fails with 'too large prime number'. What is 
considered to be 'too large'?


Thanks,
Tobias

>> Normal execution via $MPIRUN gmx_mpi mdrun -s ... works
>>
>> 3) As far as I understood, most time of PME is spent in a 3d FFT and hence 
>> probably most time is spent in a mpi alltoall communication.
> Yes, but that also depends a lot on the number of nodes you are running on.
> Check for yourself: Do a 'normal' mdrun (without tune_pme) on the number of
> nodes that you are interested and check the detailed timings at the end of
> the log file. There you will find how much time is spent in various PME
> routines.
>
> Best,
>    Carsten
>
>> For that reason I would like to place all PME tasks on a separate node via 
>> -ddorder pp_pme. If I do so, the calculations just hangs. Specifying -ddorder 
>> interleave or cartesian works without problems. Is this a known issue?
>>
>> Kind regards,
>> Tobias Klöffel


--
M.Sc. Tobias Klöffel
===
HPC (High Performance Computing) group
Erlangen Regional Computing Center(RRZE)
Friedrich-Alexander-Universität Erlangen-Nürnberg
Martensstr. 1
91058 Erlangen

Room: 1.133
Phone: +49 (0) 9131 / 85 - 20101

===

E-mail: tobias.kloef...@fau.de


Re: [gmx-users] Various questions related to Gromacs performance tuning

2020-03-24 Thread Mark Abraham
Hi,

There could certainly be a bug with -ddorder pp_pme, as there's no testing
done of that. If you can reproduce with a recent version of GROMACS, please
do file a bug report. (Though this week we're moving to new infrastructure,
so leave it for a day or two before trying to report it!)

Mark

On Tue, 24 Mar 2020 at 21:03, Kutzner, Carsten  wrote:

> Hi,
>
> > Am 24.03.2020 um 16:28 schrieb Tobias Klöffel :
> >
> > Dear all,
> > I am very new to Gromacs so maybe some of my problems are very easy to
> > fix:)
> > Currently I am trying to compile and benchmark gromacs on AMD rome cpus,
> > the benchmarks are taken from:
> > https://www.mpibpc.mpg.de/grubmueller/bench
> >
> > 1) OpenMP parallelization: Is it done via OpenMP tasks?
> Yes, all over the code loops are parallelized via OpenMP via #pragma omp
> parallel for
> and similar directives.
>
> > If the Intel toolchain is detected and  -DGMX_FFT_LIBRARY=mkl is
> > set,-mkl=serial is used, even though -DGMX_OPENMP=on is set.
> GROMACS uses only the serial transposes - allowing mkl to open up its own
> OpenMP threads
> would lead to oversubscription of cores and performance degradation.
>
> > 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do
> > not really understand what I have to specify for -mdrun. I
> Normally you need a serial (read: non-mpi enabled) 'gmx' so that you can
> call
> gmx tune_pme. Most queueing systems don't like it if one parallel program
> calls
> another parallel program.
>
> > tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun
> > -use-hwthread-cpus -np $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS
> > --report-bindings" But it just complains that mdrun is not working.
> There should be an output somewhere with the exact command line that
> tune_pme invoked to test whether mdrun works. That should shed some light
> on the issue.
>
> Side note: Tuning is normally only useful on CPU-nodes. If your nodes also
> have GPUs, you will probably not want to do this kind of PME tuning.
>
> > Normal  execution via $MPIRUN gmx_mpi mdrun -s ... works
> >
> >
> > 3) As far as I understood, most time of PME is spent in a 3d FFT and
> > hence probably most time is spent in a mpi alltoall communication.
> Yes, but that also depends a lot on the number of nodes you are running on.
> Check for yourself: Do a 'normal' mdrun (without tune_pme) on the number
> of
> nodes that you are interested and check the detailed timings at the end of
> the log file. There you will find how much time is spent in various PME
> routines.
>
> Best,
>   Carsten
>
> > For that reason I would like to place all PME tasks on a separate node
> > via -ddorder pp_pme. If I do so, the calculations just hangs. Specifying
> > -ddorder interleave or cartesian works without problems. Is this a known
> > issue?
> >
> > Kind regards,
> > Tobias Klöffel

Re: [gmx-users] Various questions related to Gromacs performance tuning

2020-03-24 Thread Kutzner, Carsten
Hi,

> Am 24.03.2020 um 16:28 schrieb Tobias Klöffel :
> 
> Dear all,
> I am very new to Gromacs so maybe some of my problems are very easy to fix:)
> Currently I am trying to compile and benchmark gromacs on AMD rome cpus, the 
> benchmarks are taken from:
> https://www.mpibpc.mpg.de/grubmueller/bench
> 
> 1) OpenMP parallelization: Is it done via OpenMP tasks?
Yes, loops all over the code are parallelized with OpenMP via 
#pragma omp parallel for and similar directives.
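
For illustration, that loop-level pattern looks roughly like this (a generic 
sketch with made-up arrays and sizes, not code from the GROMACS tree):

/* Generic sketch of loop-level OpenMP work sharing via
 * "#pragma omp parallel for" - not actual GROMACS code. */
#include <stdio.h>

#define NATOMS 100000 /* made-up problem size */

int main(void)
{
    static double x[NATOMS], f[NATOMS];

    for (int i = 0; i < NATOMS; i++) {
        x[i] = 0.001 * i;
    }

    /* Iterations are split across the existing OpenMP threads;
     * no OpenMP tasks are created. */
    #pragma omp parallel for
    for (int i = 0; i < NATOMS; i++) {
        f[i] = -2.0 * x[i]; /* toy "force", just for the example */
    }

    printf("f[%d] = %g\n", NATOMS - 1, f[NATOMS - 1]);
    return 0;
}

Compile with e.g. gcc -fopenmp loop_sketch.c.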

> If the Intel toolchain is detected and  -DGMX_FFT_LIBRARY=mkl is 
> set,-mkl=serial is used, even though -DGMX_OPENMP=on is set.
GROMACS uses only the serial transposes - allowing mkl to open up its own 
OpenMP threads
would lead to oversubscription of cores and performance degradation.

> 2) I am trying to use gmx_mpi tune_pme but I never got it to run. I do not 
> really understand what I have to specify for -mdrun. I 
Normally you need a serial (read: non-mpi enabled) 'gmx' so that you can call
gmx tune_pme. Most queueing systems don't like it if one parallel program calls
another parallel program.

> tried -mdrun 'gmx_mpi mdrun' and export MPIRUN="mpirun -use-hwthread-cpus -np 
> $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings" But it 
> just complains that mdrun is not working.
There should be an output somewhere with the exact command line that
tune_pme invoked to test whether mdrun works. That should shed some light
on the issue.

Side note: Tuning is normally only useful on CPU-nodes. If your nodes also
have GPUs, you will probably not want to do this kind of PME tuning.

> Normal  execution via $MPIRUN gmx_mpi mdrun -s ... works
> 
> 
> 3) As far as I understood, most time of PME is spent in a 3d FFT and hence 
> probably most time is spent in a mpi alltoall communication.
Yes, but that also depends a lot on the number of nodes you are running on.
Check for yourself: Do a 'normal' mdrun (without tune_pme) on the number of 
nodes that you are interested in, and check the detailed timings at the end of
the log file. There you will find how much time is spent in various PME
routines.
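
For reference, the transposes of a distributed 3D FFT boil down to an 
all-to-all exchange; a bare-bones sketch of that communication pattern 
(plain MPI with a made-up block size - not the GROMACS PME/FFT code) is:

/* Sketch of the all-to-all redistribution behind parallel 3D-FFT
 * transposes. Illustrative only - not the GROMACS implementation. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank holds one slab of the grid, split into nranks blocks;
     * block j has to end up on rank j to switch the decomposition axis. */
    const int blocksize = 4; /* grid points per block (made up) */
    double *sendbuf = malloc(nranks * blocksize * sizeof(double));
    double *recvbuf = malloc(nranks * blocksize * sizeof(double));
    for (int i = 0; i < nranks * blocksize; i++) {
        sendbuf[i] = rank + 0.001 * i;
    }

    /* This collective is the step that tends to dominate PME at scale. */
    MPI_Alltoall(sendbuf, blocksize, MPI_DOUBLE,
                 recvbuf, blocksize, MPI_DOUBLE, MPI_COMM_WORLD);

    printf("rank %d exchanged %d blocks\n", rank, nranks);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}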

Best,
  Carsten

> For that reason I would like to place all PME tasks on a separate node via 
> -ddorder pp_pme. If I do so, the calculations just hangs. Specifying -ddorder 
> interleave or cartesian works without problems. Is this a known issue?
> 
> Kind regards,
> Tobias Klöffel
> 

--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa
