Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-03-09 Thread Szilárd Páll
Hi Andreas,

Sorry for the delay.

I can confirm the regression. It affects the energy calculation steps,
where the GPU bonded computation did get significantly slower (as a
side-effect of optimizations that mainly targeted the force-only kernels).

Can you please file an issue on redmine.gromacs.org and upload the data you
shared with me?

As a workaround, consider using nstcalcenergy > 1; bumping it to
just ~10 would eliminate most of the regression and would also improve the
performance of other computations (the nonbonded force-only kernels are
at least 1.5x faster than the force+energy kernels).
Alternatively, I recall you have a decent CPU, so you could run the bonded
interactions on the CPU instead.
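Concretely, a minimal sketch of the two options (the values, file name, and
thread count below are illustrative, not tuned for your system):

    ; .mdp fragment: compute energies only every ~10 steps instead of every step
    nstcalcenergy = 10

    # or keep nstcalcenergy = 1 and move the bonded work back to the CPU:
    gmx mdrun -deffnm bench -ntmpi 1 -ntomp 16 -nb gpu -pme gpu -bonded cpu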

Side note: you are using an overly fine PME grid that you did not scale
along with the (overly accurate) rather long cut-offs (see
http://manual.gromacs.org/documentation/current/user-guide/mdp-options.html#mdp-fourierspacing
).
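For illustration only (the numbers are placeholders, not taken from your
input): scaling the Fourier spacing along with the Coulomb cut-off keeps the
relative PME accuracy roughly constant while coarsening the grid, e.g.

    ; illustrative values: keep the rcoulomb/fourierspacing ratio roughly constant
    rcoulomb       = 1.4     ; a long real-space cut-off
    fourierspacing = 0.17    ; ~ 0.12 nm * 1.4/1.0, instead of the 0.12 nm default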

Cheers,
--
Szilárd


Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-28 Thread Andreas Baer

Hi,

sorry about that! Here is the link:
https://faubox.rrze.uni-erlangen.de/getlink/fiUpELsXokQr3a7vyeDSKdY3/benchmarks_2019-2020_all

Cheers,
Andreas

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Szilárd Páll
On Thu, Feb 27, 2020 at 1:08 PM Andreas Baer  wrote:

> Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
> -bonded gpu -update gpu` as parameters. However, it was run on the same
> machine with SMT disabled.
> With the following link, I provide all the tests I have done on this
> machine so far, along with a summary of the performance for the various
> input parameters (both in `logfiles`), as well as input files (`C60xh.7z`)
> and the scripts to run them.
>

The link seems to be missing.
--
Szilárd


Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Andreas Baer

Hi,

On 27.02.20 12:34, Szilárd Páll wrote:

> I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.
>
> It would also help if you could share some input files in case further
> testing is needed.
Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
-bonded gpu -update gpu` as parameters. However, it was run on the same
machine with SMT disabled.
With the following link, I provide all the tests I have done on this
machine so far, along with a summary of the performance for the various
input parameters (both in `logfiles`), as well as input files (`C60xh.7z`)
and the scripts to run them.
I hope this helps. If there is anything else I can do to help, please
let me know!




>> Thank you for the comment about rlist; I did not know that this would
>> affect the performance negatively.
>
> It does in multiple ways. First, you are using a rather long list buffer,
> which makes the nonbonded pair-interaction calculation more
> computationally expensive than it could be if you just used a tolerance and
> let the buffer be calculated. Secondly, since setting a manual rlist disables
> the automated Verlet buffer calculation, it prevents mdrun from using a
> dual pair-list setup (see
> http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning)
> which has additional performance benefits.

Ok, thank you for the explanation!


> Cheers,
> --
> Szilárd

Cheers,
Andreas

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Szilárd Páll
Hi

On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer  wrote:

> Hi,
>
> with the link below, additional log files for runs with 1 GPU should be
> accessible now.
>

I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.
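For example, something along these lines (a sketch only; the file name,
thread count, and offload flags are illustrative, not a recommendation for
your exact input):

    # single-rank run on one GPU; thread count and offload choices are illustrative
    gmx mdrun -deffnm bench -ntmpi 1 -ntomp 16 -nb gpu -pme gpu -bonded gpu -pin on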

It would also help if you could share some input files in case further
testing is needed.


> Thank you for the comment about rlist; I did not know that this would
> affect the performance negatively.


It does in multiple ways. First, you are using a rather long list buffer,
which makes the nonbonded pair-interaction calculation more
computationally expensive than it could be if you just used a tolerance and
let the buffer be calculated. Secondly, since setting a manual rlist disables
the automated Verlet buffer calculation, it prevents mdrun from using a
dual pair-list setup (see
http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning)
which has additional performance benefits.
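In .mdp terms, a minimal sketch of that (the tolerance shown is simply the
GROMACS default; treat the exact numbers as illustrative):

    ; let grompp/mdrun size the pair-list buffer instead of fixing rlist by hand
    cutoff-scheme           = Verlet
    verlet-buffer-tolerance = 0.005   ; kJ/mol/ps per atom, the default
    ; rlist                 = ...     ; remove the manually set value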

Cheers,
--
Szilárd

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-27 Thread Andreas Baer

Hi,

with the link below, additional log files for runs with 1 GPU should be 
accessible now.


Thank you for the comment about rlist; I did not know that this would
affect the performance negatively. I know about nstcalcenergy, but
I need it for several of my simulations.


Cheers,
Andreas

On 26.02.20 16:50, Szilárd Páll wrote:

> Hi,
>
> Can you please check the performance when running on a single GPU with
> 2019 vs 2020 using your inputs?
>
> Also note that you are using some peculiar settings that will have an
> adverse effect on performance (such as a manually set rlist, which disallows
> the dual pair-list setup, and nstcalcenergy=1).
>
> Cheers,
> --
> Szilárd



Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Szilárd Páll
Hi,

Can you please check the performance when running on a single GPU with
2019 vs 2020 using your inputs?

Also note that you are using some peculiar settings that will have an
adverse effect on performance (such as a manually set rlist, which disallows
the dual pair-list setup, and nstcalcenergy=1).

Cheers,

--
Szilárd


On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer  wrote:

> Hello,
>
> here is a link to the logfiles.
>
> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
>
> If necessary, I can also provide some more log or tpr/gro/... files.
>
> Cheers,
> Andreas

Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Andreas Baer

Hello,

here is a link to the logfiles.
https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020

If necessary, I can also provide some more log or tpr/gro/... files.

Cheers,
Andreas


On 26.02.20 16:09, Paul bauer wrote:

> Hello,
>
> you can't add attachments to the list; please upload the files
> somewhere to share them.
> This might be quite important to us, because we did not expect this
> performance regression.
>
> Cheers
>
> Paul


Re: [gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Paul bauer

Hello,

you can't add attachments to the list; please upload the files somewhere
to share them.
This might be quite important to us, because we did not expect this
performance regression.


Cheers

Paul



--
Paul Bauer, PhD
GROMACS Development Manager
KTH Stockholm, SciLifeLab
0046737308594


[gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

2020-02-26 Thread Andreas Baer

Hello,

from a set of benchmark tests with large systems using Gromacs versions
2019.5 and 2020, I obtained some unexpected results:
With the same set of parameters, the 2020 version reaches only about 2/3
of the performance of the 2019.5 version. Interestingly, according to
nvidia-smi, the GPU usage is about 20% higher for the 2020 version.
The log files also suggest that the 2020 version does the computations
more efficiently but spends so much more time waiting that the overall
performance drops.


Some background info on the benchmarks:
- System contains about 2.1 million atoms.
- Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores + 
SMT; 4x NVIDIA Tesla V100
  (similar results, with a less significant performance drop (~15%), on a
different machine: 2 or 4 nodes, each with [2x Intel Xeon 2660v2 („Ivy
Bridge“) @ 2.2 GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
- Several options for -ntmpi, -ntomp, -bonded, -pme were scanned to find the
optimal set (an example command is sketched below). However, the performance
drop seems to be persistent for all such combinations.
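For reference, one illustrative point of such a scan (a sketch only; the file
name, thread counts, and offload flags are placeholders, not the exact
commands used):

    # one point of the parameter scan; all values are illustrative placeholders
    gmx mdrun -deffnm bench -ntmpi 4 -ntomp 4 -nb gpu -bonded gpu -pme gpu -npme 1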


Two representative log files are attached.
Does anyone have an idea where this drop comes from, and how to choose
the parameters for the 2020 version to circumvent it?


Regards,
Andreas