Re: [gmx-users] Gromacs 2019 - Ryzen Architecture

2020-01-09 Thread Szilárd Páll
Good catch, Kevin -- that is likely an issue, or at least part of it.

Note that you can also use the mdrun -multidir functionality to avoid
having to manually manage mdrun process placement and pinning.
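
For example, a minimal -multidir sketch -- assuming an MPI-enabled build
(a gmx_mpi binary) and four prepared directories sim0..sim3 (hypothetical
names), each containing its own tpr file:

mpirun -np 4 gmx_mpi mdrun -multidir sim0 sim1 sim2 sim3 -ntomp 8 -pin on

mdrun should then pin each 8-thread simulation to its own block of cores
and spread the simulations over the detected GPUs.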

Another aspect: if you leave half of the CPU cores unused, the cores in use
can boost to a higher clock rate and therefore finish the CPU work more
quickly. Since part of this work does not overlap with the GPU, that changes
the fraction of time the GPU sits idle (and hence also the fraction of time
it is busy). For a fair comparison, run something on the otherwise idle
cores (at least a "stress -c 8", or better a CPU-only mdrun); this is
generally how we evaluate performance as a function of CPU cores per GPU.
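
For instance, a sketch of such a benchmark run (file names hypothetical;
stress and taskset are the standard Linux utilities):

# keep the 16 logical cores the benchmarked run will not use busy
taskset -c 16-31 stress -c 16 &
gmx mdrun -deffnm rna0 -nt 16 -pin on -pinoffset 0 -gpu_id 0 -nb gpu -pme gpu
kill %1   # stop the background load afterwards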

Cheers,
--
Szilárd



Re: [gmx-users] Gromacs 2019 - Ryzen Architecture

2020-01-04 Thread Kevin Boyd
Hi,

A few things besides any Ryzen-specific issues. First, the pinoffset for
your second run should be 16, not 17. As set up, your first run occupies
logical cores 0-15; with -pinoffset 17, GROMACS detects that the second
run's pinning parameters are invalid (16 threads starting at core 17 would
need cores 17-32, and core 32 does not exist on a 32-thread CPU) and turns
pinning off. You can verify that in the log file.
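
A quick way to check is to search the log for the pinning/affinity notes
(the log name here assumes your -deffnm rna0):

grep -iE "pinning|affinity" rna0.log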

Second, 16 threads per simulation is overkill, and you can reclaim some of
the GPU's idle time by running two simulations per GPU. So something like

gmx mdrun -nt 8 -pin on -pinoffset 0 -gpu_id 0 &    # logical cores 0-7
gmx mdrun -nt 8 -pin on -pinoffset 8 -gpu_id 0 &    # logical cores 8-15
gmx mdrun -nt 8 -pin on -pinoffset 16 -gpu_id 1 &   # logical cores 16-23
gmx mdrun -nt 8 -pin on -pinoffset 24 -gpu_id 1     # logical cores 24-31

might give you close to optimal performance. (Each run still needs its own
input, e.g. a separate -deffnm or working directory, plus the -nb gpu
-pme gpu flags you were already using.)



Re: [gmx-users] Gromacs 2019 - Ryzen Architecture

2020-01-02 Thread Paul bauer

Hello,

we only added full detection and support for the newer Ryzen chips with
GROMACS 2019.5, so please check whether updating to that version solves
your issue.
If not, please open an issue on redmine.gromacs.org so we can track the
problem and try to solve it.


Cheers

Paul


--
Paul Bauer, PhD
GROMACS Development Manager
KTH Stockholm, SciLifeLab
0046737308594



[gmx-users] Gromacs 2019 - Ryzen Architecture

2020-01-02 Thread Sandro Wrzalek

Hi,

happy new year!

Now to my problem:

I use GROMACS 2019.3 to run some simulations (roughly 30k atoms per
system) on my PC, which has the following configuration:


CPU: Ryzen 3950X (overclocked to 4.1 GHz)

GPU #1: Nvidia RTX 2080 Ti

GPU #2: Nvidia RTX 2080 Ti

RAM: 64 GB

PSU: 1600 Watts


Each run uses one GPU and 16 of the 32 logical cores. With only one run at
a time (gmx mdrun -deffnm rna0 -gpu_id 0 -nb gpu -pme gpu), GPU
utilization is around 84%, but if I add a second run, the utilization of
both GPUs drops to roughly 20%, while logical cores 17-32 stay idle (I
changed the gpu_id parameter accordingly).


Adding additional parameters for each run:

gmx mdrun -deffnm rna0 -nt 16 -pin on -pinoffset 0 -gpu_id 0 -nb gpu 
-pme gpu


gmx mdrun -deffnm rna0 -nt 16 -pin on -pinoffset 17 -gpu_id 1 -nb gpu 
-pme gpu


I get a utilization of 78% per GPU, which is nice but not close to the 84%
I got with only one run. In theory, however, it should at least come close
to that.


I suspect the Ryzen chiplet design is the culprit, since GROMACS seems to
prefer the first chiplet, even when two simultaneous simulations have the
resources to occupy both. The 78% utilization could be due to overhead
between the two chiplets via the Infinity Fabric. However, I have no
proof, nor can I explain why gmx mdrun -deffnm rna0 -nt 16 -gpu_id 0 & 1
-nb gpu -pme gpu works as well - it seems to occupy the free logical cores
then.


Long story short:

Are there any workarounds to squeeze the last bit of performance out of my
setup? Is it possible to choose the logical cores manually (I have not
found anything in the docs so far)?



Thank you for your help!


Best,

Sandro
