Re: [gmx-users] simulation on 2 gpus

2019-07-26 Thread Kevin Boyd
Sure - you can do it 2 ways with normal Gromacs. Either run the simulations
in separate terminals, or use ampersands to run them in the background of 1
terminal.

I'll give a concrete example for your threadripper, using 32 of your cores,
so that you could run some other computation on the other 32. I typically
make a bash variable with all the common arguments.

Given tprs run1.tpr ...run4.tpr

gmx_common="gmx mdrun -ntomp 8 -ntmpi 1 -pme gpu -nb gpu -pin on -pinstride 1"
$gmx_common -deffnm run1 -pinoffset 32 -gputasks 00 &
$gmx_common -deffnm run2 -pinoffset 40 -gputasks 00 &
$gmx_common -deffnm run3 -pinoffset 48 -gputasks 11 &
$gmx_common -deffnm run4 -pinoffset 56 -gputasks 11

So run1 will run on cores 32-39 on GPU 0, run2 on cores 40-47 on the same
GPU, and the other two runs (run3 on cores 48-55, run4 on cores 56-63) will
use GPU 1. Note the ampersands on the first three runs, so they'll run in the
background.

I should also have mentioned one peculiarity with running with -ntmpi 1 and
-pme gpu, in that even though there's now only one rank (with nonbonded and
PME both running on it), you still need 2 gpu tasks for that one rank, one
for each type of interaction.
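
As an illustration (my own sketch, not part of the original example): with a
single rank you can also split those two tasks across devices - the nonbonded
entry is listed first and the PME entry second, and the log file reports the
actual assignment.

gmx mdrun -deffnm run1 -ntmpi 1 -ntomp 8 -nb gpu -pme gpu -gputasks 01
# one rank, two GPU tasks: nonbonded on GPU 0, PME on GPU 1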

As for multidir, I forget what troubles I ran into exactly, but I was
unable to run some subset of simulations. Anyhow, if you aren't running on a
cluster, I see no reason to compile with MPI, which means launching through
mpirun or srun and using gmx_mpi rather than gmx. The built-in thread-MPI
gives you up to 64 threads, and can have a minor (<5% in my experience)
performance benefit over MPI.

Kevin

On Fri, Jul 26, 2019 at 8:21 AM Gregory Man Kai Poon  wrote:

> Hi Kevin,
> Thanks for your very useful post.  Could you give a few command line
> examples on how to start multiple runs at different times (e.g., allocate a
> subset of CPU/GPU to one run, and start another run later using another
> subset of yet-unallocated CPU/GPU)?  Also, could you elaborate on the
> drawbacks of the MPI compilation that you hinted at?
> Gregory

Re: [gmx-users] simulation on 2 gpus

2019-07-26 Thread Mark Abraham
Hi,

It's rather like the example at
http://manual.gromacs.org/current/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
where instead of

gmx mdrun -nt 6 -pin on -pinoffset 0 -pinstride 1
gmx mdrun -nt 6 -pin on -pinoffset 6 -pinstride 1

to run on a machine with 12 hardware threads, you want to adapt the number
of threads and also specify disjoint GPU sets, e.g.

gmx mdrun -nt 32 -pin on -pinoffset 0 -pinstride 1 -gpu_id 0
gmx mdrun -nt 32 -pin on -pinoffset 32 -pinstride 1 -gpu_id 1

That lets mdrun choose the mix of thread-MPI ranks vs OpenMP threads on
those ranks, but you could replace -nt 32 with -ntmpi N -ntomp M so long as
the product of N and M is 32.
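
For example (one such split - a sketch, not a tuned recommendation for this
particular system):

gmx mdrun -ntmpi 4 -ntomp 8 -pin on -pinoffset 0 -pinstride 1 -gpu_id 0
gmx mdrun -ntmpi 4 -ntomp 8 -pin on -pinoffset 32 -pinstride 1 -gpu_id 1

Each run still uses 32 threads (4 x 8), pinned to disjoint cores and using
separate GPUs.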

Mark

Re: [gmx-users] simulation on 2 gpus

2019-07-26 Thread Gregory Man Kai Poon
Hi Kevin,
Thanks for your very useful post.  Could you give a few command line examples 
on how to start multiple runs at different times (e.g., allocate a subset of 
CPU/GPU to one run, and start another run later using another subset of 
yet-unallocated CPU/GPU)?  Also, could you elaborate on the drawbacks of the 
MPI compilation that you hinted at?
Gregory

From: Kevin Boyd
Sent: Thursday, July 25, 2019 10:31 PM
To: gmx-us...@gromacs.org
Subject: Re: [gmx-users] simulation on 2 gpus

Hi,

I've done a lot of research/experimentation on this, so I can maybe get you
started - if anyone has any questions about the essay to follow, feel free
to email me personally, and I'll link it to the email thread if it ends up
being pertinent.

First, there are some more internet resources to check out. See Mark's talk at
https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
Gromacs development moves fast, but a lot of it is still relevant.

I'll expand a bit here, with the caveat that Gromacs GPU development is
moving very fast and so the correct commands for optimal performance are
both system-dependent and a moving target between versions. This is a good
thing - GPUs have revolutionized the field, and with each iteration we make
better use of them. The downside is that it's unclear exactly what sort of
CPU-GPU balance you should look to purchase to take advantage of future
developments, though the trend is certainly that more and more computation
is being offloaded to the GPUs.

The most important consideration is that to get maximum total throughput
performance, you should be running not one but multiple simulations
simultaneously. You can do this through the -multidir option, but I don't
recommend that in this case, as it requires compiling with MPI and limits
some of your options. My run scripts usually use "gmx mdrun ... &" to
initiate subprocesses, with combinations of -ntomp, -ntmpi, -pin
-pinoffset, and -gputasks. I can give specific examples if you're
interested.

Another important point is that you can run more simulations than the
number of GPUs you have. Depending on CPU-GPU balance and quality, you
won't double your throughput by e.g. putting 4 simulations on 2 GPUs, but
you might increase it up to 1.5x. This would involve targeting the same GPU
with -gputasks.

Within a simulation, you should set up a benchmarking script to figure out
the best combination of thread-mpi ranks and open-mp threads - this can
have pretty drastic effects on performance. For example, if you want to use
your entire machine for one simulation (not recommended for maximal
efficiency), you have a lot of decomposition options (ignoring PME - which
is important, see below):

-ntmpi 2 -ntomp 32 -gputasks 01
-ntmpi 4 -ntomp 16 -gputasks 0011
-ntmpi 8 -ntomp 8  -gputasks 00001111
-ntmpi 16 -ntomp 4 -gputasks 0000000011111111
(and a few others - note that ntmpi * ntomp = total threads available)

In my experience, you need to scan the options in a benchmarking script for
each simulation size/content you want to simulate, and the difference
between the best and the worst can be up to a factor of 2-4 in terms of
performance. If you're splitting your machine among multiple simulations, I
suggest running 1 mpi thread (-ntmpi 1) per simulation, unless your
benchmarking suggests that the optimal performance lies elsewhere.
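
A minimal sketch of such a benchmarking scan (my own example, not from the
original post - the tpr name, the 64-thread budget, and the short
-nsteps/-resethway timing are assumptions to adapt):

#!/bin/bash
# Scan thread-MPI rank / OpenMP thread splits for one tpr and report ns/day.
# GPU task strings (e.g. -gputasks 0011 for -ntmpi 4) would be scanned alongside.
for ntmpi in 2 4 8 16; do
    ntomp=$((64 / ntmpi))
    gmx mdrun -s bench.tpr -deffnm bench_${ntmpi}x${ntomp} \
              -ntmpi ${ntmpi} -ntomp ${ntomp} \
              -nsteps 20000 -resethway -pin on
    # The log ends with a "Performance:" line giving ns/day.
    grep Performance bench_${ntmpi}x${ntomp}.log
done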

Things get more complicated when you start putting PME on the GPUs. For the
machines I work on, putting PME on GPUs absolutely improves performance,
but I'm not fully confident in that assessment without testing your
specific machine - you have a lot of cores with that threadripper, and this
is another area where I expect Gromacs 2020 might shift the GPU-CPU optimal
balance.

The issue with PME on GPUs is that we can (currently) only have one rank
doing GPU PME work. So, if we have a machine with say 20 cores and 2 gpus,
if I run the following

gmx mdrun  -ntomp 10 -ntmpi 2 -pme gpu -npme 1 -gputasks 01

, two ranks will be started - one with cores 0-9, will work on the
short-range interactions, offloading where it can to GPU 0, and the PME
rank (cores 10-19)  will offload to GPU 1. There is one significant problem
(and one minor problem) with this setup. First, it is massively inefficient
in terms of load balance. In a typical system (there are exceptions), PME
takes up ~1/3 of the computation that short-range interactions take. So, we
are offloading 1/4 of our interactions to one GPU and 3/4 to the other,
which leads to imbalance. In this specific case (2 GPUs and sufficient
cores), the most optimal solution is often (but not always) to run with
-ntmpi 4 (in this example, then -ntomp 
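
A hedged sketch of that four-rank layout for the 20-core, 2-GPU example (the
exact flags below are my reconstruction of where the truncated text appears to
be heading, not Kevin's words):

gmx mdrun -ntmpi 4 -ntomp 5 -nb gpu -pme gpu -npme 1 -gputasks 0011
# 3 short-range ranks on GPUs 0, 0, 1 plus 1 PME rank on GPU 1, with 5 OpenMP
# threads per rank; if PME is roughly 1/3 of the short-range work, each GPU
# ends up with a comparable load.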

Re: [gmx-users] Variation in free energy between GROMACS versions?

2019-07-26 Thread Mark Abraham
Hi,

On Wed, 24 Jul 2019 at 21:33, Kenneth Huang  wrote:

> Hi Mark,
>
>
> Thanks for the reply- I guessed a number of things had changed in the last
> decade, but wasn't sure if it was even salvageable as a comparison point
> anymore.
>
>
> Out of curiosity, could you elaborate on what sort of measurements I could
> make to scale the difference? Since I don’t have a run with the older
> version, wouldn’t anything I measure reflect the current version? I imagine
> it can't be as simple as adjusting the cutoffs in the electrostatics.


The cutoff determines the shift, which is the same quantity on *each*
interaction. So as I said, you could measure the shift applicable in your
case (from the value of the Coulomb interaction at the cutoff), and
estimate the number of interactions of each kind of charge-charge pair to
which it applies, to see if the difference is plausible. But I don't see
that as a useful way of assessing whether the modern code can reproduce the
intended observables, which don't relate to the absolute energies.
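
A back-of-the-envelope form of that estimate (my own sketch, assuming a plain
potential-shift modifier on the Coulomb interaction): each pair of charges
q_i, q_j inside the cutoff r_c has its energy shifted by the constant

\Delta V_{ij} = \frac{q_i q_j}{4 \pi \varepsilon_0 r_c},

so the expected offset in the absolute energy is roughly the sum of this
per-pair shift over all charge-charge pairs within the cutoff - i.e. the shift
for each kind of pair times an estimate of how many such pairs there are.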

Mark

[gmx-users] Simulated tempering using GROMACS software package (protein+membrane)

2019-07-26 Thread Pratiti Bhadra
Dear User,

I am trying simulated tempering with Gromacs 2018.1.

My mdp settings are:


nstexpanded = 100
simulated-tempering = yes
sim-temp-low = 300
sim-temp-high = 355
simulated-tempering-scaling = linear
init_lambda_state = 0
temperature_lambdas = 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00

But the temperature of the simulation is not shifting; it is always 300 K.
I am puzzled about what I am doing wrong and what parameters I have to set.

Regards,
Pratiti

-- 
Pratiti Bhadra
Post Doctoral Research Fellow
University of Macau, Macau, China


[gmx-users] Gromos force field with Tip5p water model

2019-07-26 Thread nevjernik
Is it possible to use tip5p water model with gromos force field?

 

Thanks



[gmx-users] topol.top file

2019-07-26 Thread Dhrubajyoti Maji
Dear gromacs users,
 I am writing the top file for a liquid acetamide system with the
OPLS FF. I am confused about the [ pairs ] section. Should I give the 1-4
atom numbers here, or leave it empty for OPLS to calculate them by itself?
Another question: in the [ atomtypes ] section, what should be written first,
sigma or epsilon?
Thanks and regards
Dhrubajyoti Maji
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.