Re: [gmx-users] simulation on 2 gpus

2019-07-25 Thread Kevin Boyd
Hi,

I've done a lot of research/experimentation on this, so I can maybe get you
started - if anyone has any questions about the essay to follow, feel free
to email me personally, and I'll link it to the email thread if it ends up
being pertinent.

First, there are some more internet resources to check out. See Mark's talk at
https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
Gromacs development moves fast, but a lot of it is still relevant.

I'll expand a bit here, with the caveat that Gromacs GPU development is
moving very fast and so the correct commands for optimal performance are
both system-dependent and a moving target between versions. This is a good
thing - GPUs have revolutionized the field, and with each iteration we make
better use of them. The downside is that it's unclear exactly what sort of
CPU-GPU balance you should look to purchase to take advantage of future
developments, though the trend is certainly that more and more computation
is being offloaded to the GPUs.

The most important consideration is that to get maximum total throughput
performance, you should be running not one but multiple simulations
simultaneously. You can do this through the -multidir option, but I don't
recommend that in this case, as it requires compiling with MPI and limits
some of your options. My run scripts usually use "gmx mdrun ... &" to
initiate subprocesses, with combinations of -ntomp, -ntmpi, -pin,
-pinoffset, and -gputasks. I can give specific examples if you're
interested.
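
For example, a bare-bones version of such a script might look like the
following (tpr names, thread counts, GPU IDs and pin offsets are placeholders
to be tuned for your machine by benchmarking):

gmx mdrun -deffnm sim1 -ntmpi 1 -ntomp 16 -nb gpu -pme gpu -gputasks 00 -pin on -pinoffset 0 &
gmx mdrun -deffnm sim2 -ntmpi 1 -ntomp 16 -nb gpu -pme gpu -gputasks 11 -pin on -pinoffset 16 &
wait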

Another important point is that you can run more simulations than the
number of GPUs you have. Depending on CPU-GPU balance and quality, you
won't double your throughput by e.g. putting 4 simulations on 2 GPUs, but
you might increase it up to 1.5x. This would involve targeting the same GPU
with -gputasks.
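
As a sketch (again, all names and numbers are placeholders), four runs
sharing two GPUs could look like:

gmx mdrun -deffnm sim1 -ntmpi 1 -ntomp 8 -nb gpu -pme gpu -gputasks 00 -pin on -pinoffset 0 &
gmx mdrun -deffnm sim2 -ntmpi 1 -ntomp 8 -nb gpu -pme gpu -gputasks 00 -pin on -pinoffset 8 &
gmx mdrun -deffnm sim3 -ntmpi 1 -ntomp 8 -nb gpu -pme gpu -gputasks 11 -pin on -pinoffset 16 &
gmx mdrun -deffnm sim4 -ntmpi 1 -ntomp 8 -nb gpu -pme gpu -gputasks 11 -pin on -pinoffset 24 &
wait

Here sim1/sim2 both target GPU 0 and sim3/sim4 both target GPU 1.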

Within a simulation, you should set up a benchmarking script to figure out
the best combination of thread-MPI ranks and OpenMP threads - this can
have pretty drastic effects on performance. For example, if you want to use
your entire machine for one simulation (not recommended for maximal
efficiency), you have a lot of decomposition options (ignoring PME - which
is important, see below):

-ntmpi 2 -ntomp 32 -gputasks 01
-ntmpi 4 -ntomp 16 -gputasks 0011
-ntmpi 8 -ntomp 8  -gputasks 00001111
-ntmpi 16 -ntomp 4 -gputasks 0000000011111111
(and a few others - note that ntmpi * ntomp = total threads available)

In my experience, you need to scan the options in a benchmarking script for
each simulation size/content you want to simulate, and the difference
between the best and the worst can be up to a factor of 2-4 in terms of
performance. If you're splitting your machine among multiple simulations, I
suggest running 1 mpi thread (-ntmpi 1) per simulation, unless your
benchmarking suggests that the optimal performance lies elsewhere.
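
A minimal sketch of such a benchmarking scan (assuming, for illustration, a
64-thread machine, a short bench.tpr input, and default GPU mapping):

#!/bin/bash
# scan thread-MPI rank / OpenMP thread splits and compare ns/day in the logs
for ntmpi in 2 4 8 16; do
    ntomp=$((64 / ntmpi))
    gmx mdrun -s bench.tpr -deffnm bench_${ntmpi}x${ntomp} \
              -ntmpi $ntmpi -ntomp $ntomp -nsteps 20000 -resethway
done
grep Performance bench_*.log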

Things get more complicated when you start putting PME on the GPUs. For the
machines I work on, putting PME on GPUs absolutely improves performance,
but I'm not fully confident in that assessment without testing your
specific machine - you have a lot of cores with that threadripper, and this
is another area where I expect Gromacs 2020 might shift the GPU-CPU optimal
balance.

The issue with PME on GPUs is that we can (currently) only have one rank
doing GPU PME work. So, if we have a machine with, say, 20 cores and 2 GPUs,
and I run the following

gmx mdrun  -ntomp 10 -ntmpi 2 -pme gpu -npme 1 -gputasks 01

then two ranks will be started - one, with cores 0-9, will work on the
short-range interactions, offloading where it can to GPU 0, and the PME
rank (cores 10-19) will offload to GPU 1. There is one significant problem
(and one minor problem) with this setup. First, it is massively inefficient
in terms of load balance. In a typical system (there are exceptions), PME
takes up ~1/3 of the computation that short-range interactions take. So, we
are offloading 1/4 of our interactions to one GPU and 3/4 to the other,
which leads to imbalance. In this specific case (2 GPUs and sufficient
cores), the most optimal solution is often (but not always) to run with
-ntmpi 4 (in this example, then -ntomp 5), as the PME rank then gets 1/4 of
the GPU instructions, proportional to the computation needed.
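
Concretely, for that 20-core/2-GPU example such a run might look like (a
sketch, with other options omitted):

gmx mdrun -ntmpi 4 -ntomp 5 -nb gpu -pme gpu -npme 1 -gputasks 0011

i.e. two PP ranks offload to GPU 0 while the third PP rank and the PME rank
share GPU 1, so each GPU ends up with roughly half of the total work.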

The second (less critical - don't worry about this unless you're
CPU-limited) problem is that PME-GPU MPI ranks only use 1 CPU core in their
calculations. So, with a node of 20 cores and 2 GPUs, if I run a simulation
with -ntmpi 4 -ntomp 5 -pme gpu -npme 1, each one of those ranks
will have 5 CPUs, but the PME rank will only use one of them. You can
specify the number of PME cores per rank with -ntomp_pme. This is useful in
restricted cases. For example, given the above architecture setup (20
cores, 2 GPUs), I could maximally exploit my CPUs with the following
commands:

gmx mdrun  -ntmpi 4 -ntomp 3 -ntomp_pme 1 -pme gpu -npme 1 -gputasks 0000
 -pin on -pinoffset 0 &
gmx mdrun  -ntmpi 4 

Re: [gmx-users] gmx insert-molecules question

2019-07-25 Thread Mala L Radhakrishnan
Thanks!  (and sorry about all the typos in my original email -- just
re-read it and saw it was barely intelligible!)

M

On Thu, Jul 25, 2019 at 4:25 PM Justin Lemkul  wrote:

>
>
> On 7/25/19 4:11 PM, Mala L Radhakrishnan wrote:
> > Hi,
> >
> > I am trying to crowd snapshots with multiple copies of a molecule.  When I
> > run gmx insert-molecules and I have a box of a certain size, does it make
> > sure that there is no overlap of molecules even considering PBC?  What I
> > mean by this is: if I do a trjconv on the resulting crowded snapshot, using
> > -pbc atom, will any of the atoms/molecules overlap?  Or maybe another way
> > of saying it is, if a molecule, when placed, falls partially outside the
> > box, does it ensure that another molecule wouldn't overlap with the
> > periodic image of that first molecule?
>
> Yes, the insert-molecules code uses PBC in its neighbor searching.
>
> -Justin
>
> --
> ==
>
> Justin A. Lemkul, Ph.D.
> Assistant Professor
> Office: 301 Fralin Hall
> Lab: 303 Engel Hall
>
> Virginia Tech Department of Biochemistry
> 340 West Campus Dr.
> Blacksburg, VA 24061
>
> jalem...@vt.edu | (540) 231-3129
> http://www.thelemkullab.com
>
> ==
>


-- 
Mala L. Radhakrishnan
Associate Professor of Chemistry
Director, Biochemistry Program
Wellesley College
106 Central Street
Wellesley, MA 02481
(781)283-2981


Re: [gmx-users] gmx insert-molecules question

2019-07-25 Thread Justin Lemkul




On 7/25/19 4:11 PM, Mala L Radhakrishnan wrote:

Hi,

I am trying to crowd snapshots with multiple copies of a molecule.  When I
run gmx insert-molecules and I have a box of a certain size, does it make
sure that there is no overlap of molecules even considering PBC?  What I
mean by this is: if I do a trjconv on the resulting crowded snapshot, using
-pbc atom, will any of the atoms/molecules overlap?  Or maybe another way
of saying it is, if a molecule, when placed, falls partially outside the
box, does it ensure that another molecule wouldn't overlap with the periodic
image of that first molecule?


Yes, the insert-molecules code uses PBC in its neighbor searching.
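
For reference, a typical invocation looks something like this (file names
are hypothetical):

gmx insert-molecules -f box.gro -ci molecule.gro -nmol 50 -try 500 -radius 0.21 -o crowded.gro

where -try sets the number of insertion attempts and -radius (roughly) how
close an inserted atom may come to existing atoms in that PBC-aware overlap
check.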

-Justin

--
==

Justin A. Lemkul, Ph.D.
Assistant Professor
Office: 301 Fralin Hall
Lab: 303 Engel Hall

Virginia Tech Department of Biochemistry
340 West Campus Dr.
Blacksburg, VA 24061

jalem...@vt.edu | (540) 231-3129
http://www.thelemkullab.com

==



Re: [gmx-users] Angular distribution

2019-07-25 Thread Justin Lemkul




On 7/25/19 8:05 AM, Omkar Singh wrote:

Hi everyone,
I have a simulated protein-water system and I want to calculate an angular
distribution function. How can I do this? Can anyone help me with it?


What angles do you want to calculate?
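
For simple geometric angles defined by atom triplets in an index file, gmx
angle gives a distribution directly (file names are hypothetical):

gmx angle -f traj.xtc -n angles.ndx -type angle -od angdist.xvg

For angles between vectors or planes defined by selections (e.g. water
orientation relative to the protein), gmx gangle is the more flexible tool.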

-Justin

--
==

Justin A. Lemkul, Ph.D.
Assistant Professor
Office: 301 Fralin Hall
Lab: 303 Engel Hall

Virginia Tech Department of Biochemistry
340 West Campus Dr.
Blacksburg, VA 24061

jalem...@vt.edu | (540) 231-3129
http://www.thelemkullab.com

==



[gmx-users] gmx insert-molecules question

2019-07-25 Thread Mala L Radhakrishnan
Hi,

I am trying to crowd snapshots with multiple copies of a molecule.  When I
run gmx insert-molecules and I have a box of a certain size, does it make
sure that there is no overlap of molecules even considering PBC?  What I
mean by this is: if I do a trjconv on the resulting crowded snapshot, using
-pbc atom, will any of the atoms/molecules overlap?  Or maybe another way
of saying it is, if a molecule, when placed, falls partially outside the
box, does it ensure that another molecule wouldn't overlap with the periodic
image of that first molecule?

Hope this question makes sense -- thanks!

Mala

-- 
Mala L. Radhakrishnan
Associate Professor of Chemistry
Director, Biochemistry Program
Wellesley College
106 Central Street
Wellesley, MA 02481
(781)283-2981


[gmx-users] simulation on 2 gpus

2019-07-25 Thread Stefano Guglielmo
Dear all,
I am trying to run simulations with Gromacs 2019.2 on a workstation with an
AMD Threadripper CPU (32 cores, 64 threads), 128 GB RAM and two RTX 2080 Ti
GPUs with an NVLink bridge. I read the user's guide section regarding
performance and I am exploring some possible combinations of CPU/GPU work to
run as fast as possible. I was wondering if some of you have experience
running on more than one GPU with several cores and can give some hints as a
starting point.
Thanks
Stefano


-- 
Stefano GUGLIELMO PhD
Assistant Professor of Medicinal Chemistry
Department of Drug Science and Technology
Via P. Giuria 9
10125 Turin, ITALY
ph. +39 (0)11 6707178


[gmx-users] Angular distribution

2019-07-25 Thread Omkar Singh
Hi everyone,
I have a simulated protein-water system and I want to calculate an angular
distribution function. How can I do this? Can anyone help me with it?

Thanks


Re: [gmx-users] Sun Solaris

2019-07-25 Thread Szilárd Páll
On Thu, Jul 25, 2019 at 11:31 AM amitabh jayaswal
 wrote:
>
> Dear All,
> *Namaskar!*
> Can GROMACS be installed and run on a Sun Solaris system?

Hi,

As long as you have a modern C++ compiler and toolchain, you should be
able to do so.
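
Roughly, the standard build procedure from the installation guide should
apply there as well (version number and install prefix below are just
placeholders):

tar xfz gromacs-2019.2.tar.gz
cd gromacs-2019.2
mkdir build && cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2019.2
make -j 4
make check
make install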

> We have a robust IBM desktop which we intend to dedicate to GROMACS;
> however, we are facing difficulties installing it.

Without specifics of your difficulties, I do not think we can help out.

--
Szilárd

> The machine specifications are:
> PRODUCT NAME: IBM System x3400
> MACHINE TYPE:   7973
> SERIAL NO.:  99A8370
> PRODUCT ID:7973PAA
>
> Is there a way to progress ahead?
> Kindly provide a solution asap.
> Best
>
> *Dr. Amitabh Jayaswal*
> *PhD Bioinformatics*
> *IIT(BHU), Varanasi, India*
> *M: +91 9868 33 00 88 *(also on WhatsApp), and
> * +91 7376 019 155*

[gmx-users] Sun Solaris

2019-07-25 Thread amitabh jayaswal
Dear All,
*Namaskar!*
Can GROMACS be installed and run on a Sun Solaris system?
We have a robust IBM desktop which we intend to dedicate to GROMACS;
however, we are facing difficulties installing it.
The machine specifications are:
PRODUCT NAME: IBM System x3400
MACHINE TYPE:   7973
SERIAL NO.:  99A8370
PRODUCT ID:7973PAA

Is there a way to progress ahead?
Kindly provide a solution asap.
Best

*Dr. Amitabh Jayaswal*
*PhD Bioinformatics*
*IIT(BHU), Varanasi, India*
*M: +91 9868 33 00 88 *(also on WhatsApp), and
* +91 7376 019 155*


Re: [gmx-users] remd error

2019-07-25 Thread Szilárd Páll
This is an MPI / job scheduler error: you are requesting 2 nodes with
20 processes per node (=40 total), but starting 80 ranks.
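
For the allocation in your script (2 nodes x 20 cores), a consistent launch
would use at most 40 ranks, e.g. something like (a sketch; it assumes the
eight remd*.tpr files are named and located the way -multi expects):

mpirun -np 40 gmx_mpi mdrun -v -multi 8 -replex 1000 -reseed 175320 -s remd.tpr -deffnm remd_equil

which gives 5 ranks per replica instead of the 10 per replica you asked for
with -np 80.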
--
Szilárd

On Thu, Jul 18, 2019 at 8:33 AM Bratin Kumar Das
<177cy500.bra...@nitk.edu.in> wrote:
>
> Hi,
> I am running an REMD simulation in gromacs-2016.5. After generating the
> multiple .tpr files, one in each directory, with the following command
> *for i in {0..7}; do cd equil$i; gmx grompp -f equil${i}.mdp -c em.gro -p
> topol.top -o remd$i.tpr -maxwarn 1; cd ..; done*
> I run *mpirun -np 80 gmx_mpi mdrun -s remd.tpr -multi 8 -replex 1000
> -reseed 175320 -deffnm remd_equil*
> It is giving the following error
> There are not enough slots available in the system to satisfy the 40 slots
> that were requested by the application:
>   gmx_mpi
>
> Either request fewer slots for your application, or make more slots
> available
> for use.
> --
> --
> There are not enough slots available in the system to satisfy the 40 slots
> that were requested by the application:
>   gmx_mpi
>
> Either request fewer slots for your application, or make more slots
> available
> for use.
> --
> I don't understand the error. Any suggestions will be highly
> appreciated. The mdp file and the qsub.sh file are attached below
>
> qsub.sh...
> #! /bin/bash
> #PBS -V
> #PBS -l nodes=2:ppn=20
> #PBS -l walltime=48:00:00
> #PBS -N mdrun-serial
> #PBS -j oe
> #PBS -o output.log
> #PBS -e error.log
> #cd /home/bratin/Downloads/GROMACS/Gromacs_fibril
> cd $PBS_O_WORKDIR
> module load openmpi3.0.0
> module load gromacs-2016.5
> NP=$(cat $PBS_NODEFILE | wc -l)
> # mpirun --machinefile $PBS_PBS_NODEFILE -np $NP 'which gmx_mpi' mdrun -v
> -s nvt.tpr -deffnm nvt
> #/apps/gromacs-2016.5/bin/mpirun -np 8 gmx_mpi mdrun -v -s remd.tpr -multi
> 8 -replex 1000 -deffnm remd_out
> for i in {0..7}; do cd equil$i; gmx grompp -f equil${i}.mdp -c em.gro -r
> em.gro -p topol.top -o remd$i.tpr -maxwarn 1; cd ..; done
>
> for i in {0..7}; do cd equil${i}; mpirun -np 40 gmx_mpi mdrun -v -s
> remd.tpr -multi 8 -replex 1000 -deffnm remd$i_out ; cd ..; done

Re: [gmx-users] performance issues running gromacs with more than 1 gpu card in slurm

2019-07-25 Thread Szilárd Páll
Hi,

It is not clear to me how are you trying to set up your runs, so
please provide some details:
- are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
 wrote:
>
> No one can give me an idea of what can be happening? Or how I can solve it?
> Best regards,
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
> wrote:
>
> Dear gmx-users,
> I’m currently working in a server where each node posses 40 physical cores
> (40 threads) and 4 Nvidia-V100.
> When I launch a single job (1 simulation using a single gpu card) I get a
> performance of about ~35ns/day in a system of about 300k atoms. Looking
> into the usage of the video card during the simulation, I noticed that the
> card is being used at about ~80%.
> The problems arise when I increase the number of jobs running at the same
> time. If for instance 2 jobs are running at the same time, the performance
> drops to ~25ns/day each and the usage of the video cards also drops during
> the simulation to about ~30-40% (sometimes dropping to less than 5%).
> Clearly there is a communication problem between the gpu cards and the cpu
> during the simulations, but I don’t know how to solve this.
> Here is the script I use to run the simulations:
>
> #!/bin/bash -x
> #SBATCH --job-name=testAtTPC1
> #SBATCH --ntasks-per-node=4
> #SBATCH --cpus-per-task=20
> #SBATCH --account=hdd22
> #SBATCH --nodes=1
> #SBATCH --mem=0
> #SBATCH --output=sout.%j
> #SBATCH --error=s4err.%j
> #SBATCH --time=00:10:00
> #SBATCH --partition=develgpus
> #SBATCH --gres=gpu:4
>
> module use /gpfs/software/juwels/otherstages
> module load Stages/2018b
> module load Intel/2019.0.117-GCC-7.3.0
> module load IntelMPI/2019.0.117
> module load GROMACS/2018.3
>
> WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
>
> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> EXE=" gmx mdrun "
>
> cd $WORKDIR1
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> -ntomp 20 &>log &
> cd $WORKDIR2
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> -ntomp 20 &>log &
> cd $WORKDIR3
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20  -nmpi 1 -pin on -pinoffset 20
> -ntomp 20 &>log &
> cd $WORKDIR4
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
> -ntomp 20 &>log &
>
>
> Regarding pinoffset, I first tried using 20 cores for each job but then
> also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
> pinoffset 8 for job 3 and pinoffset 12 for job 4), but in the end the
> problem persists.
>
> Currently in this machine I’m not able to use more than 1 gpu per job, so
> this is my only choice to use properly the whole node.
> If you need more information please just let me know.
> Best regards.
> Carlos
>
> ——
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl

[gmx-users] older server CPUs with recent GPUs for GROMACS

2019-07-25 Thread Szilárd Páll
Hi Mike,

Forking the discussion to have a consistent topic that is more discoverable.

On Thu, Jul 18, 2019 at 4:21 PM Michael Williams
 wrote:
>
> Hi Szilárd,
>
> Thanks for the interesting observations on recent hardware. I was wondering 
> if you could comment on the use of somewhat older server cpus and 
> motherboards (versus more cutting edge consumer parts). I recently noticed 
> that Haswell era Xeon cpus (E5 v3) are quite affordable now (~$400 for 12 
> core models with 40 pcie lanes) and so are the corresponding 2 cpu socket 
> server motherboards. Of course the RAM is slower than what can be used with 
> the latest Ryzen or i7/i9 cpus.


When it comes to GPU-accelerated runs, given that most of the
arithmetically intensive computation is offloaded, major features of
more modern processors / CPU instruction sets don't help much (like
AVX512). As most bio-MD (unless running huge systems) fits in the CPU
cache, RAM performance and more memory channels also have little to no
impact (with some exceptions, e.g. the 1st-gen AMD Zen arch, but that's
another topic). What dominates the CPU contribution to performance is
cache size (and speed/efficiency) and the number/speed of the CPU
cores. This is somewhat non-trivial to assess, as the clock
speed specs don't always reflect the stable clocks these CPUs run at,
but roughly you can use (#cores x frequency) as a metric to gauge
the performance of a CPU *in such a scenario*.
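
As a rough, nominal-spec illustration of that metric: a 12-core, 2.6 GHz
Xeon E5-2690 v3 comes to about 31 core-GHz per socket, while an 8-core
consumer part sustaining ~4.5 GHz on all cores comes to about 36 - so a
dual-socket board of such older Xeons can still offer more aggregate CPU
throughput than a single modern desktop CPU.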

More on this you can find in our recent paper where we do in fact
compare the performance of the best bang for buck modern servers
(spoiler alert: AMD EPYC was already and will especially be the
champion with the Rome arch) with upgraded older Xeon v2 nodes; see:
https://doi.org/10.1002/jcc.26011

>
> Are there any other bottlenecks with this somewhat older server hardware that 
> I might not be aware of?

There can be: PCI topology can be an issue; you want a symmetric, e.g.
two x16 buses connected directly to each socket (for dual-socket
systems) rather than e.g. many lanes connected to a PCI switch all
connected to the same socket. You can also have significant GPU-to-GPU
communication limitations on older-gen hardware (like v2/v3 Xeon), though
GROMACS does not make use of direct GPU-to-GPU transfers yet (partly due to
that very reason); with the near-future releases that may also become a
slight concern if you want to scale across many GPUs.


I hope that helps, let me know if you have any other questions!

Cheers,
--
Szilárd

> Thanks again for the interesting information and practical advice on this 
> topic.
>
> Mike
>
>
> > On Jul 18, 2019, at 2:21 AM, Szilárd Páll  wrote:
> >
> > PS: You will get more PCIe lanes without motherboard trickery -- and note
> > that consumer motherboards with PCIe switches can sometimes cause
> > instabilities when under heavy compute load -- if you buy the aging and
> > quite overpriced i9 X-series like the i9-7920 with 12 cores, or the
> > Threadripper 2950X with 16 cores and 60 PCIe lanes.
> >
> > Also note that more cores always win when the CPU performance matters,
> > and while 8 cores are generally sufficient, in some use-cases they may not
> > be (like runs with free energy).
> >
> > --
> > Szilárd
> >
> >
> > On Thu, Jul 18, 2019 at 10:08 AM Szilárd Páll 
> > wrote:
> >
> >> On Wed, Jul 17, 2019 at 7:00 PM Moir, Michael (MMoir) 
> >> wrote:
> >>
> >>> This is not quite true.  I certainly observed this degradation in
> >>> performance using the 9900K with two GPUs as Szilárd states using a
> >>> motherboard with one PCIe controller, but the limitation is from the
> >>> motherboard not from the CPU.
> >>
> >>
> >> Sorry, but that's not the case. PCIe controllers have been integrated into
> >> CPUs for many years; see
> >>
> >> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-introduction-basics-paper.pdf
> >>
> >> https://www.microway.com/hpc-tech-tips/common-pci-express-myths-gpu-computing/
> >>
> >> So no, the limitation is the CPU itself. Consumer CPUs these days have 24
> >> lanes total, some of which are used to connect the CPU to the chipset, and
> >> effectively you get 16-20 lanes (BTW here too the new AMD CPUs win as they
> >> provide 16 lanes for GPUs and similar devices and 4 lanes for NVMe, all on
> >> PCIe 4.0).
> >>
> >>
> >>>  It is possible to obtain a motherboard that contains two PCIe
> >>> controllers which overcomes this obstacle for not a whole lot more money.
> >>>
> >>
> >> It is possible to buy motherboards with PCIe switches. These don't
> >> increase the number of lanes; they just do what a switch does: as long as
> >> not all connected devices try to use the full capacity of the CPU (!) at
> >> the same time, you can get full speed on all connected devices.
> >> e.g.:
> >> https://techreport.com/r.x/2015_11_19_Gigabytes_Z170XGaming_G1_motherboard_reviewed/05-diagram_pcie_routing.gif
> >>
> >> Cheers,
> >> --
> >> Szilárd
> >>
> >> Mike
> >>>
> >>> -Original Message-
> >>> From: