Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-09 Thread Alex
Yup, your assessment agrees with our guess. Our HPC guru will be taking his
findings, along with your quote, to the admins.

Thank you,

Alex

On Thu, May 9, 2019 at 2:51 PM Szilárd Páll  wrote:

> On Thu, May 9, 2019 at 10:01 PM Alex  wrote:
>
> > Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
> > test procedure is simple, using slurm:
> > 1. Request an interactive session: > srun -N 1 -n 20 --pty
> > --partition=debug --time=1:00:00 --gres=gpu:1 bash
> > 2. Load CUDA library: module load cuda
> > 3. Run test batch. This starts with a CPU-only static EM, which, despite
> > the mdrun variables, runs on a single thread. Any help will be highly
> > appreciated.

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-09 Thread Szilárd Páll
On Thu, May 9, 2019 at 10:01 PM Alex  wrote:

> Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
> test procedure is simple, using slurm:
> 1. Request an interactive session: > srun -N 1 -n 20 --pty
> --partition=debug --time=1:00:00 --gres=gpu:1 bash
> 2. Load CUDA library: module load cuda
> 3. Run test batch. This starts with a CPU-only static EM, which, despite
> the mdrun variables, runs on a single thread. Any help will be highly
> appreciated.

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-09 Thread Alex
Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
test procedure is simple, using slurm:
1. Request an interactive session: > srun -N 1 -n 20 --pty
--partition=debug --time=1:00:00 --gres=gpu:1 bash
2. Load CUDA library: module load cuda
3. Run test batch. This starts with a CPU-only static EM, which, despite
the mdrun variables, runs on a single thread. Any help will be highly
appreciated.
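
For reference, one variant of the request that makes the intended CPU layout
explicit, in case the single-threaded behaviour is related to how the cores
were asked for (a sketch only; whether the partition accepts --cpus-per-task
and whether mdrun then sees all of those hardware threads depends on the
site's slurm/cgroup setup):

srun -N 1 -n 1 --cpus-per-task=16 --partition=debug --time=1:00:00 --gres=gpu:1 --pty bash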

 md.log below:

GROMACS:  gmx mdrun, version 2019.1
Executable:   /home/reida/ppc64le/stow/gromacs/bin/gmx
Data prefix:  /home/reida/ppc64le/stow/gromacs
Working dir:  /home/smolyan/gmx_test1
Process ID:   115831
Command line:
  gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s
em.tpr -o traj.trr -g md.log -c after_em.pdb

GROMACS version:2019.1
Precision:  single
Memory model:   64 bit
MPI library:thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:CUDA
SIMD instructions:  IBM_VSX
FFT library:fftw-3.3.8
RDTSCP usage:   disabled
TNG support:enabled
Hwloc support:  hwloc-1.11.8
Tracing support:disabled
C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
C compiler flags:   -mcpu=power9 -mtune=power9  -mvsx -O2 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
C++ compiler:   /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
C++ compiler flags: -mcpu=power9 -mtune=power9  -mvsx-std=c++11   -O2
-DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler:  /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on
Sat_Aug_25_21:10:00_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler
flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;;
-mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:10.10
CUDA runtime:   10.0


Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
Vendor: IBM
Brand:  POWER9, altivec supported
Family: 0   Model: 0   Stepping: 0
Features: vmx vsx
  Hardware topology: Only logical processor count
  GPU info:
Number of GPUs detected: 1
#0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat:
compatible


 PLEASE READ AND CITE THE FOLLOWING REFERENCE 

*SKIPPED*

Input Parameters:
   integrator = steep
   tinit  = 0
   dt = 0.001
   nsteps = 5
   init-step  = 0
   simulation-part= 1
   comm-mode  = Linear
   nstcomm= 100
   bd-fric= 0
   ld-seed= 1941752878
   emtol  = 100
   emstep = 0.01
   niter  = 20
   fcstep = 0
   nstcgsteep = 1000
   nbfgscorr  = 10
   rtpi   = 0.05
   nstxout= 0
   nstvout= 0
   nstfout= 0
   nstlog = 1000
   nstcalcenergy  = 100
   nstenergy  = 1000
   nstxout-compressed = 0
   compressed-x-precision = 1000
   cutoff-scheme  = Verlet
   nstlist= 1
   ns-type= Grid
   pbc= xyz
   periodic-molecules = true
   verlet-buffer-tolerance= 0.005
   rlist  = 1.2
   coulombtype= PME
   coulomb-modifier   = Potential-shift
   rcoulomb-switch= 0
   rcoulomb   = 1.2
   epsilon-r  = 1
   epsilon-rf = inf
   vdw-type   = Cut-off
   vdw-modifier   = Potential-shift
   rvdw-switch= 0
   rvdw   = 1.2
   DispCorr   = No
   table-extension= 1
   fourierspacing = 0.12
   fourier-nx = 52
   fourier-ny = 52
   fourier-nz = 52
   pme-order  = 4
   ewald-rtol = 1e-05
   ewald-rtol-lj  = 0.001
   lj-pme-comb-rule   = Geometric
   ewald-geometry = 0
   epsilon-surface= 0
   tcoupl = No
   nsttcouple   

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-02 Thread Szilárd Páll
Power9 (for HPC) is 4-way SMT, so make sure to try 1, 2, and 4 threads per
core (stride 4, 2, and 1, respectively). Especially if you are offloading
all force computation to the GPU, what remains on the CPU may not be able
to benefit from more than 1-2 threads per core.
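
As a sketch of what those three layouts might look like with GROMACS 2019
option names (assuming, for illustration, a full-node allocation where mdrun
sees a 40-core / 160-hardware-thread node; the -ntmpi/-ntomp values are
placeholders and need to match the actual allocation):

gmx mdrun -ntmpi 4 -ntomp 10 -pin on -pinstride 4 ...   # 1 thread per core
gmx mdrun -ntmpi 4 -ntomp 20 -pin on -pinstride 2 ...   # 2 threads per core
gmx mdrun -ntmpi 4 -ntomp 40 -pin on -pinstride 1 ...   # 4 threads per core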


--
Szilárd

On Thu, May 2, 2019, 01:19 Alex  wrote:

> Well, unless something important has changed within a year, I distinctly
> remember being advised here not to offload anything to GPU for EM. Not
> that we ever needed to, to be honest...
>
> In any case, we appear to be dealing with build issues here.
>
> Alex
>

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Well, unless something important has changed within a year, I distinctly 
remember being advised here not to offload anything to GPU for EM. Not 
that we ever needed to, to be honest...


In any case, we appear to be dealing with build issues here.

Alex

On 5/1/2019 5:09 PM, Kevin Boyd wrote:
> Hi,
>
>> Of course, I am not. This is the EM. ;)
>
> I haven't looked back at the code, but IIRC EM can use GPUs for the
> nonbondeds, just not the PME. I just double-checked on one of my systems
> with 10 cores and a GTX 1080 Ti, offloading to the GPU more than doubled
> the minimization speed.
>
> Kevin


Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Kevin Boyd
Hi,

> Of course, I am not. This is the EM. ;)

I haven't looked back at the code, but IIRC EM can use GPUs for the
nonbondeds, just not the PME. I just double-checked on one of my systems
with 10 cores and a GTX 1080 Ti, offloading to the GPU more than doubled
the minimization speed.
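
For the EM step that might look something like the line below (a sketch using
the file names from the earlier log; PME stays on the CPU since, as noted,
EM cannot offload it):

gmx mdrun -s em.tpr -c after_em.pdb -nb gpu -pme cpu -pin on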

Kevin

On Wed, May 1, 2019 at 6:33 PM Alex  wrote:

> Of course, I am not. This is the EM. ;)

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Of course, I am not. This is the EM. ;)

On Wed, May 1, 2019, 4:30 PM Kevin Boyd  wrote:

> Hi,
>
> In addition to what Mark said (and I've also found pinning to be critical
> for performance), you're also not using the GPUs with "-pme cpu -nb cpu".
>
> Kevin

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Kevin Boyd
Hi,

In addition to what Mark said (and I've also found pinning to be critical
for performance), you're also not using the GPUs with "-pme cpu -nb cpu".

Kevin

On Wed, May 1, 2019 at 5:56 PM Alex  wrote:

> Well, my experience so far has been with the EM, because the rest of the
> script (with all the dynamic things) needed that to finish. And it
> "finished" by hitting the wall. However, your comment does touch upon what
> to do with thread pinning and I will try to set '-pin on' throughout to see
> if things make a difference for the better. I am less confident about
> setting strides because it is unclear what the job manager provides in
> terms of the available core numbers. I will play around some more and
> report here.
>
> Thanks!
>
> Alex

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Well, my experience so far has been with the EM, because the rest of the
script (with all the dynamic things) needed that to finish. And it
"finished" by hitting the wall. However, your comment does touch upon what
to do with thread pinning and I will try to set '-pin on' throughout to see
if things make a difference for the better. I am less confident about
setting strides because it is unclear what the job manager provides in
terms of the available core numbers. I will play around some more and
report here.
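
One way to see what the job manager actually handed to the shell, before
choosing -pinoffset/-pinstride (assuming a Linux node under slurm):

nproc                                       # logical CPUs visible to this shell
grep Cpus_allowed_list /proc/self/status    # exact logical CPU IDs the job may use
echo $SLURM_JOB_CPUS_PER_NODE               # what slurm reports for the allocation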

Thanks!

Alex

On Wed, May 1, 2019 at 3:49 PM Mark Abraham 
wrote:

> Hi,
>
> As with x86, GROMACS uses SIMD intrinsics on POWER9 and is thus fairly
> insensitive to the compiler's vectorisation abilities. GCC is the only
> compiler we've tested, as xlc can't compile simple C++11. As everywhere,
> you should use the latest version of gcc, as IBM spent quite some years
> landing improvements for POWER9.
>
> EM is useless as a performance indicator of a dynamical simulation, avoid
> that - it runs serial code much much more often.
>
> Your run deliberately didn't fill the available cores, so just like on x86,
> mdrun will leave the thread affinity handling to the environment, which is
> often a path to bad performance. So, if you plan on doing that often,
> you'll want to check out the mdrun performance guide docs about the mdrun
> -pin and related options.
>
> Mark


Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Mark Abraham
Hi,

As with x86, GROMACS uses SIMD intrinsics on POWER9 and is thus fairly
insensitive to the compiler's vectorisation abilities. GCC is the only
compiler we've tested, as xlc can't compile simple C++11. As everywhere,
you should use the latest version of gcc, as IBM spent quite some years
landing improvements for POWER9.

EM is useless as a performance indicator of a dynamical simulation, avoid
that - it runs serial code much much more often.

Your run deliberately didn't fill the available cores, so just like on x86,
mdrun will leave the thread affinity handling to the environment, which is
often a path to bad performance. So, if you plan on doing that often,
you'll want to check out the mdrun performance guide docs about the mdrun
-pin and related options.
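
For example, when two runs have to share a node, explicit pinning might look
roughly like this (illustrative only; offsets are in logical-CPU units and
have to match what the scheduler actually assigned):

gmx mdrun -ntomp 8 -pin on -pinstride 1 -pinoffset 0 ...   # first run, logical CPUs 0-7
gmx mdrun -ntomp 8 -pin on -pinstride 1 -pinoffset 8 ...   # second run, logical CPUs 8-15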

Mark


On Wed., 1 May 2019, 23:21 Alex,  wrote:

> Hi all,
>
> Our institution decided to be all fancy, so now we have a bunch of Power9
> nodes, each with 80 cores + 4 Volta GPUs. Stuff is managed by slurm. Today
> I did a simple EM ('gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu') and the
> performance is abysmal, I would guess 100 times slower than on anything
> I've ever seen before.
>
> Our admin person emailed me the following:
> "-- it would not surprise me if the GCC compilers were relatively bad at
> taking advantage of POWER9 vectorization, they're likely optimized for
> x86_64 vector stuff like SSE and AVX operations.  This was an issue in the
> build, I selected "-DGMX_SIMD=IBM_VSX" for the config, but according to my
> notes, that was part of an attempt to fix the "unimplemented SIMD" error
> that was dogging me at first, and/but which was eventually cleared by
> switching to gcc-6."
>
> Does anyone have any comments/suggestions on building and running GMX on
> Power9?
>
> Thank you,
>
> Alex


[gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Hi all,

Our institution decided to be all fancy, so now we have a bunch of Power9
nodes, each with 80 cores + 4 Volta GPUs. Stuff is managed by slurm. Today
I did a simple EM ('gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu') and the
performance is abysmal, I would guess 100 times slower than on anything
I've ever seen before.

Our admin person emailed me the following:
"-- it would not surprise me if the GCC compilers were relatively bad at
taking advantage of POWER9 vectorization, they're likely optimized for
x86_64 vector stuff like SSE and AVX operations.  This was an issue in the
build, I selected "-DGMX_SIMD=IBM_VSX" for the config, but according to my
notes, that was part of an attempt to fix the "unimplemented SIMD" error
that was dogging me at first, and/but which was eventually cleared by
switching to gcc-6."
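
For what it's worth, a minimal configure sketch along those lines (GROMACS
2019-era CMake options; the compiler names and CUDA path below are
placeholders for whatever the site actually provides):

cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
      -DGMX_SIMD=IBM_VSX -DGMX_GPU=ON \
      -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0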

Does anyone have any comments/suggestions on building and running GMX on
Power9?

Thank you,

Alex