Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Yup, your assessment agrees with our guess. Our HPC guru will be taking his findings, along with your quote, to the admins.

Thank you,

Alex

On Thu, May 9, 2019 at 2:51 PM Szilárd Páll wrote:
> On Thu, May 9, 2019 at 10:01 PM Alex wrote:
> > Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
> > test procedure is simple, using slurm:
> > 1. Request an interactive session: srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> > 2. Load CUDA library: module load cuda
> > 3. Run test batch. This starts with a CPU-only static EM, which, despite
> > the mdrun variables, runs on a single thread. Any help will be highly
> > appreciated.
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
On Thu, May 9, 2019 at 10:01 PM Alex wrote:
> Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
> test procedure is simple, using slurm:
> 1. Request an interactive session: srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> 2. Load CUDA library: module load cuda
> 3. Run test batch. This starts with a CPU-only static EM, which, despite
> the mdrun variables, runs on a single thread. Any help will be highly
> appreciated.
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The test procedure is simple, using slurm:

1. Request an interactive session: srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
2. Load CUDA library: module load cuda
3. Run test batch. This starts with a CPU-only static EM, which, despite the mdrun variables, runs on a single thread. Any help will be highly appreciated.

md.log below:

GROMACS:      gmx mdrun, version 2019.1
Executable:   /home/reida/ppc64le/stow/gromacs/bin/gmx
Data prefix:  /home/reida/ppc64le/stow/gromacs
Working dir:  /home/smolyan/gmx_test1
Process ID:   115831
Command line:
  gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s em.tpr -o traj.trr -g md.log -c after_em.pdb

GROMACS version:    2019.1
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  IBM_VSX
FFT library:        fftw-3.3.8
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.8
Tracing support:    disabled
C compiler:         /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
C compiler flags:   -mcpu=power9 -mtune=power9 -mvsx -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
C++ compiler flags: -mcpu=power9 -mtune=power9 -mvsx -std=c++11 -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2018 NVIDIA Corporation; Built on Sat_Aug_25_21:10:00_CDT_2018; Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; -mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        10.10
CUDA runtime:       10.0

Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: IBM
    Brand:  POWER9, altivec supported
    Family: 0   Model: 0   Stepping: 0
    Features: vmx vsx
  Hardware topology: Only logical processor count
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible

PLEASE READ AND CITE THE FOLLOWING REFERENCE

*SKIPPED*

Input Parameters:
   integrator               = steep
   tinit                    = 0
   dt                       = 0.001
   nsteps                   = 5
   init-step                = 0
   simulation-part          = 1
   comm-mode                = Linear
   nstcomm                  = 100
   bd-fric                  = 0
   ld-seed                  = 1941752878
   emtol                    = 100
   emstep                   = 0.01
   niter                    = 20
   fcstep                   = 0
   nstcgsteep               = 1000
   nbfgscorr                = 10
   rtpi                     = 0.05
   nstxout                  = 0
   nstvout                  = 0
   nstfout                  = 0
   nstlog                   = 1000
   nstcalcenergy            = 100
   nstenergy                = 1000
   nstxout-compressed       = 0
   compressed-x-precision   = 1000
   cutoff-scheme            = Verlet
   nstlist                  = 1
   ns-type                  = Grid
   pbc                      = xyz
   periodic-molecules       = true
   verlet-buffer-tolerance  = 0.005
   rlist                    = 1.2
   coulombtype              = PME
   coulomb-modifier         = Potential-shift
   rcoulomb-switch          = 0
   rcoulomb                 = 1.2
   epsilon-r                = 1
   epsilon-rf               = inf
   vdw-type                 = Cut-off
   vdw-modifier             = Potential-shift
   rvdw-switch              = 0
   rvdw                     = 1.2
   DispCorr                 = No
   table-extension          = 1
   fourierspacing           = 0.12
   fourier-nx               = 52
   fourier-ny               = 52
   fourier-nz               = 52
   pme-order                = 4
   ewald-rtol               = 1e-05
   ewald-rtol-lj            = 0.001
   lj-pme-comb-rule         = Geometric
   ewald-geometry           = 0
   epsilon-surface          = 0
   tcoupl                   = No
   nsttcouple
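For completeness, here is the whole test collected in one place as it would be typed (a sketch only; the partition, time limit, module name, and input files are specific to this cluster):

    # 1. Interactive allocation with one GPU on the debug partition
    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash

    # 2. Load the site's CUDA module
    module load cuda

    # 3. CPU-only steepest-descent EM: 4 thread-MPI ranks x 4 OpenMP threads, pinned
    gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
        -s em.tpr -o traj.trr -g md.log -c after_em.pdb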
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Power9 (for HPC) is 4-way SMT, so make sure to try 1, 2, and 4 threads per core (stride 4, 2, and 1, respectively). Especially if you are offloading all force computation to the GPU, what remains on the CPU may not be able to benefit from more than 1-2 threads per core.

--
Szilárd

On Thu, May 2, 2019, 01:19 Alex wrote:
> Well, unless something important has changed within a year, I distinctly
> remember being advised here not to offload anything to GPU for EM. Not
> that we ever needed to, to be honest...
>
> In any case, we appear to be dealing with build issues here.
>
> Alex
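Concretely, keeping the 4 x 4 rank/thread layout from the earlier test, the three SMT variants suggested above could be tried along these lines (a sketch; md.tpr is a placeholder for the actual run input, and the rank/thread counts have to be matched to the cores slurm actually hands over):

    # 1 thread per physical core: skip the other 3 SMT hardware threads
    gmx mdrun -ntmpi 4 -ntomp 4 -pin on -pinstride 4 -s md.tpr

    # 2 threads per physical core
    gmx mdrun -ntmpi 4 -ntomp 4 -pin on -pinstride 2 -s md.tpr

    # 4 threads per physical core: use every hardware thread
    gmx mdrun -ntmpi 4 -ntomp 4 -pin on -pinstride 1 -s md.tpr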
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Well, unless something important has changed within a year, I distinctly remember being advised here not to offload anything to GPU for EM. Not that we ever needed to, to be honest...

In any case, we appear to be dealing with build issues here.

Alex

On 5/1/2019 5:09 PM, Kevin Boyd wrote:
> Hi,
>
> > Of course, i am not. This is the EM. ;)
>
> I haven't looked back at the code, but IIRC EM can use GPUs for the
> nonbondeds, just not the PME. I just double-checked on one of my systems
> with 10 cores and a GTX 1080 Ti, offloading to the GPU more than doubled
> the minimization speed.
>
> Kevin
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Hi,

> Of course, i am not. This is the EM. ;)

I haven't looked back at the code, but IIRC EM can use GPUs for the nonbondeds, just not the PME. I just double-checked on one of my systems with 10 cores and a GTX 1080 Ti, offloading to the GPU more than doubled the minimization speed.

Kevin

On Wed, May 1, 2019 at 6:33 PM Alex wrote:
> Of course, i am not. This is the EM. ;)
>
> On Wed, May 1, 2019, 4:30 PM Kevin Boyd wrote:
> > Hi,
> >
> > In addition to what Mark said (and I've also found pinning to be critical
> > for performance), you're also not using the GPUs with "-pme cpu -nb cpu".
> >
> > Kevin
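To illustrate the point about EM, the short-range nonbondeds can go to the GPU while PME stays on the CPU; for the test system in this thread that would look roughly like this (a sketch; not benchmarked on the Power9 node in question, with em.tpr and the thread count taken from the earlier messages):

    # EM with short-range nonbondeds on the GPU and PME kept on the CPU
    gmx mdrun -ntmpi 1 -ntomp 4 -pin on -nb gpu -pme cpu \
        -s em.tpr -o traj.trr -g md.log -c after_em.pdb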
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Of course, i am not. This is the EM. ;)

On Wed, May 1, 2019, 4:30 PM Kevin Boyd wrote:
> Hi,
>
> In addition to what Mark said (and I've also found pinning to be critical
> for performance), you're also not using the GPUs with "-pme cpu -nb cpu".
>
> Kevin
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Hi,

In addition to what Mark said (and I've also found pinning to be critical for performance), you're also not using the GPUs with "-pme cpu -nb cpu".

Kevin

On Wed, May 1, 2019 at 5:56 PM Alex wrote:
> Well, my experience so far has been with the EM, because the rest of the
> script (with all the dynamic things) needed that to finish. And it
> "finished" by hitting the wall. However, your comment does touch upon what
> to do with thread pinning and I will try to set '-pin on' throughout to see
> if things make a difference for the better. I am less confident about
> setting strides because it is unclear what the job manager provides in
> terms of the available core numbers. I will play around some more and
> report here.
>
> Thanks!
>
> Alex
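For the dynamics runs themselves, using the V100 means not forcing everything onto the CPU; with GROMACS 2019 and a single rank that could look roughly like this (a sketch; md.tpr and the thread count are placeholders, and PME offload to the GPU assumes a single-rank or single-PME-rank setup):

    # Production MD: short-range nonbondeds and PME offloaded to the GPU, threads pinned
    gmx mdrun -ntmpi 1 -ntomp 16 -pin on -nb gpu -pme gpu -s md.tpr -deffnm md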
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Well, my experience so far has been with the EM, because the rest of the script (with all the dynamic things) needed that to finish. And it "finished" by hitting the wall. However, your comment does touch upon what to do with thread pinning and I will try to set '-pin on' throughout to see if things make a difference for the better. I am less confident about setting strides because it is unclear what the job manager provides in terms of the available core numbers. I will play around some more and report here.

Thanks!

Alex

On Wed, May 1, 2019 at 3:49 PM Mark Abraham wrote:
> Hi,
>
> As with x86, GROMACS uses SIMD intrinsics on POWER9 and is thus fairly
> insensitive to the compiler's vectorisation abilities. GCC is the only
> compiler we've tested, as xlc can't compile simple C++11. As everywhere,
> you should use the latest version of gcc, as IBM spent quite some years
> landing improvements for POWER9.
>
> EM is useless as a performance indicator of a dynamical simulation, avoid
> that - it runs serial code much much more often.
>
> Your run deliberately didn't fill the available cores, so just like on x86,
> mdrun will leave the thread affinity handling to the environment, which is
> often a path to bad performance. So, if you plan on doing that often,
> you'll want to check out the mdrun performance guide docs about the mdrun
> -pin and related options.
>
> Mark
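One way to find out what the job manager actually provides is to inspect the allocation from inside the interactive session before launching mdrun (a sketch using standard Linux tools; whether slurm confines the job to specific logical CPUs depends on how the cluster enforces cgroups):

    # How many logical CPUs this shell is allowed to use
    nproc

    # Which logical CPU IDs the job is confined to
    grep Cpus_allowed_list /proc/self/status

    # SMT layout of the node: threads per core, cores per socket, sockets
    lscpu | grep -E 'Thread|Core|Socket'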
Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Hi,

As with x86, GROMACS uses SIMD intrinsics on POWER9 and is thus fairly insensitive to the compiler's vectorisation abilities. GCC is the only compiler we've tested, as xlc can't compile simple C++11. As everywhere, you should use the latest version of gcc, as IBM spent quite some years landing improvements for POWER9.

EM is useless as a performance indicator of a dynamical simulation, avoid that - it runs serial code much much more often.

Your run deliberately didn't fill the available cores, so just like on x86, mdrun will leave the thread affinity handling to the environment, which is often a path to bad performance. So, if you plan on doing that often, you'll want to check out the mdrun performance guide docs about the mdrun -pin and related options.

Mark

On Wed., 1 May 2019, 23:21 Alex wrote:
> Hi all,
>
> Our institution decided to be all fancy, so now we have a bunch of Power9
> nodes, each with 80 cores + 4 Volta GPUs. Stuff is managed by slurm. Today
> I did a simple EM ('gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu') and the
> performance is abysmal, I would guess 100 times slower than on anything
> I've ever seen before.
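On the build side, a GCC + CUDA configuration consistent with what has been discussed in this thread might look like the following (a sketch only: the CUDA path matches the md.log above, -DGMX_SIMD=IBM_VSX is the value the admin already used, the remaining options are standard GROMACS 2019 CMake flags, and the install prefix is a placeholder):

    # Configure and build GROMACS 2019.x on POWER9 with GCC and CUDA
    cmake .. \
      -DCMAKE_C_COMPILER=gcc \
      -DCMAKE_CXX_COMPILER=g++ \
      -DGMX_SIMD=IBM_VSX \
      -DGMX_GPU=ON \
      -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 \
      -DGMX_BUILD_OWN_FFTW=ON \
      -DCMAKE_INSTALL_PREFIX=$HOME/opt/gromacs-2019
    make -j 16 && make install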
[gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Hi all,

Our institution decided to be all fancy, so now we have a bunch of Power9 nodes, each with 80 cores + 4 Volta GPUs. Stuff is managed by slurm. Today I did a simple EM ('gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu') and the performance is abysmal, I would guess 100 times slower than on anything I've ever seen before.

Our admin person emailed me the following:
"-- it would not surprise me if the GCC compilers were relatively bad at taking advantage of POWER9 vectorization, they're likely optimized for x86_64 vector stuff like SSE and AVX operations. This was an issue in the build, I selected "-DGMX_SIMD=IBM_VSX" for the config, but according to my notes, that was part of an attempt to fix the "unimplemented SIMD" error that was dogging me at first, and/but which was eventually cleared by switching to gcc-6."

Does anyone have any comments/suggestions on building and running GMX on Power9?

Thank you,

Alex