Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-09 Thread Alex
Yup, your assessment agrees with our guess. Our HPC guru will be taking his findings, along with your quote, to the admins. Thank you, Alex On Thu, May 9, 2019 at 2:51 PM Szilárd Páll wrote: > On Thu, May 9, 2019 at 10:01 PM Alex wrote: > > > Okay, we're positively unable to run a Gromacs

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-09 Thread Szilárd Páll
On Thu, May 9, 2019 at 10:01 PM Alex wrote: > Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The > test procedure is simple, using slurm: > 1. Request an interactive session: > srun -N 1 -n 20 --pty > --partition=debug --time=1:00:00 --gres=gpu:1 bash > 2. Load CUDA

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-09 Thread Alex
Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The test procedure is simple, using slurm: 1. Request an interactive session: > srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash 2. Load CUDA library: module load cuda 3. Run test batch. This starts
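A minimal reproduction sketch of the three steps above, assuming the test batch simply launches the EM command quoted in the original post (the actual contents of the test script are not shown here):

  # 1. Request an interactive session with one GPU on the debug partition
  srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
  # 2. Load the CUDA library
  module load cuda
  # 3. Run the test (assumed here to be the EM command from the original post)
  gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu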

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-02 Thread Szilárd Páll
Power9 (for HPC) is 4-way SMT, so make sure to try 1, 2, and 4 threads per core (stride 4, 2, and 1, respectively). Especially if you are offloading all force computation to the GPU, what remains on the CPU may not be able to benefit from more than 1-2 threads per core. -- Szilárd On Thu, May 2,
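For example, a sketch only, assuming the 20-core interactive allocation requested elsewhere in this thread (the rank/thread split is illustrative, not a recommendation):

  # 1 thread per core: 20 threads total, pin stride 4
  gmx mdrun -ntmpi 4 -ntomp 5 -pin on -pinstride 4
  # 2 threads per core: 40 threads total, pin stride 2
  gmx mdrun -ntmpi 4 -ntomp 10 -pin on -pinstride 2
  # 4 threads per core: 80 threads total, pin stride 1
  gmx mdrun -ntmpi 4 -ntomp 20 -pin on -pinstride 1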

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Well, unless something important has changed within a year, I distinctly remember being advised here not to offload anything to GPU for EM. Not that we ever needed to, to be honest... In any case, we appear to be dealing with build issues here. Alex On 5/1/2019 5:09 PM, Kevin Boyd wrote:

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Kevin Boyd
Hi, > Of course, I am not. This is the EM. ;) I haven't looked back at the code, but IIRC EM can use GPUs for the nonbondeds, just not the PME. I just double-checked on one of my systems with 10 cores and a GTX 1080 Ti: offloading to the GPU more than doubled the minimization speed. Kevin On
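A minimal sketch of such an EM run, assuming a single visible GPU and default input file names:

  # EM with short-range nonbondeds offloaded to the GPU; PME stays on the CPU
  gmx mdrun -nb gpu -pme cpu -pin on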

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Of course, I am not. This is the EM. ;) On Wed, May 1, 2019, 4:30 PM Kevin Boyd wrote: > Hi, > > In addition to what Mark said (and I've also found pinning to be critical > for performance), you're also not using the GPUs with "-pme cpu -nb cpu". > > Kevin > > On Wed, May 1, 2019 at 5:56 PM

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Kevin Boyd
Hi, In addition to what Mark said (and I've also found pinning to be critical for performance), you're also not using the GPUs with "-pme cpu -nb cpu". Kevin On Wed, May 1, 2019 at 5:56 PM Alex wrote: > Well, my experience so far has been with the EM, because the rest of the > script (with
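For the dynamics stages of the script, a hedged sketch of an invocation that actually uses a GPU and pins threads (the offload choices are illustrative and should be tuned per system):

  # MD with nonbonded and PME work offloaded to the GPU, threads pinned
  gmx mdrun -nb gpu -pme gpu -pin on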

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Well, my experience so far has been with the EM, because the rest of the script (with all the dynamic things) needed that to finish. And it "finished" by hitting the walltime limit. However, your comment does touch upon what to do with thread pinning, and I will try to set '-pin on' throughout to see if

Re: [gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Mark Abraham
Hi, As with x86, GROMACS uses SIMD intrinsics on POWER9 and is thus fairly insensitive to the compiler's vectorisation abilities. GCC is the only compiler we've tested, as xlc can't compile simple C++11. As everywhere, you should use the latest version of gcc, as IBM spent quite some years
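A hedged build sketch along those lines; the module names and install prefix are site-specific assumptions, and GMX_SIMD=IBM_VSX is the POWER9 SIMD setting in the 2019 series:

  module load gcc cuda    # site-specific module names
  cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
           -DGMX_SIMD=IBM_VSX -DGMX_GPU=ON \
           -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2019.2
  make -j 16 && make install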

[gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

2019-05-01 Thread Alex
Hi all, Our institution decided to be all fancy, so now we have a bunch of Power9 nodes, each with 80 cores + 4 Volta GPUs. Stuff is managed by slurm. Today I did a simple EM ('gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu') and the performance is abysmal; I would guess 100 times slower than on