Yup, your assessment agrees with our guess. Our HPC guru will be taking his
findings, along with your quote, to the admins.
Thank you,
Alex
On Thu, May 9, 2019 at 10:01 PM Alex wrote:
Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
test procedure is simple, using slurm:
1. Request an interactive session: > srun -N 1 -n 20 --pty
--partition=debug --time=1:00:00 --gres=gpu:1 bash
2. Load CUDA library: module load cuda
3. Run test batch. This starts
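For concreteness, the whole session in one sketch; the step-3 command is
only a guess, since the actual test batch is cut off above:

  # 1. interactive allocation: one node, 20 tasks, one GPU, debug partition
  srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
  # 2. load the CUDA library
  module load cuda
  # 3. run the test, e.g. a short EM; these flags are illustrative only
  gmx mdrun -deffnm em -ntmpi 1 -ntomp 20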
On Thu, May 9, 2019 at 2:51 PM Szilárd Páll wrote:
Power9 (for HPC) is 4-way SMT, so make sure to try 1, 2, and 4 threads per
core (stride 4, 2, and 1, respectively). Especially if you are offloading
all force computation to the GPU, what remains on the CPU may not be able
to benefit from more than 1-2 threads per core.
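To make the mapping concrete, a sketch of those three layouts as mdrun
flags, assuming 4 thread-MPI ranks on a 20-core slice of the node (the
rank/thread counts are just an example; adjust to your allocation):

  # 1 thread per core: 20 threads total, pin stride 4
  gmx mdrun -ntmpi 4 -ntomp 5 -pin on -pinstride 4
  # 2 threads per core: 40 threads total, pin stride 2
  gmx mdrun -ntmpi 4 -ntomp 10 -pin on -pinstride 2
  # 4 threads per core: 80 threads total, pin stride 1
  gmx mdrun -ntmpi 4 -ntomp 20 -pin on -pinstride 1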
--
Szilárd
Well, unless something important has changed within a year, I distinctly
remember being advised here not to offload anything to GPU for EM. Not
that we ever needed to, to be honest...
In any case, we appear to be dealing with build issues here.
Alex
On 5/1/2019 5:09 PM, Kevin Boyd wrote:
Hi,
> Of course, I am not. This is the EM. ;)
I haven't looked back at the code, but IIRC EM can use GPUs for the
nonbondeds, just not the PME. I just double-checked on one of my systems
with 10 cores and a GTX 1080 Ti; offloading to the GPU more than doubled
the minimization speed.
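For reference, a minimal sketch of such a run (the 'em' deffnm name is a
placeholder):

  # EM with nonbondeds offloaded to the GPU; PME stays on the CPU
  gmx mdrun -deffnm em -nb gpu -pme cpu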
Kevin
Of course, I am not. This is the EM. ;)
On Wed, May 1, 2019, 4:30 PM Kevin Boyd wrote:
Hi,
In addition to what Mark said (and I've also found pinning to be critical
for performance), you're also not using the GPUs with "-pme cpu -nb cpu".
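To actually exercise a GPU, the flags would look more like the sketch
below (illustrative only; note that with more than one thread-MPI rank,
GPU PME in the 2019 series also needs a single dedicated PME rank):

  # offload nonbondeds and PME to a GPU; keep threads pinned
  gmx mdrun -ntmpi 4 -ntomp 5 -nb gpu -pme gpu -npme 1 -pin on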
Kevin
On Wed, May 1, 2019 at 5:56 PM Alex wrote:
Well, my experience so far has been with the EM, because the rest of the
script (with all the dynamic things) needed that to finish. And it
"finished" by hitting the wall. However, your comment does touch upon what
to do with thread pinning and I will try to set '-pin on' throughout to see
if
Hi,
As with x86, GROMACS uses SIMD intrinsics on POWER9 and is thus fairly
insensitive to the compiler's vectorisation abilities. GCC is the only
compiler we've tested, as xlc can't compile simple C++11. As everywhere,
you should use the latest version of gcc, as IBM spent quite some years
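For reference, a plausible configure sketch for a gcc + CUDA build on
POWER9 (paths and versions are placeholders; IBM_VSX is the SIMD flavor
GROMACS targets on POWER8/9):

  # configure GROMACS with gcc and CUDA on POWER9
  CC=gcc CXX=g++ cmake .. -DGMX_SIMD=IBM_VSX -DGMX_GPU=ON \
      -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda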
Hi all,
Our institution decided to be all fancy, so now we have a bunch of Power9
nodes, each with 80 cores + 4 Volta GPUs. Stuff is managed by slurm. Today
I did a simple EM ('gmx mdrun -ntomp 4 -ntmpi 4 -pme cpu -nb cpu') and the
performance is abysmal, I would guess 100 times slower than on