Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-25 Thread Szilárd Páll
On Thu, May 25, 2017 at 2:09 PM, Marcelo Depólo wrote: > Hi, I had the same struggle benchmarking a similar system last week. Just for curiosity, could you tell us the performance you get when sharing your GPU with multiple jobs? BTW, interpreting some

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-25 Thread Szilárd Páll
On Thu, May 25, 2017 at 2:09 PM, Marcelo Depólo wrote: > Hi, I had the same struggle benchmarking a similar system last week. Just for curiosity, could you tell us the performance you get when sharing your GPU with multiple jobs? > In my case (6k atoms +

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-25 Thread Daniel Kozuch
Hi Marcelo, That sounds reasonable depending on your time-step and other factors, but I have not attempted to run more than one job per GPU. Maybe Mark can comment more. Best, Dan On Thu, May 25, 2017 at 8:09 AM, Marcelo Depólo wrote: > Hi, I had the same

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-25 Thread Marcelo Depólo
Hi, I had the same struggle benchmarking a similar system last week. Just for curiosity, could you tell us the performance you get when sharing your GPU with multiple jobs? In my case (6k atoms + Reaction field + 8 cores 2.2Ghz + TitanX Pascal), I've got ~440 ns/day. However, I get ~280 ns/day
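The GPU-sharing setup Marcelo asks about can be sketched as a job-script fragment like the one below. This is a hypothetical layout (not taken from the thread) for a single 8-core node with one GPU; `job1`/`job2` are placeholder `-deffnm` names. It is not runnable outside a machine with GROMACS installed.

```shell
# Sketch: two 4-thread mdrun jobs sharing GPU 0 on an 8-core node.
# -pin on plus distinct -pinoffset values keep the jobs on disjoint cores;
# both select GPU 0 explicitly with -gpu_id.
gmx mdrun -deffnm job1 -ntomp 4 -gpu_id 0 -pin on -pinoffset 0 &
gmx mdrun -deffnm job2 -ntomp 4 -gpu_id 0 -pin on -pinoffset 4 &
wait
```

Whether two jobs per GPU beats one depends on how much of each step the GPU sits idle waiting for the CPU, which is exactly the "Wait" timing discussed in this thread.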

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-25 Thread Mark Abraham
Hi, Good. Remember that the job scheduler is a degree of freedom that matters, so how you used it and why would have been good to mention the first time ;-) And don't just set your time step to arbitrary numbers unless you know why it is a stable integration scheme. Mark On Thu, May 25, 2017 at

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Daniel Kozuch
I apologize for the confusion, but I found my error. I was failing to request a certain number of cpus-per-task and the scheduler was having issues assigning the threads because of this. Speed is now at ~400 ns/day with a 3 fs timestep which seems reasonable. Thanks for all the help, Dan On Wed,
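As a sanity check on the number Daniel reports: ns/day follows directly from the timestep and the wall-clock step rate. The helper below is plain arithmetic, not anything from the thread.

```python
def ns_per_day(dt_fs: float, steps_per_sec: float) -> float:
    """Simulated nanoseconds per wall-clock day.

    dt_fs: integrator timestep in femtoseconds (1 fs = 1e-6 ns).
    steps_per_sec: MD steps completed per wall-clock second.
    """
    return dt_fs * 1e-6 * steps_per_sec * 86400  # 86400 seconds per day

# ~1543 steps/s at a 3 fs timestep matches the ~400 ns/day reported above
print(round(ns_per_day(3, 1543), 1))  # → 399.9
```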

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Daniel Kozuch
Szilárd, I think I must be misunderstanding your advice. If I remove the domain decomposition and set pin on as suggested by Mark, using: gmx_gpu mdrun -deffnm my_tpr -dd 1 -pin on Then I get very poor performance and the following error: NOTE: Affinity setting for 6/6 threads failed. This can
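An affinity failure like the one above is typical when the batch scheduler has confined the job to fewer cores than mdrun tries to pin threads to, which matches the `cpus-per-task` fix reported later in the thread. A sketch of a matching SLURM request (resource names are assumptions, not from the thread; `my_tpr` follows the command shown above):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8   # give mdrun all 8 cores it will pin threads to
#SBATCH --gres=gpu:1        # one GPU for this job

# Single-rank run: 8 OpenMP threads, one domain, pinning enabled.
gmx mdrun -deffnm my_tpr -ntomp 8 -dd 1 -pin on
```

The point is simply that the core count the scheduler grants and the thread count mdrun pins must agree, or affinity setting fails and performance drops.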

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Szilárd Páll
> gmx_mpi mdrun (your run stuff here) -ntomp 4 -gpu_id 00 > That may speed it up. > Micholas Dean Smith, PhD. Post-doctoral Research Associate University of Tennessee/Oa

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Mark Abraham

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Daniel Kozuch
From: Daniel Kozuch <dkoz...@princeton.edu> Sent: Wednesday, May 24, 2017 3:08 PM To: gromacs.org_gmx-users@maillist.sys.kth.se Subject: [gmx-users] Poor GPU Performance with GROMACS 5.1.4 > Hello, I'm using GROMACS 5.1.4 on 8 CPUs and 1 GPU

Re: [gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Smith, Micholas D.
Subject: [gmx-users] Poor GPU Performance with GROMACS 5.1.4 > Hello, I'm using GROMACS 5.1.4 on 8 CPUs and 1 GPU for a system of ~8000 atoms in a dodecahedron box, and I'm having trouble getting good performance out of the GPU. Specifically it appears that there is significant performance

[gmx-users] Poor GPU Performance with GROMACS 5.1.4

2017-05-24 Thread Daniel Kozuch
Hello, I'm using GROMACS 5.1.4 on 8 CPUs and 1 GPU for a system of ~8000 atoms in a dodecahedron box, and I'm having trouble getting good performance out of the GPU. Specifically, it appears that there is significant performance loss to wait times ("Wait + Comm. F" and "Wait GPU nonlocal"). I have