Re: [gmx-users] Help on MD performance, GPU has less load than CPU.

2017-07-13 Thread Mark Abraham
So the state of the art in the whole field is to just blindly copy .mdp
settings from webpages rather than review the literature on related work?
Nice. :-D

Mark

On Thu, Jul 13, 2017 at 7:18 PM Téletchéa Stéphane <
stephane.teletc...@univ-nantes.fr> wrote:

> On 12/07/2017 at 18:15, Mark Abraham wrote:
> > Hi,
> >
> > Sure. But who has data that shows that e.g. a free-energy calculation
> > with these tightened settings produces better quality observables than
> > you get with the defaults?
> >
> > Mark
>
> Hi,
>
> As defaults are defaults ... who knows :-) Putting numbers behind these
> assumptions is hard, and probably nobody wants to do this on a large
> scale ... But I'm too close to the holidays to argue the point for now!
>
> Stéphane
>
> --
> Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein
> Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tél : +33 251 125 636 / Fax : +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

Re: [gmx-users] Help on MD performance, GPU has less load than CPU.

2017-07-13 Thread Téletchéa Stéphane

On 12/07/2017 at 18:15, Mark Abraham wrote:

Hi,

Sure. But who has data that shows that e.g. a free-energy calculation with
these tightened settings produces better quality observables than you get
with the defaults?

Mark


Hi,

As defaults are defaults ... who knows :-) Putting numbers behind these
assumptions is hard, and probably nobody wants to do this on a large
scale ... But I'm too close to the holidays to argue the point for now!


Stéphane

--
Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein 
Design In Silico
UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322 
Nantes cedex 03, France

Tél : +33 251 125 636 / Fax : +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

Re: [gmx-users] Help on MD performance, GPU has less load than CPU.

2017-07-12 Thread Mark Abraham
Hi,

Sure. But who has data that shows that e.g. a free-energy calculation with
these tightened settings produces better quality observables than you get
with the defaults?

Mark

On Wed, Jul 12, 2017 at 5:59 PM Téletchéa Stéphane <
stephane.teletc...@univ-nantes.fr> wrote:

> On 11/07/2017 at 15:24, Mark Abraham wrote:
> > Guessing wildly, the cost of your simulation is probably at least double
> > what the defaults would give, and for that cost, I'd want to know why.
>
> Esteemed colleague,
>
> Since this is a wild guess, I thought I'd add some guesses myself. I
> remember "some time" back having used a tighter Ewald tolerance for
> amber simulations (around amber 4/5/6 ...), and it was more common at
> that time, I presume. This may also be linked to the fact that amber uses
> a short 8 angstrom cut-off for electrostatics ...
> Someone apparently "ill" at the time already raised this point in 2009:
>
>
> http://gromacs.org_gmx-users.maillist.sys.kth.narkive.com/vTjpMdwU/gromacs-preformance-versus-amber
>
> From memory, I recall using 1e-6 for the Ewald tolerance in AMBER,
> and this is mentioned here:
>
> http://ambermd.org/Questions/ewald.html
>
> ... apparently linked to DNA simulations, as found in JACS 117, 4193 (1995).
>
> In short, this value may keep coming back for "historical" reasons
> (and misuse, of course).
>
> Others may have additional comments :-)
>
> Best,
>
> Stéphane
>
>
> --
> Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein
> Design In Silico
> UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322
> Nantes cedex 03, France
> Tél : +33 251 125 636 / Fax : +33 251 125 632
> http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

Re: [gmx-users] Help on MD performance, GPU has less load than CPU.

2017-07-12 Thread Téletchéa Stéphane

On 11/07/2017 at 15:24, Mark Abraham wrote:

Guessing wildly, the cost of your simulation is probably at least double
what the defaults would give, and for that cost, I'd want to know why.


Esteemed colleague,

Since this is a wild guess, I thought I'd add some guesses myself. I
remember "some time" back having used a tighter Ewald tolerance for
amber simulations (around amber 4/5/6 ...), and it was more common at
that time, I presume. This may also be linked to the fact that amber uses
a short 8 angstrom cut-off for electrostatics ...

Someone apparently "ill" at the time already raised this point in 2009:

http://gromacs.org_gmx-users.maillist.sys.kth.narkive.com/vTjpMdwU/gromacs-preformance-versus-amber

From memory, I recall using 1e-6 for the Ewald tolerance in AMBER,
and this is mentioned here:

http://ambermd.org/Questions/ewald.html

... apparently linked to DNA simulations, as found in JACS 117, 4193 (1995).

In short, this value may keep coming back for "historical" reasons
(and misuse, of course).
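
For anyone curious what that carried-over tolerance looks like on the
GROMACS side, a minimal illustration (not taken from any particular
tutorial or from the poster's files) is a single .mdp line overriding the
1e-5 default:

    ; illustrative only: legacy AMBER-era Ewald tolerance carried into a GROMACS .mdp
    ewald_rtol = 1e-6   ; GROMACS default is 1e-5; tightening it mainly adds PME mesh work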


Others may have additional comments :-)

Best,

Stéphane


--
Assistant Professor in BioInformatics, UFIP, UMR 6286 CNRS, Team Protein 
Design In Silico
UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322 
Nantes cedex 03, France

Tél : +33 251 125 636 / Fax : +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org

Re: [gmx-users] Help on MD performance, GPU has less load than CPU.

2017-07-11 Thread Mark Abraham
Hi,

I'm genuinely curious about why people set ewald_rtol smaller (which is
unlikely to be useful, because the accumulation of forces in single
precision has round-off error, so the approximation to the correct sum is
not reliably accurate to better than about 1 part in 1e5), and
correspondingly set pme_order to large values - the second time I've seen
this in 24 hours. Is there data somewhere that shows this is useful?

In any case, it a) causes a lot more work on the CPU, and b) only
pme_order 4 (and, to a lesser extent, 5) is optimized for performance
(because there's no data showing that a higher order is useful). And for a
free-energy calculation, that extra expense accrues for each lambda state.
See the "PME mesh" parts of the performance report.
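
For reference, a rough sketch of the defaults versus the kind of tightened
setup being questioned (illustrative values only, not taken from the
poster's input files):

    ; GROMACS defaults for the PME terms discussed here
    coulombtype    = PME
    pme_order      = 4      ; the interpolation order that is optimized for performance
    ewald_rtol     = 1e-5   ; relative strength of the direct-space potential at rcoulomb
    fourierspacing = 0.12   ; nm, default mesh spacing

    ; the kind of tightened (and much more expensive) setup under discussion
    pme_order      = 6
    ewald_rtol     = 1e-6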

Guessing wildly, the cost of your simulation is probably at least double
what the defaults would give, and for that cost, I'd want to know why.

Mark

On Mon, Jul 10, 2017 at 5:02 PM Davide Bonanni 
wrote:

> Hi,
>
> I am working on a node with an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
> (16 physical cores, 32 logical cores) and 1 NVIDIA GeForce GTX 980 Ti GPU.
> I am launching a series of 2 ns molecular dynamics simulations of a system
> of 6 atoms.
> I tried various combinations of settings, but I obtained the best
> performance with the command:
>
> "gmx mdrun  -deffnm md_LIG -cpt 1 -cpo restart1.cpt -pin on"
>
> which uses 32 OpenMP threads, 1 MPI thread, and the GPU.
> At the end of the .log file of the molecular dynamics production run I get
> this message:
>
> "NOTE: The GPU has >25% less load than the CPU. This imbalance causes
>   performance loss."
>
> I don't know how I can improve the load on the CPU beyond this, or how I
> can decrease the load on the GPU. Do you have any suggestions?
>
> Thank you in advance.
>
> Cheers,
>
> Davide Bonanni
>

[gmx-users] Help on MD performance, GPU has less load than CPU.

2017-07-10 Thread Davide Bonanni
Hi,

I am working on a node with an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
(16 physical cores, 32 logical cores) and 1 NVIDIA GeForce GTX 980 Ti GPU.
I am launching a series of 2 ns molecular dynamics simulations of a system
of 6 atoms.
I tried various combinations of settings, but I obtained the best
performance with the command:

"gmx mdrun  -deffnm md_LIG -cpt 1 -cpo restart1.cpt -pin on"

which uses 32 OpenMP threads, 1 MPI thread, and the GPU.
At the end of the .log file of the molecular dynamics production run I get
this message:

"NOTE: The GPU has >25% less load than the CPU. This imbalance causes
  performance loss."

I don't know how I can improve the load on the CPU beyond this, or how I
can decrease the load on the GPU. Do you have any suggestions?
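
One approach that is sometimes suggested when this note appears (sketched
here with placeholder file names and an assumed even split of the 32
hardware threads, not settings from this run) is to run two independent
simulations side by side, so the otherwise idle GPU time gets used:

    # illustrative only: two independent runs sharing GPU 0, 16 threads each
    gmx mdrun -deffnm md_LIG_a -ntmpi 1 -ntomp 16 -gpu_id 0 -pin on -pinoffset 0  -pinstride 1 &
    gmx mdrun -deffnm md_LIG_b -ntmpi 1 -ntomp 16 -gpu_id 0 -pin on -pinoffset 16 -pinstride 1 &
    wait

The aggregate throughput of two such runs can exceed that of a single run
whose CPU work leaves the GPU waiting.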

Thank you in advance.

Cheers,

Davide Bonanni


Initial and final part of LOG file here:

Log file opened on Sun Jul  9 04:02:44 2017
Host: bigblue  pid: 16777  rank ID: 0  number of ranks:  1
   :-) GROMACS - gmx mdrun, VERSION 5.1.4 (-:



GROMACS:  gmx mdrun, VERSION 5.1.4
Executable:   /usr/bin/gmx
Data prefix:  /usr/local/gromacs
Command line:
  gmx mdrun -deffnm md_fluo_7 -cpt 1 -cpo restart1.cpt -pin on

GROMACS version:VERSION 5.1.4
Precision:  single
Memory model:   64 bit
MPI library:thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:enabled
OpenCL support: disabled
invsqrt routine:gmx_software_invsqrt(x)
SIMD instructions:  AVX2_256
FFT library:fftw-3.3.4-sse2-avx
RDTSCP usage:   enabled
C++11 compilation:  disabled
TNG support:enabled
Tracing support:disabled
Built on:   Tue  8 Nov 12:26:14 CET 2016
Built by:   root@bigblue [CMAKE]
Build OS/arch:  Linux 3.10.0-327.el7.x86_64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Build CPU family:   6   Model: 63   Stepping: 2
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /bin/cc GNU 4.8.5
C compiler flags:    -march=core-avx2 -Wextra
-Wno-missing-field-initializers
-Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value
-Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
 -Wno-array-bounds
C++ compiler:   /bin/c++ GNU 4.8.5
C++ compiler flags:  -march=core-avx2 -Wextra
-Wno-missing-field-initializers
-Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast  -Wno-array-bounds
Boost version:  1.55.0 (internal)
CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
Sun_Sep__4_22:14:01_CDT_2016;Cuda compilation tools, release 8.0, V8.0.44
CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=
compute_30,code=sm_30;-gencode;arch=compute_35,code=
sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=
compute_50,code=sm_50;-gencode;arch=compute_52,code=
sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=
compute_61,code=sm_61;-gencode;arch=compute_60,code=
compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;
;-march=core-avx2;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-
Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-
fexcess-precision=fast;-Wno-array-bounds;
CUDA driver:8.0
CUDA runtime:   8.0


Running on 1 node with total 16 cores, 32 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
Vendor: GenuineIntel
Brand:  Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Family:  6  model: 63  stepping:  2
CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
Number of GPUs detected: 1
#0: NVIDIA GeForce GTX 980 Ti, compute cap.: 5.2, ECC:  no, stat:
compatible



Changing nstlist from 20 to 40, rlist from 1.2 to 1.2

Input Parameters:
   integrator = sd
   tinit  = 0
   dt = 0.002
   nsteps = 100
   init-step  = 0
   simulation-part= 1
   comm-mode  = Linear
   nstcomm= 100
   bd-fric= 0
   ld-seed= 57540858
   emtol  = 10
   emstep = 0.01
   niter  = 20
   fcstep = 0
   nstcgsteep = 1000
   nbfgscorr  = 10
   rtpi   = 0.05
   nstxout