Re: [gmx-users] 2019.2 not using all available cores

2019-08-21 Thread Dallas Warren
I've discovered an option that caused 2019.2 to use all of the cores
correctly.

Use "-pin on" and it works as expected, using all 12 cores, CPU load being
show as appropriate (gets up to 68% total CPU utilisation)

Use "-pin auto", which is the default, or "-pin off" and it will only use a
single core (maximum is 8% total CPU utilisation).
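In other words, the sort of command I mean is the following, where "md" is
just a placeholder for the actual run input:

gmx mdrun -deffnm md -pin on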

Catch ya,

Dr. Dallas Warren
Drug Delivery, Disposition and Dynamics
Monash Institute of Pharmaceutical Sciences, Monash University
381 Royal Parade, Parkville VIC 3052
dallas.war...@monash.edu
-
When the only tool you own is a hammer, every problem begins to resemble a
nail.


On Thu, 9 May 2019 at 07:54, Dallas Warren  wrote:

> gmx 2019.2 compiled using threads only uses a single core, mdrun_mpi
> compiled using MPI only uses a single core, while gmx 2016.3 using threads
> uses all 12 cores.
>
> For compiling the thread version of 2019.2 I used:
> cmake .. -DGMX_GPU=ON
> -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2
>
> For compiling the MPI version of 2019.2 I used:
> cmake .. -DGMX_MPI=ON -DBUILD_SHARED_LIBS=OFF -DGMX_GPU=ON
> -DCMAKE_CXX_COMPILER=/usr/lib64/mpi/gcc/openmpi/bin/mpiCC
> -DCMAKE_C_COMPILER=/usr/lib64/mpi/gcc/openmpi/bin/mpicc
> -DGMX_BUILD_MDRUN_ONLY=ON
> -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2
>
> Between building both of those, I deleted the build directory.

[gmx-users] 2019.2 not using all available cores

2019-05-08 Thread Dallas Warren
gmx 2019.2 compiled using threads only uses a single core, mdrun_mpi
compiled using MPI only uses a single core, while gmx 2016.3 using threads
uses all 12 cores.

For compiling the thread version of 2019.2 I used:
cmake .. -DGMX_GPU=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2

For compiling the MPI version of 2019.2 I used:
cmake .. -DGMX_MPI=ON -DBUILD_SHARED_LIBS=OFF -DGMX_GPU=ON
-DCMAKE_CXX_COMPILER=/usr/lib64/mpi/gcc/openmpi/bin/mpiCC
-DCMAKE_C_COMPILER=/usr/lib64/mpi/gcc/openmpi/bin/mpicc
-DGMX_BUILD_MDRUN_ONLY=ON
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2
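
By "using MPI" I mean launching it along these lines (the rank count and the
"md" input name are illustrative only):

mpirun -np 12 /usr/local/gromacs/gromacs-2019.2/bin/mdrun_mpi -deffnm md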

Between building both of those, I deleted the build directory.
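
To spell out the build flow, each version was configured in a fresh
out-of-tree build directory, roughly as follows (the -j value is just an
example, and "make install" may need sudo for the /usr/local prefix):

mkdir build && cd build
cmake .. -DGMX_GPU=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2
make -j 12
make install
cd .. && rm -rf build      # then repeat with the MPI options above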


GROMACS:  gmx, version 2019.2
Executable:   /usr/local/gromacs/gromacs-2019.2/bin/gmx
Data prefix:  /usr/local/gromacs/gromacs-2019.2
Working dir:  /home/dallas/experiments/current/19-064/P6DLO
Command line:
  gmx -version

GROMACS version:2019.2
Precision:  single
Memory model:   64 bit
MPI library:thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:CUDA
SIMD instructions:  AVX_256
FFT library:fftw-3.3.8-sse2
RDTSCP usage:   enabled
TNG support:enabled
Hwloc support:  disabled
Tracing support:disabled
C compiler: /usr/bin/cc GNU 7.4.0
C compiler flags:-mavx -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler:   /usr/bin/c++ GNU 7.4.0
C++ compiler flags:  -mavx -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on
Fri_Feb__8_19:08:17_PST_2019;Cuda compilation tools, release 10.1,
V10.1.105
CUDA compiler 
flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;-D_FORCE_INLINES;;
;-mavx;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:10.10
CUDA runtime:   10.10


GROMACS:  mdrun_mpi, version 2019.2
Executable:   /usr/local/gromacs/gromacs-2019.2/bin/mdrun_mpi
Data prefix:  /usr/local/gromacs/gromacs-2019.2
Working dir:  /home/dallas/experiments/current/19-064/P6DLO
Command line:
  mdrun_mpi -version

GROMACS version:2019.2
Precision:  single
Memory model:   64 bit
MPI library:MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:CUDA
SIMD instructions:  AVX_256
FFT library:fftw-3.3.8-sse2
RDTSCP usage:   enabled
TNG support:enabled
Hwloc support:  disabled
Tracing support:disabled
C compiler: /usr/lib64/mpi/gcc/openmpi/bin/mpicc GNU 7.4.0
C compiler flags:-mavx -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler:   /usr/lib64/mpi/gcc/openmpi/bin/mpiCC GNU 7.4.0
C++ compiler flags:  -mavx -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler:  /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on
Fri_Feb__8_19:08:17_PST_2019;Cuda compilation tools, release 10.1,
V10.1.105
CUDA compiler 
flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;-D_FORCE_INLINES;;
;-mavx;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:10.10
CUDA runtime:   10.10


/usr/local/gromacs/gromacs-2016.3/bin/gmx -version

GROMACS:  gmx, version 2016.3
Executable:   /usr/local/gromacs/gromacs-2016.3/bin/gmx
Data prefix:  /usr/local/gromacs/gromacs-2016.3
Working dir:  /home/dallas/experiments/current/19-064/P6DLO
Command line:
  gmx -version

GROMACS version:2016.3
Precision:  single
Memory model:   64 bit
MPI library:thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:CUDA
SIMD instructions:  AVX_256
FFT library:fftw-3.3.8-sse2
RDTSCP usage:   enabled
TNG support:enabled
Hwloc support:  disabled
Tracing support:disabled
Built on:   Tue Mar 21 13:21:15 AEDT 2017
Built by:   dallas@morph [CMAKE]
Build OS/arch:  Linux 4.4.49-16-default x86_64
Build CPU vendor:   Intel
Build CPU brand:Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU