Re: [SIESTA-L] Installing Siesta GPU accelerated version

2022-07-28, Narendranath Ghosh
Dear All,
Thank you very much for introducing me to a wonderful forum.
I am new to Siesta.
Please let me know how I can calculate the free-energy change of a reaction; I am interested in calculating the free-energy profile for the CO2 reduction steps.
Thanks in advance.

Regards,
Dr. N N Ghosh
University of Gour Banga
India-73211
Dr. NARENDRA NATH GHOSH - Google Scholar




Re: [SIESTA-L] Installing Siesta GPU accelerated version

2022-06-30, Mohammed Ghadiyali
Hello,

Thanks for the information, I’ll try to install it as per the instructions 
provided.

Regards,
Ghadiyali Mohammed Kader
Postdoctoral Fellow
King Abdullah University of Science and Technology

Re: [SIESTA-L] Installing Siesta GPU accelerated version

2022-06-29, Alberto Garcia
Hello,

We are writing a section on GPUs in the documentation, but until it is ready 
you can use the ideas below:


There are two ways to take advantage of GPUs (enabled only for the solver 
stage, which typically takes up the most time):

-- Using the ELPA library and its native interface in Siesta (this method is 
available for Siesta versions 4.1.5 and up)

-- Using the ELSI library (for Siesta "MaX" versions; see the Guide to Siesta Versions at
https://gitlab.com/siesta-project/siesta/-/wikis/Guide-to-Siesta-versions)

In both cases, the only special installation step is to build ELPA or ELSI with GPU support enabled, and then to use the proper options in Siesta.

For the first method, the fdf options to enable GPUs are (example):

diag-algorithm elpa-2          # two-stage ELPA solver
diag-elpa-usegpu T             # offload the diagonalization to the GPU
diag-blocksize 16              # block size of the distributed matrices
# Optional
number-of-eigenstates 17320    # compute only this many eigenstates
use-tree-timer T               # hierarchical (tree) timing report


For the second (ELSI) method:

solution-method elsi           # hand the solver stage to the ELSI library
elsi-solver elpa               # use ELPA through ELSI
elsi-elpa-gpu 1                # enable GPU acceleration in ELPA
elsi-elpa-flavor 2             # two-stage ELPA solver

# Optional
number-of-eigenstates 17320
use-tree-timer T
elsi-output-level 3            # more verbose ELSI output

The installation of ELPA and ELSI with GPU support is system-specific, but you 
can get inspiration from the following examples:

* ELPA (on Marconi-100 at CINECA, with IBM Power9 chips and NVIDIA V100 GPUs, using
the gcc compiler):

Script to configure:

#!/bin/sh

# (Need to define properly the symbols used below)   
# Note that the P9 does not use the typical Intel kernels

# sm_70 below is the Volta (V100) compute capability
FC=mpifort CC=mpicc CXX=mpic++ \
CFLAGS="-O3 -mcpu=native -std=c++11" \
FCFLAGS="-O3 -mcpu=native -ffree-line-length-none" \
LDFLAGS="${SCALAPACK_LIBS} ${LAPACK_LIBS}" \
../configure \
  --with-cuda-path=${CUDA_HOME} \
  --with-cuda-sdk-path=${CUDA_HOME} \
  --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_70 \
  --enable-NVIDIA-gpu-memory-debug --enable-nvtx \
  --disable-sse-assembly --disable-sse --disable-avx --disable-avx2 --disable-avx512 \
  --enable-c-tests=no --prefix=$PRJ/bin/gcc/elpa/2021.05.002.jul22


(Adapt the options to your system)
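
After configuration, the build and installation follow the usual autotools steps; a minimal sketch, assuming the script above is run from a separate build directory (the -j value is just an example):

# Run from the build directory where ../configure was executed
make -j 8        # build ELPA
make install     # install under the --prefix given to configure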

* ELSI

SET(CMAKE_INSTALL_PREFIX "$ENV{BASE_DIR}/elsi/2.6.2" CACHE STRING "Installation dir")

SET(CMAKE_Fortran_COMPILER "mpif90" CACHE STRING "MPI Fortran compiler")
SET(CMAKE_C_COMPILER "mpicc" CACHE STRING "MPI C compiler")
SET(CMAKE_CXX_COMPILER "mpicxx" CACHE STRING "MPI C++ compiler")

SET(CMAKE_Fortran_FLAGS "-O2 -g -fbacktrace -fdump-core" CACHE STRING "Fortran flags")
SET(CMAKE_C_FLAGS "-O2 -g -std=c99" CACHE STRING "C flags")
SET(CMAKE_CXX_FLAGS "-O2 -g -std=c++11" CACHE STRING "C++ flags")
SET(CMAKE_CUDA_FLAGS "-O3 -arch=sm_70 -std=c++11" CACHE STRING "CUDA flags")
# Workaround: specify -std=c++11 in CMAKE_CUDA_FLAGS to avoid __ieee128 gcc/cuda bug

SET(USE_GPU_CUDA ON CACHE BOOL "Use CUDA-based GPU acceleration in ELPA")
SET(ENABLE_PEXSI ON CACHE BOOL "Enable PEXSI")
SET(ENABLE_TESTS ON CACHE BOOL "Enable tests")
#SET(ADD_UNDERSCORE OFF CACHE BOOL "Do not suffix C functions with an underscore")

SET(LIB_PATHS "/cineca/prod/opt/libraries/lapack/3.9.0/gnu--8.4.0/lib;/cineca/prod/opt/libraries/scalapack/2.1.0/spectrum_mpi--10.3.1--binary/lib;/cineca/prod/opt/compilers/cuda/11.0/none/lib64;/cineca/prod/opt/libraries/essl/6.2.1/binary/lib64" CACHE STRING "External library paths")

SET(LIBS "scalapack;lapack;essl;cublas;cudart" CACHE STRING "External libraries")
You should modify the library locations and version numbers appropriately for your system.
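
The SET(... CACHE ...) lines above are meant for a CMake initial-cache file; a minimal sketch of how it could be used, assuming they are saved as elsi_gpu.cmake (the file and directory names are just placeholders):

# From the top of the ELSI source tree
mkdir build && cd build
cmake -C ../elsi_gpu.cmake ..
make -j 8 && make install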

Finally, a note about the importance of the proper execution incantation, for 
"pinning" the MPI ranks to the appropriate GPU:

(There are probably better and more streamlined ways to do this)

For this example I use the 32 cores (2x16) per node on Marconi-100 for MPI tasks, no OpenMP,
and do not take advantage of the 4x hyperthreading (SMT).

The slurm script I typically use is:  (gcc_env et al are my own Lmod modules)
=
#!/bin/bash
#SBATCH --job-name=test-covid
#SBATCH --account=Pra19_MaX_1
#SBATCH --partition=m100_usr_prod
#SBATCH --output=mpi_%j.out
#SBATCH --error=mpi_%j.err
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=32
#SBATCH --ntasks-per-socket=16
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:4
#SBATCH --time=00:19:00

#
ml purge
ml gcc_env
ml siesta-max/1.0-14
#
date
which siesta
echo "---"
#
export OMP_NUM_THREADS=1
#
mpirun --map-by socket:PE=1 --rank-by core --report-bindings \
   -np ${SLURM_NTASKS} ./gpu_bind.sh \
   siesta covid.fdf
=

The crucial part is the gpu_bind.sh script, which contains code to make sure that each socket
talks to the right GPUs (1st socket: GPU0/GPU1; 2nd socket: GPU2/GPU3) and that, within each
socket, the first 8 tasks use GPU0/2 and the second group of 8 tasks use GPU1/3. For this, the
tasks have to be ordered (this is specific to Marconi). I found that using the

   --map-by socket:PE=1 --rank-by core

incantation I could achieve that ordering.
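
The gpu_bind.sh wrapper itself is not reproduced here; a minimal sketch of what such a wrapper could look like, assuming an OpenMPI-style mpirun that exports OMPI_COMM_WORLD_LOCAL_RANK (Spectrum MPI does) and the 32-rank, 4-GPU layout described above:

#!/bin/bash
# Hypothetical gpu_bind.sh: give each MPI rank exactly one visible GPU.
# Assumes 32 ranks per node, 4 GPUs per node, and ranks ordered by core
# (--map-by socket:PE=1 --rank-by core), so local ranks 0-7 -> GPU0,
# 8-15 -> GPU1, 16-23 -> GPU2, 24-31 -> GPU3.
lrank=${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}
export CUDA_VISIBLE_DEVICES=$(( lrank / 8 ))
exec "$@"    # runs "siesta covid.fdf" (or whatever follows the wrapper) for this rank

With such a wrapper each rank sees a single device, so the GPU code inside ELPA always picks device 0.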
