Hello,

We are writing a section on GPUs in the documentation, but until it is ready 
you can use the ideas below:


There are two ways to take advantage of GPUs (they are used only in the 
solver stage, which typically takes up most of the execution time):

-- Using the ELPA library and its native interface in Siesta (this method is 
available for Siesta versions 4.1.5 and up)

-- Using the ELSI library (for the Siesta "MaX" versions; see the Guide to 
Siesta Versions at 
https://gitlab.com/siesta-project/siesta/-/wikis/Guide-to-Siesta-versions)

In both cases the only special installation steps are enabling GPU support 
when building ELPA or ELSI, and then using the proper options in Siesta.
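
For instance, pointing Siesta to a GPU-enabled ELPA at build time could look 
roughly like the fragment below. This is only a sketch: the preprocessor 
symbol, the ELPA_ROOT variable, and the ELPA version are assumptions to be 
checked against the manual of your Siesta version.

# Hypothetical arch.make additions for the native ELPA interface
# (adjust ELPA_ROOT and the ELPA version directory to your installation)
cat >> arch.make <<'EOF'
FPPFLAGS += -DSIESTA__ELPA
INCFLAGS += -I$(ELPA_ROOT)/include/elpa-2021.05.002/modules
LIBS     += -L$(ELPA_ROOT)/lib -lelpa
EOF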

For the first method, the fdf options to enable GPUs are (example):

diag-algorithm elpa-2        # two-stage ELPA solver
diag-elpa-usegpu T           # offload the diagonalization to GPUs
diag-blocksize 16            # block size of the distributed matrices

# Optional
number-of-eigenstates 17320  # compute only the eigenstates needed
use-tree-timer T             # more detailed, hierarchical timing report


For the second (ELSI) method: 

solution-method elsi         # hand the solver stage over to ELSI
elsi-solver elpa             # use the ELPA solver within ELSI
elsi-elpa-gpu 1              # enable ELPA's GPU kernels
elsi-elpa-flavor 2           # 2 = two-stage ELPA solver

# Optional
number-of-eigenstates 17320
use-tree-timer T
elsi-output-level 3          # verbose ELSI output

The installation of ELPA and ELSI with GPU support is system-specific, but you 
can get inspiration from the following examples:

* ELPA (on Marconi-100 at CINECA, with IBM POWER9 processors and NVIDIA V100 
GPUs, using the gcc compiler):

Script to configure:

#!/bin/sh

# (Define properly the variables used below)
# Note that the POWER9 does not use the typical Intel kernels

FC=mpifort CC=mpicc CXX=mpic++ \
    CFLAGS="-O3 -mcpu=native" \
    CXXFLAGS="-O3 -mcpu=native -std=c++11" \
    FCFLAGS="-O3 -mcpu=native -ffree-line-length-none" \
    LDFLAGS="${SCALAPACK_LIBS} ${LAPACK_LIBS}" \
    ../configure \
    --with-cuda-path=${CUDA_HOME} \
    --with-cuda-sdk-path=${CUDA_HOME} \
    --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_70 \
    --enable-NVIDIA-gpu-memory-debug --enable-nvtx \
    --disable-sse-assembly --disable-sse --disable-avx --disable-avx2 --disable-avx512 \
    --enable-c-tests=no --prefix=$PRJ/bin/gcc/elpa/2021.05.002.jul22


(Adapt the options to your system)
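
The surrounding steps would look roughly like this (a sketch, assuming the 
script above is saved as configure_gpu.sh inside a build subdirectory of the 
ELPA source tree; the script name is an example):

mkdir build && cd build
sh ./configure_gpu.sh   # runs ../configure with the options above
make -j 8
make install            # installs under the --prefix given to configure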

* ELSI (on the same machine; this is a CMake initial-cache file):

SET(CMAKE_INSTALL_PREFIX "$ENV{BASE_DIR}/elsi/2.6.2" CACHE STRING "Installation dir")

SET(CMAKE_Fortran_COMPILER "mpif90" CACHE STRING "MPI Fortran compiler")
SET(CMAKE_C_COMPILER "mpicc" CACHE STRING "MPI C compiler")
SET(CMAKE_CXX_COMPILER "mpicxx" CACHE STRING "MPI C++ compiler")

SET(CMAKE_Fortran_FLAGS "-O2 -g -fbacktrace -fdump-core" CACHE STRING "Fortran flags")
SET(CMAKE_C_FLAGS "-O2 -g -std=c99" CACHE STRING "C flags")
SET(CMAKE_CXX_FLAGS "-O2 -g -std=c++11" CACHE STRING "C++ flags")
SET(CMAKE_CUDA_FLAGS "-O3 -arch=sm_70 -std=c++11" CACHE STRING "CUDA flags")
# Workaround: specify -std=c++11 in CMAKE_CUDA_FLAGS to avoid a __ieee128 gcc/cuda bug

SET(USE_GPU_CUDA ON CACHE BOOL "Use CUDA-based GPU acceleration in ELPA")
SET(ENABLE_PEXSI ON CACHE BOOL "Enable PEXSI")
SET(ENABLE_TESTS ON CACHE BOOL "Enable tests")
#SET(ADD_UNDERSCORE OFF CACHE BOOL "Do not suffix C functions with an underscore")

SET(LIB_PATHS "/cineca/prod/opt/libraries/lapack/3.9.0/gnu--8.4.0/lib;/cineca/prod/opt/libraries/scalapack/2.1.0/spectrum_mpi--10.3.1--binary/lib;/cineca/prod/opt/compilers/cuda/11.0/none/lib64;/cineca/prod/opt/libraries/essl/6.2.1/binary/lib64" CACHE STRING "External library paths")

SET(LIBS "scalapack;lapack;essl;cublas;cudart" CACHE STRING "External libraries")

Adjust the library paths and version numbers to match your system.
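
To use it, pass the cache file to cmake with the -C option (assuming the 
settings above are saved as initial_cache.cmake in the ELSI source tree; the 
file name is an example):

mkdir build && cd build
cmake -C ../initial_cache.cmake ..
make -j 8
make install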

Finally, a note about the importance of the proper execution incantation, for 
"pinning" the MPI ranks to the appropriate GPU:

(There are probably better and more streamlined ways to do this)

In this example I use the 32 cores (2x16) of each Marconi-100 node for MPI 
tasks, with no OpenMP, and do not take advantage of the 4x hardware 
multithreading.

The Slurm script I typically use is the following (gcc_env and siesta-max are 
my own Lmod modules):
=============================================================================
#!/bin/bash
#SBATCH --job-name=test-covid
#SBATCH --account=Pra19_MaX_1
#SBATCH --partition=m100_usr_prod
#SBATCH --output=mpi_%j.out
#SBATCH --error=mpi_%j.err
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=32
#SBATCH --ntasks-per-socket=16
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:4
#SBATCH --time=00:19:00

#
ml purge
ml gcc_env
ml siesta-max/1.0-14
#
date
which siesta
echo "-------------------"
#
export OMP_NUM_THREADS=1
#
mpirun --map-by socket:PE=1 --rank-by core --report-bindings \
       -np ${SLURM_NTASKS} ./gpu_bind.sh \
       siesta covid.fdf
=============================================================================

The crucial part is the gpu_bind.sh script, which makes sure that each socket 
talks to the right GPUs (first socket: GPU0/GPU1; second socket: GPU2/GPU3) 
and that, within each socket, the first 8 tasks use GPU0 (or GPU2) and the 
second group of 8 tasks uses GPU1 (or GPU3). For this, the tasks have to be 
ordered. (This layout is specific to Marconi-100.) I found that with the

   --map-by socket:PE=1 --rank-by core

incantation I could achieve that ordering.

The contents of gpu_bind.sh are:

====================================================
#!/bin/bash

np_node=$OMPI_COMM_WORLD_LOCAL_SIZE
rank=$OMPI_COMM_WORLD_LOCAL_RANK

block=$(( $np_node / 4 ))   # We have 4 GPUs
                            # If np_node is 32 (typical), then block=8

limit0=$(( $block * 1 ))
limit1=$(( $block * 2 ))
limit2=$(( $block * 3 ))
limit3=$(( $block * 4 ))

#-----------------

if [ $rank -lt $limit0 ]
then
   export CUDA_VISIBLE_DEVICES=0
 
elif [ $rank -lt $limit1 ]
then
   export CUDA_VISIBLE_DEVICES=1
 
elif [ $rank -lt $limit2 ]
then
   export CUDA_VISIBLE_DEVICES=2
else
   export CUDA_VISIBLE_DEVICES=3
fi

# Launch the actual program (with its arguments), inheriting the
# CUDA_VISIBLE_DEVICES setting chosen for this rank
exec "$@"
====================================================
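
To check the resulting rank-to-GPU assignment, one can run a trivial command 
through the wrapper instead of siesta (a quick sanity check; the 
OMPI_COMM_WORLD_RANK variable is set by Open MPI):

mpirun --map-by socket:PE=1 --rank-by core -np ${SLURM_NTASKS} \
       ./gpu_bind.sh bash -c \
       'echo "$(hostname) rank $OMPI_COMM_WORLD_RANK -> GPU $CUDA_VISIBLE_DEVICES"'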


  I hope this helps.

  Best regards,

  Alberto


----- On June 28, 2022, at 10:28, Mohammed Ghadiyali 
mohammed.ghadiy...@kaust.edu.sa wrote:

| Hello,
| 
| I've gone through the Q&A available on the max-centre website, and according
| to it Siesta can use GPUs. However, I'm not able to find any documentation on
| the installation, so could someone tell me the procedure for installing
| Siesta with GPU support? Our systems have 8xV100 (32 GB each) with NVLink.
| 
| Regards,
| Ghadiyali Mohammed Kader
| Post Doctoral Fellow
| King Abdullah University of Science and Technology
| 
-- 
SIESTA is supported by the Spanish Research Agency (AEI) and by the European 
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
