The GPU executable can be launched in the same way as the CPU one, but keep 
the following in mind:

  * the number of MPI processes per node must be the same as the number of 
    GPUs (2 MPI processes per node in your case). In principle you can try 
    to use more MPI processes per GPU, but it is not recommended;
  * you can enable OpenMP together with the GPU (add --enable-openmp to 
    ./configure) in order to exploit CPU threading in the few places where 
    the GPU porting is not present (no more than 8 threads per node; it 
    generally doesn't make much difference, though). A sketch of this 
    variant follows the batch example below.

I don't know which scheduler is in use on your system; here is an example of a 
Slurm batch job launching on 2 nodes with 2 GPUs each:
------------------------------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2      # one MPI rank per GPU
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:2             # 2 GPUs per node
#SBATCH --time=00:20:00

module purge
module load hpcsdk/24.3

export OMP_NUM_THREADS=1

# 2 nodes x 2 ranks/node = 4 MPI ranks in total
mpirun -np 4  /home/q-e/bin/pw.x  -nk 1 -nb 1 -input scf.in > scf.out
---------------------------------------------------------------------------------------------------------------
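
If you build with --enable-openmp, the variant below is a minimal sketch of 
the same job with threading turned on. It assumes the same module name and 
pw.x path as above (both machine-specific) and stays within the suggested 
8 threads per node (2 ranks per node, 4 threads each):
------------------------------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2      # still one MPI rank per GPU
#SBATCH --cpus-per-task=4        # 2 ranks x 4 threads = 8 threads per node
#SBATCH --gres=gpu:2
#SBATCH --time=00:20:00

module purge
module load hpcsdk/24.3          # machine-specific (assumed the same as above)

export OMP_NUM_THREADS=4         # match --cpus-per-task

mpirun -np 4  /home/q-e/bin/pw.x  -nk 1 -nb 1 -input scf.in > scf.out
------------------------------------------------------------------------------------------------------------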

Hope it helps.
Cheers,

Fabrizio

________________________________
From: Mauro Francesco Sgroi <[email protected]>
Sent: Friday, August 2, 2024 2:35 PM
To: Fabrizio Ferrari Ruffino <[email protected]>
Cc: Quantum ESPRESSO users Forum <[email protected]>
Subject: Re: [QE-users] Help for compilation with Nvidia HPC SDK

Dear Fabrizio,
thanks a lot for the explanation.
I was unsure about how to proceed and worried about not getting the proper 
performance on the GPU.

May I ask for help on how to run the code? Where can I find instructions on 
launching the executable?

For example, how can I control the number of GPUs used and the number of 
parallel processes?

I have 2 GPUs on each node.

Thanks a lot and best regards,
Mauro Sgroi.

_______________________

Dr. Mauro Francesco Sgroi
Department of Chemistry
University of Turin
Via Quarello 15a
I-10135 TORINO (Italy)

Tel. +39 011-670-8372 / +39 011-670-7364
e-mail: [email protected]
Web: www.met.unito.it / www.chimica.unito.it
Orcid: https://orcid.org/0000-0002-0914-4217
Webex: https://unito.webex.com/webappng/sites/unito/dashboard/pmr/maurofrancesco.sgroi


On Fri, Aug 2, 2024 at 2:11 PM Fabrizio Ferrari Ruffino 
<[email protected]> wrote:
Hi,
there are a few minor FFTXlib calls in QE that still run on the CPU, so it is 
better to have a CPU FFT backend enabled too. Whether you use the internal one 
or FFTW3 should not make much difference, since all the main work runs on the 
GPU (and therefore calls cuFFT).
In a CPU run the FFTW3 backend is faster than the internal one, but, as I said, 
in a GPU run it should be fairly irrelevant.
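
For completeness, a minimal sketch of what selecting FFTW3 as the CPU backend 
could look like (the FFTW3 path below is hypothetical and system-dependent; 
FFT_LIBS is the configure variable for pointing at an external FFT library):
------------------------------------------------------------------------------------------------------------
# hypothetical FFTW3 location; adjust to your system
export FFT_LIBS='-L/usr/lib/x86_64-linux-gnu -lfftw3'
./configure --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/2024/cuda/12.5 \
            --with-cuda-cc=75 --with-cuda-runtime=12.5 --with-cuda-mpi=yes
------------------------------------------------------------------------------------------------------------
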
Cheers,

Fabrizio
CNR IOM
________________________________
From: users <[email protected]> on behalf of Mauro Francesco 
Sgroi via users <[email protected]>
Sent: Friday, August 2, 2024 12:13 PM
To: Quantum ESPRESSO users Forum <[email protected]>
Subject: [QE-users] Help for compilation with Nvidia HPC SDK

Dear all,
I am trying to compile version 7.3.1 of Quantum ESPRESSO using the latest 
NVIDIA HPC SDK (24.7) on Ubuntu 24.04.

I am configuring as follows:

export BLAS_LIBS='-L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/math_libs/lib64 -lcublas -lcublasLt -L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib -lblas -L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/cuda/lib64 -lcudart'

export LAPACK_LIBS='-L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/math_libs/lib64 -lcusolver -lcurand -lcublas -lcublasLt -lcusparse -L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/lib -llapack -L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/cuda/lib64 -lcudart'

export SCALAPACK_LIBS='-L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/comm_libs/12.5/openmpi4/openmpi-4.1.5/lib -lscalapack -L/opt/nvidia/hpc_sdk/Linux_x86_64/2024/comm_libs/12.5/openmpi4/latest/lib -lmpi -lopen-pal'

./configure --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/2024/cuda/12.5 \
            --with-cuda-cc=75 --with-cuda-runtime=12.5 --with-cuda-mpi=yes
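
The FFT backend that configure selects shows up in the DFLAGS line of 
make.inc; a quick check (to my understanding, -D__FFTW indicates the internal 
FFTW and -D__FFTW3 an external FFTW3):
------------------------------------------------------------------------------------------------------------
# show the preprocessor flags chosen by configure
grep DFLAGS make.inc
------------------------------------------------------------------------------------------------------------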

In this way, the internal FFTW library is selected. Should I use the FFTW3 
library together with cuFFT?

Can the two libraries work together? Is it normal that the internal FFTW 
library is used, or should the cuFFT library be sufficient?

Or is it better to use the cuFFTW library supplied by NVIDIA?

Can I have some guidance on these aspects?

Thanks a lot and best regards,
Mauro Sgroi.


