Hello Pietro,
 
Thanks for the advice. I was indeed experimenting with the actual GPU version of QE, taken from the page you mentioned.
 
So, regarding the FFT and linear algebra libraries (BLAS/LAPACK/ELPA, FFTW3): I have those, but compiled for the CPU. It seems they can be used with the GPU version as well, without modification? For some reason, I thought the GPU version required GPU-compiled BLAS/LAPACK/FFT libraries, which is what the NVIDIA HPC SDK provides. But it seems that QE does not use them (or at least the configure script does not).
 
Sergey
 
 
09.03.2021, 21:29, "Pietro Bonfa'" <[email protected]>:

Dear Sergey,

some (trivial) advice:

* version 6.7 detects accelerators but does not use them; the actual
release of the accelerated version is here: https://gitlab.com/QEF/q-e-gpu.
The two codes have been merged, so the next release will include
GPU support as well.

* You may get a small speedup by linking FFTW3, but most of the FFTs
are done on the GPU with cuFFT.

* OpenMP should definitely be enabled, since it provides the way to fully
exploit the CPUs. Indeed, the number of *MPI processes* should (as a
rule) be equal to the number of GPUs (6 per node in your case).

* CUDA-aware MPI is an experimental feature. I have used it extensively
without problems though.
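As a rough illustration of the rule of thumb above (one MPI process per GPU, with OpenMP threads filling the remaining CPU cores), here is a minimal Python sketch of the arithmetic. The function name hybrid_layout is hypothetical and not part of QE; it assumes cores are split evenly among ranks and ignores the SMT hardware threads on Power9.

```python
def hybrid_layout(gpus_per_node: int, cores_per_node: int, nodes: int = 1) -> dict:
    """Hypothetical helper: MPI/OpenMP layout assuming one MPI rank per GPU."""
    if gpus_per_node <= 0 or cores_per_node <= 0 or nodes <= 0:
        raise ValueError("all counts must be positive")
    # Rule of thumb: one MPI process per GPU on each node.
    ranks_per_node = gpus_per_node
    # Split the physical cores evenly among the ranks (at least one thread each).
    threads_per_rank = max(1, cores_per_node // ranks_per_node)
    # Cores left over after the even split (never negative).
    idle = max(0, cores_per_node - ranks_per_node * threads_per_rank)
    return {
        "mpi_ranks_total": ranks_per_node * nodes,
        "ranks_per_node": ranks_per_node,
        "OMP_NUM_THREADS": threads_per_rank,
        "idle_cores_per_node": idle,
    }


# Example: 2 nodes, each with 6 V100 GPUs and 40 CPU cores.
print(hybrid_layout(6, 40, nodes=2))
```

For two such nodes this gives 12 MPI ranks in total and OMP_NUM_THREADS=6 per rank, with 4 cores per node left idle. Setting OMP_NUM_THREADS=40 for each of the 6 ranks on a node would instead oversubscribe the cores.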

Hope this helps,
Pietro


On 3/9/21 3:04 PM, Sergey Lisenkov wrote:

 Hello,
 I have access to an IBM Power9 cluster with 6 V100 GPUs/node and
 40 CPU cores/node. I have a CPU version of QE-6.7 running, but I would
 like to explore the GPU version.
 We have NVIDIA compilers installed (PGI 21.2, CUDA 11.1, ESSL 6.2).
 When I run the configure script as described on the QE-GPU wiki page,
 it creates a 'make.inc' file with the internal FFTW and USE_CUSOLVER.
 The configure script also picks the BLAS/LAPACK libraries from PGI.
 Is this how it should be? I see that there are cuBLAS, cuFFT and other
 CUDA libraries, but can they be used in QE? ESSL also has a
 "libesslsmcuda" library, but I don't know if it is relevant. All the
 examples on the QE-GPU wiki page seem to be outdated, or I may be wrong.
 Also, since every compute node has 6 GPUs, I could use CUDA-aware MPI
 (enabled with the __GPU_MPI flag). Should I set the OMP_NUM_THREADS variable
 (=40) to utilize the CPU cores? BTW, for some reason the configure script does
 not activate OpenMP (even if --enable-openmp is used).
 Thanks,
   Sergey
 
 _______________________________________________
 Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
 users mailing list [email protected]
 https://lists.quantum-espresso.org/mailman/listinfo/users
 

