Dear experienced users, I am having trouble using OpenMP with my build of Quantum ESPRESSO. According to the output file, pw.x 6.8 recognizes "OMP_NUM_THREADS=2", but the run takes the same time as with "OMP_NUM_THREADS=1", and the PBS batch queue reports only 100% (not 200%) CPU usage. As a result, QE 6.8 with GPU is not as fast as I expected.
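
One possible check, sketched here under the assumption that the compute node runs Linux and has Python 3 available: if the batch system restricts the job to a single core, two OpenMP threads would have to share it, which would be consistent with the 100% CPU figure reported above. This is only a diagnostic idea and is not confirmed to be the cause. The helper name `thread_budget` is hypothetical and not part of Quantum ESPRESSO or PBS.

```python
import os


def thread_budget(env=None, allowed_cpus=None):
    """Return (requested_threads, usable_cores, effective_threads).

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    if allowed_cpus is None:
        # Linux-only call: the set of cores this process may be scheduled on.
        allowed_cpus = os.sched_getaffinity(0)
    usable = len(allowed_cpus)

    raw = env.get("OMP_NUM_THREADS")
    # OMP_NUM_THREADS may be a comma-separated list for nested parallelism;
    # only the first (outermost) value is looked at here.
    # Assumption: an unset variable is treated as "use all allowed cores".
    requested = int(raw.split(",")[0]) if raw else usable

    # Threads beyond the allowed cores cannot run at the same time.
    return requested, usable, min(requested, usable)


if __name__ == "__main__":
    requested, usable, effective = thread_budget()
    print(f"OMP_NUM_THREADS requests {requested} thread(s)")
    print(f"the scheduler allows {usable} core(s)")
    print(f"at most {effective} thread(s) can run concurrently")
```

Running this inside the same PBS job script, just before pw.x, would show whether the job is allowed more than one core. If it reports a single usable core, the resource request in the job script would be the first thing to look at.
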
I used NVIDIA HPC SDK 20.9, CUDA 10.1, and Intel MKL 2021.2. The node has two Xeon Gold 6248 CPUs, one Tesla V100 32 GB GPU, and 768 GB of RAM. Benchmark results and make.inc are attached as a tarball. Could you please point out my mistake?

---Sender---
Takahiro Chiba
1st-year student at grad. school of chem. sci. and eng., Hokkaido Univ.
Expected graduation date: Mar. 2023
[email protected]
-----
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users
