Just to add: I'd go through the trouble of compiling and tuning FFTW3 and OpenBLAS manually. OpenBLAS can be built serial or threaded, and its threaded performance is not always optimal (a known issue). FFTW3, in my experience, compiles and runs fine on Ryzen, but the overall performance gain from it is 10% at most. If you are looking for the main bottleneck, it is probably not purely in the FFT part.
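One way to see why tuning the FFT alone gives a limited overall gain is an Amdahl-style estimate. The sketch below is only an illustration: the function name `overall_speedup` and the FFT runtime shares and speedup factors in the loop are hypothetical values, not measurements from PW or from any machine in this thread.

```python
def overall_speedup(fraction, part_speedup):
    """Amdahl-style estimate of the whole-run speedup when only
    `fraction` of the runtime is accelerated by `part_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if part_speedup <= 0:
        raise ValueError("part_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


# Hypothetical FFT shares of total runtime and hypothetical FFT speedups.
for fraction in (0.1, 0.2, 0.3):
    for part_speedup in (1.5, 2.0):
        gain = (overall_speedup(fraction, part_speedup) - 1.0) * 100.0
        print(f"FFT share {fraction:.0%}, FFT {part_speedup}x faster "
              f"-> overall gain {gain:.1f}%")
```

If the FFT share of the runtime is small, even a large FFT speedup moves the total only a little, which is consistent with looking for the bottleneck elsewhere.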
On a Xeon supercomputer, PW runs about twice as fast, or more, when compiled with the Intel compiler than with GCC (although that may be specific to that one cluster). There I use 8 or 12 threads; I don't remember exactly why, it could be that I ran some tests for it. On an AMD machine it might be different, and yet again it calls for some compilation tweaking. I'd make a few builds and benchmark them.

Finally, you might be running into memory issues. Try setting up a CPU-heavy task and a memory-heavy task, each running for at least a few minutes, and see how much gain you get when going from 1 to 32 cores.

Best regards, and good luck!
Andrii Shyichuk, University of Wrocław.

On 2020-11-16 20:36, Carlo Nervi wrote:
> When compiling with gcc/gfortran, use the -march=native option.
> With Intel I guess it is -xHost.
> This will enable the CPU's native instructions.
> With Epyc I guess a lot depends on the libraries. With gcc I used openblas
> and fftw3. On Ryzen they work great.
> HTH,
> Carlo
>
> On Mon, 16 Nov 2020 at 18:24, Husak Michal <[email protected]> wrote:
>
>> No, I do not use hyperthreading.
>> I checked both OpenMPI and OpenMP in the output file ...
>> It correctly shows what I set at launch and via the environment variable.
>> If I use more than 4 cores in any way, I get no speedup ...
>> ________________________________
>> From: users <[email protected]> on behalf of Pietro Delugas <[email protected]>
>> Sent: Monday, November 16, 2020 3:42:59 PM
>> To: Quantum ESPRESSO users Forum
>> Subject: Re: [QE-users] Sub optimal performance on 32 core AMD machine
>>
>> Hi
>> Are you using hyperthreading?
>> If yes, you should try to use only 32 cores. Check the total number of
>> processors at the beginning of the run.
>> The program makes intense use of the CPUs, so hyperthreading is
>> usually inefficient.
>> Pietro
>>
>> From: Michal Husak <[email protected]>
>> Sent: Monday, November 16, 2020 3:19 PM
>> To: Quantum ESPRESSO users Forum <[email protected]>
>> Subject: [QE-users] Sub optimal performance on 32 core AMD machine
>>
>> I
>
> --
> Prof. Carlo Nervi [email protected]
> Dipartimento di Chimica, via P. Giuria 7, 10125 Torino, Italy.
> http://lem.ch.unito.it/
_______________________________________________ Quantum ESPRESSO is supported by MaX (www.max-centre.eu) users mailing list [email protected] https://lists.quantum-espresso.org/mailman/listinfo/users
