Re: [QE-users] Sub optimal performance on 32 core AMD machine

Andrii Shyichuk via users Tue, 17 Nov 2020 03:39:06 -0800

 

Just to add, I'd go through the trouble of manually compiling and tuning
FFTW3 and OpenBLAS.
OpenBLAS can be built serial or threaded, and its threaded performance is
not always optimal (a known issue).
FFTW3 (in my experience) compiles and runs very well on Ryzen, but the
overall performance gain from it is 10% at most.
If you are looking for a more serious bottleneck, it is probably not
purely in the FFT part.


On a Xeon supercomputer, PW runs faster (twice as fast or more) when
compiled with the Intel compiler than with GCC (although maybe that is
just that one particular cluster).
For some reason I use 8 or 12 threads there; it could be that I ran some
tests to arrive at that.
On an AMD machine it might be different, which again calls for some
compilation tweaking.
I'd make a few builds and benchmark them.

Finally, you might be running into memory issues. Try making a CPU-heavy
task and a memory-heavy task, each running for at least a few minutes, and
see how much gain you get when going from 1 to 32 cores.
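
To compare the two tasks, it may help to turn the measured wall times into
speedup and parallel efficiency per core count. The following is only a
minimal sketch in Python: the function names (scaling_report, print_report)
are hypothetical helpers, not part of QE, and the wall times have to come
from your own runs.

```python
from typing import Dict, List, Tuple


def scaling_report(wall_times: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows from wall times in seconds.

    speedup    = T(1) / T(n)
    efficiency = speedup / n   (1.0 would mean perfect scaling)
    """
    if 1 not in wall_times:
        raise ValueError("need a 1-core reference timing")
    t1 = wall_times[1]
    rows = []
    for cores in sorted(wall_times):
        speedup = t1 / wall_times[cores]
        rows.append((cores, speedup, speedup / cores))
    return rows


def print_report(label: str, wall_times: Dict[int, float]) -> None:
    """Print one scaling table; wall_times maps core count -> seconds."""
    print(label)
    print(f"{'cores':>6} {'speedup':>8} {'efficiency':>11}")
    for cores, speedup, eff in scaling_report(wall_times):
        print(f"{cores:>6d} {speedup:>8.2f} {eff:>11.2f}")
```

Call print_report once with the timings of the CPU-heavy task and once with
those of the memory-heavy task. If the efficiency of the memory-heavy task
drops at a much lower core count than that of the CPU-heavy one, memory
bandwidth is a plausible suspect, though that alone does not prove it.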

Best regards, and good luck!
Andrii Shyichuk, University of Wrocław.

On 2020-11-16 20:36, Carlo Nervi wrote:

> When compiling with gcc/gfortran, use the -march=native option.
> With Intel I guess it is -xHost.
> This will enable the CPU's native instructions.
> With Epyc I guess a lot depends on the libraries. With gcc I used OpenBLAS
> and FFTW3. On Ryzen they work great.
> HTH, 
> Carlo 
> 
> On Mon, 16 Nov 2020 at 18:24, Husak Michal <[email protected]>
> wrote:
> 
>> No, I do not use hyperthreading.
>> I checked both OpenMPI and OpenMP in the output file ...
>> It correctly shows what I set at launch and through the environment
>> variable.
>> If I use more than 4 cores in any way, I get no speedup ...
>> ________________________________
>> From: users <[email protected]> on behalf of Pietro 
>> Delugas <[email protected]>
>> Sent: Monday, November 16, 2020 3:42:59 PM
>> To: Quantum ESPRESSO users Forum
>> Subject: Re: [QE-users] Sub optimal performance on 32 core AMD machine
>> 
>> Hi
>> Are you using hyperthreading?
>> If yes, you should try to use only 32 cores. Check the total number of
>> processors at the beginning of the run.
>> The program uses the CPUs intensively, so hyperthreading is usually
>> inefficient.
>> Pietro
>> 
>> 
>> From: Michal Husak<mailto:[email protected]>
>> Sent: Monday, November 16, 2020 3:19 PM
>> To: Quantum ESPRESSO users Forum<mailto:[email protected]>
>> Subject: [QE-users] Sub optimal performance on 32 core AMD machine
>> 
>> I
>> 
>> _______________________________________________
>> Quantum ESPRESSO is supported by MaX (www.max-centre.eu [1])
>> users mailing list [email protected]
>> https://lists.quantum-espresso.org/mailman/listinfo/users
> 
> -- 
> 
> ------------------------------------------------------------
> Prof. Carlo Nervi [email protected]  Tel:+39 0116707507/8
> Fax: +39 0116707855      -      Dipartimento di Chimica, via
> P. Giuria 7, 10125 Torino, Italy.    http://lem.ch.unito.it/
> 
> ICCC2020 HAS BEEN POSTPONED AT 2022
> 
> ICCC 2022 28 August - 2 September 2022, Rimini, Italy: 
> http://www.iccc2020.com [2]
> International Conference on Coordination Chemistry (ICCC 2022)
> 

  

Links:
------
[1] http://www.max-centre.eu
[2] http://www.iccc2020.com/

