Re: [QE-users] Sub optimal performance on 32 core AMD machine

Carlo Nervi Tue, 17 Nov 2020 11:11:24 -0800

Hello Pamela,
I don't know whether this is already clear, so I apologize if I repeat obvious
concepts.
I just bought a Threadripper 3990X with 64 cores / 128 threads. As far as I
remember, the 3960X has 24 cores / 48 threads.
It is very important not to use more than 24 cores on the 3960X. Simply
forget about hyperthreading: there is no need to disable it in the BIOS, just
count the real (physical) number of cores.
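
As a minimal sketch of what counting the real cores could look like on Linux:
the Python below assumes that `lscpu -p=CPU,CORE,SOCKET` (from util-linux) is
available, and counts distinct (socket, core) pairs so that hyperthreading
siblings are counted once. The function names and the input file name are
hypothetical, chosen only for illustration; they are not part of Quantum
ESPRESSO.

```python
import subprocess


def count_physical_cores(lscpu_output):
    """Count distinct (socket, core) pairs in `lscpu -p=CPU,CORE,SOCKET` output."""
    cores = set()
    for line in lscpu_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and lscpu's comment header
        _cpu, core, socket = line.split(",")[:3]
        cores.add((socket, core))  # SMT siblings share the same pair
    return len(cores)


def read_lscpu():
    """Run lscpu (Linux only, assumed installed) and return its parseable output."""
    result = subprocess.run(
        ["lscpu", "-p=CPU,CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_pw_command(n_ranks, input_file):
    """Build an mpirun command line with one MPI rank per physical core."""
    return ["mpirun", "-np", str(n_ranks), "pw.x", "-i", input_file]
```

For example, `build_pw_command(count_physical_cores(read_lscpu()), "scf.in")`
would give the command for a run with one MPI rank per physical core; setting
OMP_NUM_THREADS=1 in the environment keeps each rank single-threaded.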


I use gcc 9.3.0, and the new gcc 10 should be even better for AMD CPUs.
With openblas 0.3.12 I found that my 8-core home Ryzen 3800X is as fast as a
12-core Xeon E5-2680 using Quantum ESPRESSO 6.4.1.
Carlo


On Tue, 17 Nov 2020 at 19:24, Pamela Whitfield <
[email protected]> wrote:

> Michal
>
> I have a very similar use-case and looked into many of the same issues
> when I got my Threadripper 3960X system at the beginning of the year to
> supplement my old dual-Xeon setup. In the past few days I've been
> revisiting compilation as I got hold of a Quadro GV100 for GPU acceleration
> of my optimizations.
>
> Basically it seems as though binaries built for Zen2 either can't run
> with both MPI and OpenMP enabled at all, or do so poorly even when they
> run.
> Best performance for pw.x on v6.5 (I've been playing with GIPAW and
> there's no 6.6-compatible version yet) has been with a simple gcc OpenMPI
> build without OpenMP threading and with about 20 MPI processes on my
> 24-core CPU. Compiling with the GCC or PGI compiler made little difference,
> although only the more recent PGI compilers will have Zen2 optimization.
> I get little benefit from Intel MKL over openblas/lapack/fftw3, even with
> the debug tweaks, etc.
> Puget Systems' numbers with other programs suggest that pure OpenMP
> performs better than OpenMPI on Threadripper, but I find the opposite with
> QE.
> I did try disabling hyperthreading in the BIOS, but that made no difference
> to the performance.
>
> GPU compilation really shows the issue with MPI/OpenMP clashing. With the
> Xeons I could compile code with MKL that would run well on a Quadro K6000
> while offloading to the CPU with MPI when needed. It could still be a
> compiler issue (I have to use PGI for the GPU version), but it just doesn't
> work with the 3960X, and some things don't thread well with pure OpenMP
> (e.g. dftd3 versus dftd2), so I'll still need to use separately compiled
> versions of 6.5 for different problems.
>
> BTW with a dual CPU system you may benefit from pinning threads to
> particular CPUs - it works on the dual Xeon in any case. My Threadripper
> balances the load across the cores in a pretty dynamic manner and that's on
> a single socket.
>
> Best regards
> Pam Whitfield
>
> Independent Consultant
>
>
>
>
>
> Message: 1
> Date: Mon, 16 Nov 2020 15:19:04 +0100
> From: Michal Husak <[email protected]>
> To: Quantum ESPRESSO users Forum <[email protected]>
> Subject: [QE-users] Sub optimal performance on 32 core AMD machine
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="UTF-8"; format=flowed
>
> I have purchased a new PC with 2x 16-core AMD EPYC processors (64
> logical cores with hyperthreading) ...
> I was hoping my QM programs (Quantum Espresso, CASTEP) would run faster on
> the new system than on my old 4-core Intel i7 machine (8 years old) ....
>
> To my great surprise, the opposite is almost true :-(.
> My main task is SCF and geometry optimization of medium-sized organic
> molecular crystals (about 100 C, H, N atoms per unit cell) ...
>
> I was playing with OpenMPI/OpenMP setup changes ...
> I was playing with the secret MKL_DEBUG_CPU_TYPE=5 parameter
> (meant to work around the slow running of Intel MKL compiled code on AMD) ...
>
> Nothing helps; the best speed is obtained when I use only 4 cores
> (OpenMPI or OpenMP, the results are similar) ...
> Using 16 or 32 cores gives almost no benefit ...
> The CPU load for runs on 1/4/8/16/32 cores corresponds to the number of
> CPUs set, so they are trying to do something ...
>
> Any idea what I should check or try to optimize?
>
> Maybe the bottleneck is memory access, not CPU power (I have 128 GB
> of RAM, almost unused)?
>
> Michal Husak
>
> UCT Prague
>
>
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list [email protected]
> https://lists.quantum-espresso.org/mailman/listinfo/users



-- 

------------------------------------------------------------
Prof. Carlo Nervi [email protected]  Tel:+39 0116707507/8
Fax: +39 0116707855      -      Dipartimento di Chimica, via
P. Giuria 7, 10125 Torino, Italy.    http://lem.ch.unito.it/

*ICCC2020 has been postponed at 2022*

ICCC 2022 28 August - 2 September 2022, Rimini, Italy: http://www.iccc2020.com
International Conference on Coordination Chemistry (ICCC 2022)

