Hi Chris,
it may be that exactly the opposite is happening.
If you don't specify anything, configure tries all the options from best to worst, and MKL is tested first, if I am not wrong. If you pass it a specific path, it tries only that one and treats it as an ordinary FFTW library, so it may be failing to find a working FFT and falling back to the internal one.

Could you send the make.inc files for the two cases, or the configure log?
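
In the meantime, a quick check along these lines may show which FFT backend each build actually selected. This is only a sketch: the helper name `fft_backend` is hypothetical, and it assumes the preprocessor macros usually found on the DFLAGS line of a QE 6.x make.inc (`-D__DFTI` for MKL, `-D__FFTW3` for an external FFTW3, `-D__FFTW` for the internal FFTW). Please verify the macro names against your own make.inc.

```shell
# Hypothetical helper: report which FFT backend a make.inc selects.
# Assumption: the DFLAGS line carries -D__DFTI (MKL), -D__FFTW3
# (external FFTW3) or -D__FFTW (internal FFTW), as in QE 6.x.
fft_backend() {
    # Take only the real DFLAGS line, not MANUAL_DFLAGS or comments.
    dflags=$(grep -E '^[[:space:]]*DFLAGS[[:space:]]*=' "$1" || true)
    case "$dflags" in
        *-D__DFTI*)  echo "mkl-dfti" ;;
        # __FFTW3 must be tested before __FFTW, which is a prefix of it.
        *-D__FFTW3*) echo "fftw3" ;;
        *-D__FFTW*)  echo "internal" ;;
        *)           echo "unknown" ;;
    esac
}
```

Running `fft_backend make.inc` in each of the two build directories should tell whether the "intel fftw" build really differs from the "internal" one.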
Pietro

On 03/01/2019 11:13 AM, Christoph Wolf wrote:
Dear all,

Please forgive this "beginner" question, but I am facing a weird problem. When compiling qe-6.4 (Intel compiler, Intel MPI + OpenMP) with or without Intel's FFTW libs, I find that with OpenMP and 2 threads per core the Intel FFTW version is roughly "twice as slow" as the internal one:

"internal"
     General routines
     calbec       :      2.69s CPU      2.70s WALL (    382 calls)
     fft          :      0.47s CPU      0.47s WALL (    122 calls)
     ffts         :      0.05s CPU      0.05s WALL (     12 calls)
     fftw         :     49.97s CPU     50.12s WALL (  14648 calls)
     Parallel routines
     PWSCF        :  1m45.03s CPU     1m46.59s WALL

"intel fftw"
     General routines
     calbec       :      6.36s CPU      3.20s WALL (     382 calls)
     fft          :      0.93s CPU      0.47s WALL (     121 calls)
     ffts         :      0.10s CPU      0.05s WALL (      12 calls)
     fftw         :    109.63s CPU     55.23s WALL (   14648 calls)
     Parallel routines
     PWSCF        :   3m18.32s CPU   1m41.01s WALL

As a benchmark I am running a perovskite with 120 k-points on 30 processors (one node). There is no (noticeable) difference if I export OMP_NUM_THREADS=1 (MPI only), so I guess I made some mistake during the build with regard to the libraries.

The build process is as follows:

module load intel19/compiler-19
module load intel19/impi-19

export FFT_LIBS="-L$MKLROOT/intel64"
export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"
export CC=icc FC=ifort F77=ifort MPIF90=mpiifort MPICC=mpiicc

./configure --enable-parallel --with-scalapack=intel --enable-openmp


This detects BLAS_LIBS, LAPACK_LIBS, SCALAPACK_LIBS and FFT_LIBS.

I am not experienced with benchmarking, so if my benchmark is garbage, please suggest a suitable system!

Thanks in advance!
Chris

--
Postdoctoral Researcher
Center for Quantum Nanoscience, Institute for Basic Science
Ewha Womans University, Seoul, South Korea


_______________________________________________
users mailing list
[email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users

