Laurence Marks wrote on Wed, 04 Apr 2018 at 16:01 +0000:
> I confess to being rather doubtful that gfortran+... is comparable to
> ifort+... for Intel cpu, it might be for AMD. While the mkl vector
> libraries are useful in a few codes such as aim, they are minor for
> the main lapw[0-2].

Well, some quick benchmark data then (serial benchmark, single core):

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (haswell)
Wien2k 17.1

-------------
gfortran 7.3.1 + OPENBLAS 0.2.20 + glibc 2.26 (with the custom patch to use libmvec):

    Time for al,bl    (hamilt, cpu/wall) :   0.2   0.2
    Time for legendre (hamilt, cpu/wall) :   0.1   0.2
    Time for phase    (hamilt, cpu/wall) :   1.2   1.2
    Time for us       (hamilt, cpu/wall) :   1.2   1.2
    Time for overlaps (hamilt, cpu/wall) :   2.6   2.8
    Time for distrib  (hamilt, cpu/wall) :   0.1   0.1
    Time sum iouter   (hamilt, cpu/wall) :   5.5   5.8
    number of local orbitals, nlo (hamilt)   304
    allocate YL     2.5 MB  dimensions  15  3481  3
    allocate phsc   0.1 MB  dimensions  3481
    Time for los      (hamilt, cpu/wall) :   0.4   0.3
    Time for alm         (hns) :   0.1
    Time for vector      (hns) :   0.3
    Time for vector2     (hns) :   0.3
    Time for VxV         (hns) :   2.1
    Wall Time for VxV    (hns) :   0.1
    245 Eigenvalues computed
    Seclr4(Cholesky complete (CPU)) :          1.380   40754.14 Mflops
    Seclr4(Transform to eig.problem (CPU)) :   4.470   37745.44 Mflops
    Seclr4(Compute eigenvalues (CPU)) :       12.750   17643.13 Mflops
    Seclr4(Backtransform (CPU)) :              0.290   10237.08 Mflops
    TIME HAMILT (CPU)  =   5.8, HNS =   2.5, HORB =   0.0, DIAG =  18.9
    TIME HAMILT (WALL) =   6.1, HNS =   2.5, HORB =   0.0, DIAG =  19.0

    real 0m28.610s
    user 0m27.817s
    sys  0m0.394s

-----------
ifort 17.0.0 + MKL 2017.0:

    Time for al,bl    (hamilt, cpu/wall) :   0.2   0.2
    Time for legendre (hamilt, cpu/wall) :   0.1   0.2
    Time for phase    (hamilt, cpu/wall) :   1.2   1.3
    Time for us       (hamilt, cpu/wall) :   1.0   1.0
    Time for overlaps (hamilt, cpu/wall) :   2.6   2.8
    Time for distrib  (hamilt, cpu/wall) :   0.1   0.1
    Time sum iouter   (hamilt, cpu/wall) :   5.4   5.6
    number of local orbitals, nlo (hamilt)   304
    allocate YL     2.5 MB  dimensions  15  3481  3
    allocate phsc   0.1 MB  dimensions  3481
    Time for los      (hamilt, cpu/wall) :   0.2   0.2
    Time for alm         (hns) :   0.0
    Time for vector      (hns) :   0.4
    Time for vector2     (hns) :   0.4
    Time for VxV         (hns) :   2.1
    Wall Time for VxV    (hns) :   0.1
    245 Eigenvalues computed
    Seclr4(Cholesky complete (CPU)) :          1.110   50667.31 Mflops
    Seclr4(Transform to eig.problem (CPU)) :   3.580   47129.09 Mflops
    Seclr4(Compute eigenvalues (CPU)) :       11.320   19873.04 Mflops
    Seclr4(Backtransform (CPU)) :              0.250   11875.01 Mflops
    TIME HAMILT (CPU)  =   5.7, HNS =   2.6, HORB =   0.0, DIAG =  16.3
    TIME HAMILT (WALL) =   5.9, HNS =   2.6, HORB =   0.0, DIAG =  16.3

    real 0m25.587s
    user 0m24.857s
    sys  0m0.321s

-------------

So I apologize: my statement in the last email was too ambitious. Indeed,
in this particular case the open-source stack is ~12% slower (28.6 vs 25.6
seconds wall time). Most of the difference is in the DIAG part (which I
believe is where OpenBLAS comes into play).

However, on some other (older) Intel CPUs the DIAG part can be even faster
with OpenBLAS; see the already mentioned email by Prof. Blaha

https://www.mail-archive.com/wien...@zeus.theochem.tuwien.ac.at/msg15106.html

where he tested on an i7-3930K (sandybridge). For those older CPUs I would
therefore expect the performance to be really comparable (with the small
patch to utilize libmvec in order to speed up the HAMILT part).

In general, open-source support for new hardware is usually slow to
materialize, hence the relative performance on older CPUs is better. This
is especially true for OpenBLAS, where optimizations for new CPUs and
instruction sets are not provided by Intel (in contrast to gcc, gfortran
and glibc, where Intel engineers contribute directly), while MKL and ifort
have good support from day 1.

I do agree that it is better to advise users to use MKL+ifort: once they
have it properly installed, siteconfig is almost always able to detect and
build everything out of the box with the default config. This is
unfortunately not the case with the open-source libraries, where the
detection does not work most of the time due to distro differences and the
unfortunate fact that the majority of the needed libraries do not provide
any good means for autodetection (e.g. proper package config files), hence
the user must edit the compiler flags by hand.

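For anyone who wants to check the ~12% figure and the claim that most of the gap sits in DIAG, here is a minimal sketch of the arithmetic. The numbers are copied from the two runs above ("real" wall time and the DIAG entry of the TIME HAMILT (WALL) line); the variable and function names are only illustrative.

```python
# Wall-clock timings in seconds, copied from the two benchmark runs above.
gfortran = {"real": 28.610, "diag_wall": 19.0}  # gfortran + OpenBLAS + libmvec patch
ifort = {"real": 25.587, "diag_wall": 16.3}     # ifort + MKL


def slowdown_percent(slow: float, fast: float) -> float:
    """Relative slowdown of `slow` with respect to `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


# Total gap between the two builds and the part of it spent in DIAG.
total_gap = gfortran["real"] - ifort["real"]
diag_gap = gfortran["diag_wall"] - ifort["diag_wall"]

slowdown = slowdown_percent(gfortran["real"], ifort["real"])
diag_share = diag_gap / total_gap

print(f"total slowdown: {slowdown:.1f} %")
print(f"gap in DIAG:    {diag_gap:.1f} s of {total_gap:.1f} s ({diag_share:.0%})")
```

Note that the DIAG times are printed with only one decimal, so the share of the gap attributed to DIAG is approximate.
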
I just believe that the "ifort is always much faster than gfortran" dogma
is no longer always true.

Best regards
Pavel

_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html