You are right, it seems znver4 works for AMD CPUs. Here are some benchmarks; don't take them too seriously, WIEN2k may behave differently from such simple tests.
As Laurence said, -O3 doesn't do better than -O2.
Note that some optimization switches only make sense if the code supports vectorization.
On Intel Xeons it seems that for complex operations ifx still performs worse than ifort (I don't know whether this is improved in 2025.2); on EPYCs I am not that sure about improvements, and the AMD compiler does not do better than the Intel one.
AMD's hint to use -axCORE-AVX512 seems to be useless on my EPYCs (ifx reports that it doesn't vectorize the loop).
The old gfortran was really bad.

Benchmarks from MC Rutter https://www.mjr19.org.uk/
(conjg calculates the complex conjugate, mult calculates multiplications; mult-dble was derived by me from mult z (mult-cmplx) just to see what real operations do)

Times in ns per operation

                  conjg   mult-cmplx   mult-dble

Intel Xeon(R) E5-2697
---------------------
ifort 24.2
  O2              0.422     1.322        0.230
  O2 host         0.150     0.387        0.122
  O3              0.424     1.328        0.234
  O3 host         0.151     0.387        0.123
ifx 25.1
  O2              0.423     0.946        0.256
  O2 host         0.906     1.050        0.155
  O3              0.426     0.950        0.257
  O3 host         0.910     1.062        0.157
gfortran v 7.5
  O3              0.364     1.442       19.532
  O3 generic      0.365     1.442       19.534
  O3 avx2         0.364     0.880       10.261

AMD EPYC 9354
-------------
ifort 24.2
  O3              0.346     0.596        0.174
  O3 AVX512       0.345     0.597        0.166
  O3 znver4       0.346     0.600        0.173
ifx 25.1
  O2              0.264     0.596        0.196
  O2 znver4       0.467     0.334        0.118
  O3              0.264     0.597        0.170
  O3 AVX512       0.263     0.595        0.268
  O3 znver4       0.467     0.334        0.117
gfortran v 14.2
  O3              0.273     0.719        0.216
  O3 generic      0.273     0.716        0.228
  O3 avx512       0.264     0.284        0.111
  O3 znver4       0.274     0.283        0.104
flang 5.0
  O3              0.264     0.571        0.266
  O3 znver4       0.266     0.321        0.126

=================================
Compiler switches

Intel ifx 2025.1 or ifort 2024.2:
  -xhost -axCORE-AVX2 -axCORE-AVX512 -axCORE-AVX2,CORE-AVX512
GNU gfortran:
  -march=znver4 -march=core-avx2 -mavx2 -mavx512f -mtune=generic
AMD flang:
  -march=znver4

Ciao
Gerhard

DEEP THOUGHT in D.
Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."

====================================
Dr. Gerhard H. Fecher
Institut of Physics
Johannes Gutenberg - University
55099 Mainz
________________________________________
From: Wien [[email protected]] on behalf of Straus, Daniel B [[email protected]]
Sent: Wednesday, 10 September 2025 16:08
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] FFTW and ifx/icx issue relevant to WIEN2k

You're right that Intel doesn't officially list any of the znver# flags as supported by the LLVM/Clang-based compilers. However, it seems that the LLVM backend supports them. See, for example:
https://community.intel.com/t5/Intel-Fortran-Compiler/Compilation-error-with-fast-on-AMD-Ryzen-9-9900X-using-ifx/td-p/1712241
https://stackoverflow.com/questions/79174824/why-do-gcc-icx-and-clang-not-auto-vectorize-using-avx-512-based-instructions-on

While this is not conclusive, if you attempt to specify -march=znver5 for the 2024 oneAPI compilers, compilation fails because znver5 is not recognized. -march=znver4 allows programs to be compiled properly with the 2024 oneAPI version.

That said, AMD says to specify -axCORE-AVX512 when using oneAPI, so it is not clear exactly what's going on.
https://docs.amd.com/r/en-US/63857-AOCC-quick-start-guide/AMD-EPYC-9xx5-Series-Processors-Compiler-Options-Quick-Reference

All I know is that AVX512 code is generated when -march=znver5 is passed as a flag to the oneAPI 2025.2 compilers, and that's good enough for me. Trying to compile FFTW with AVX512 enabled fails when no -march flag is passed to icx.
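One way to check a claim like this is to disassemble the compiled binary and count the instructions that use the 512-bit zmm registers. The following is only a sketch: the function name count_zmm and the binary names in the usage comment are made up for illustration, and a nonzero count shows only that AVX-512 instructions are present somewhere in the binary, not that the time-critical loops use them.

```shell
#!/bin/sh
# count_zmm (hypothetical helper): read disassembly on stdin and print the
# number of lines that reference a 512-bit zmm register (zmm0 ... zmm31).
count_zmm() {
    # grep -c prints the count; it exits with status 1 when the count is 0,
    # so "|| true" keeps the function from reporting a failure in that case.
    grep -c -E 'zmm[0-9]+' || true
}

# Hypothetical usage (the binary names are placeholders):
#   objdump -d ./test_znver5  | count_zmm
#   objdump -d ./test_default | count_zmm
```

Comparing the counts for the same source built with and without the -march flag would indicate whether the flag changes the generated code at all.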
Daniel Straus
Assistant Professor
Department of Chemistry
Tulane University
5088 Percival Stern Hall
6400 Freret Street
New Orleans, LA 70118
(504) 862-3585
http://straus.tulane.edu/

-----Original Message-----
From: Wien <[email protected]> On Behalf Of Fecher, Gerhard
Sent: Wednesday, September 10, 2025 1:45 AM
To: A Mailing list for WIEN2k users <[email protected]>
Subject: Re: [Wien] FFTW and ifx/icx issue relevant to WIEN2k

More comments:
I could not find znver5 listed as a valid CPU architecture for icx or ifx on
https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2025-2/march.html
(this also concerns the other CPU-dependent compiler switches -x, -ax, -arch; there is no znverX).
It seems it was just used by a "beginner" hoshi on
https://community.intel.com/t5/Intel-Fortran-Compiler/Compilation-error-with-fast-on-AMD-Ryzen-9-9900X-using-ifx/td-p/1712241
I would guess that -march=znver5 (because it can be used with the GNU compilers) is just ignored; why should Intel be interested in writing an optimized compiler for AMD CPUs?
Did you ever test whether -march=znver5 changes anything?

As mentioned earlier, -axCORE-AVX512, -axCORE-AVX2, or a combination of both may work on AMD processors (at least they don't slow the program seriously, and I didn't find dead electrons).

There was already a lot of discussion on FFTW3 and ELPA at the beginning of the year.

Ciao
Gerhard
________________________________________
From: Wien [[email protected]] on behalf of Laurence Marks [[email protected]]
Sent: Tuesday, 9
September 2025 23:20
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] FFTW and ifx/icx issue relevant to WIEN2k

Comments.
1. I have never seen -O3 do anything with icc/ifort except kill defenceless electrons and make the code slower. I will be happy to be proved wrong with ifx/icx.
2. I always use -mkl, rather than making mistakes chasing how Intel changes its libraries.
3. I think you might have issues with -mkl_cdft (Intel's version of FFTW) and FFTW3.

On Tue, Sep 9, 2025 at 4:10 PM Straus, Daniel B <[email protected]> wrote:

Sorry for the long delay in responding; I was set to receive a digest of list messages, and it only comes once every couple of weeks.

Yes, this is on a Zen 5 computer, and it is running Rocky Linux 10. I am using the Intel compiler and MKL, rather than the one AMD provides. IFX and ICX support the -march=znver5 flag. All the WIEN2k 24.1 patches available as of 9/1 were installed.

To be clear, on my workstation, FFTW will still work with WIEN2k even if the autoconf script is not regenerated, but there may be a performance impact as it is not using the proper Intel libraries for Fortran calls to FFTW. However, 3ddens would then not compile, and if I also attempted to use ELPA, then parallel LAPW1 would not compile. Regenerating the autoconf script for FFTW and recompiling it solved both problems.

You should check the config.log of your FFTW compilation for a line such as “ld: cannot find -loopopt=0” to see if this error is occurring. For me, the configure script continued even after this error, but it was using the GNU default libraries rather than the Intel-provided libraries.

siteconfig_lapw is set to use the ifx and icx compilers. Here are the flags under “Options” in siteconfig_lapw that I am using for WIEN2k with the ifx compiler:
Current settings:
  M   OpenMP switch:           -qopenmp
  O   Compiler options:        -O3 -march=znver5 -traceback -assume buffered_io -FR -I$(MKLROOT)/include
  L   Linker Flags:            $(FOPT) -L$(MKLROOT)/lib -lpthread -lm -ldl -liomp5 -Wl,-rpath,$MKLROOT/lib
  P   Preprocessor flags       '-DParallel'
  R   R_LIBS (LAPACK+BLAS):    -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
  F   FFTW options:            -DFFTW3 -DFFTW_OMP -I/home/software/fftw-3.3.10/include
      FFTW-LIBS:               -L/home/software/fftw-3.3.10/lib -lfftw3 -lfftw3_omp
  X   LIBX options:
      LIBXC-LIBS:

For Parallel Options in siteconfig_lapw, here are the flags I am using:

Your current parallel settings (options and libraries) are:
  C   Parallel Compiler:          mpiifx
  FP  Parallel Compiler Options:  -O3 -FR -march=znver5 -fc=ifx -traceback -assume buffered_io -I$(MKLROOT)/include
  MP  MPIRUN command:             mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
  O   Parallel OpenMP switch:     -qopenmp

Additional setting for SLURM batch systems (is set to 1 otherwise):
  CN  Number of Cores:            1

Libraries:
  Sp  SCALAPACK:      -L$(MKLROOT)/lib -lmkl_scalapack_lp64
                      -L$(MKLROOT)/lib -lmkl_blacs_intelmpi_lp64
  E   ELPA options:   -DELPA -I/home/software/elpa-2025.06.001/include/elpa-2025.06.001/elpa
                      -I/home/software/elpa-2025.06.001/include/elpa-2025.06.001/modules
      ELPA-LIBS:      -lelpa -L/home/software/elpa-2025.06.001/lib -Wl,-rpath=/home/software/elpa-2025.06.001/lib
  RP  Parallel-Libs:  $(R_LIBS) -lmkl_cdft_core

In case it’s relevant, here is what I passed to the configure script for FFTW3 (after regenerating the script with autoconf):

module load oneapi/2025.2.0
./configure --prefix=/home/software/fftw-3.3.10 \
    CC="mpiicx -cc=icx" MPICC="mpiicx -cc=icx" F77="mpiifx -fc=ifx" \
    FFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    CFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    CXXFLAGS="-I${MKLROOT}/include" \
    LDFLAGS="-L${MKLROOT}/lib -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl" \
    --enable-option-checking=fatal --enable-avx512 --enable-avx2 \
    --enable-mpi --enable-openmp --enable-threads

And for ELPA:

module load oneapi/2025.2.0
./configure --prefix=/home/software/elpa-2025.06.001 \
    CC="mpiicx -cc=icx" CXX="mpiicpx -cxx=icpx" FC="mpiifx -fc=ifx" \
    CFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    FCFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    CXXFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    LDFLAGS="-L${MKLROOT}/lib -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl" \
    --enable-option-checking=fatal --with-mpi=yes --enable-openmp=yes

Hopefully this is helpful.

Daniel Straus
_______________________________________________
Wien mailing list
[email protected]
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/[email protected]/index.html

--
Emeritus Professor Laurence Marks (Laurie)
Northwestern University
Webpage<http://www.numis.northwestern.edu/> and Google Scholar link<http://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en>
"Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Györgyi
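The config.log check suggested in the thread can be sketched as a small shell function. This is an illustration only: the function name check_fftw_log and the path in the usage comment are placeholders, and the only string it looks for is the linker message quoted in the thread; other failures in config.log would go unnoticed.

```shell
#!/bin/sh
# check_fftw_log (hypothetical helper): look for the linker error quoted in
# the thread in a configure log.
# Returns 0 if the error is present, 1 if it is absent, 2 if the file
# cannot be read.
check_fftw_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -F: match the text literally; -q: no output, only the exit status.
    if grep -q -F -e 'cannot find -loopopt=0' "$log"; then
        echo "linker error present in $log"
        return 0
    fi
    echo "linker error not present in $log"
    return 1
}

# Hypothetical usage (adjust the path to your FFTW build directory):
#   check_fftw_log /path/to/fftw-3.3.10/config.log
```

If the message is present, configure may have fallen back to the GNU default libraries as described above, even though it ran to completion.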

