Dear Pavel,

it may be better to ask Laurence; it seems he wrote the VML parts.
I have not looked into the code in recent years. What I found on a quick look:

The only place where INTEL_VML is still used seems to be Hamilt.f of LAPW1; in all other places where it was once used, it is now commented out.

If you don't use INTEL_VML, the Intel ifort compiler will vectorize the loops in vectf.f of LAPW1 (see the code in Hamilt.f that calls it). As I mentioned, one may have to link libsvml explicitly. For example,

    -O2 -xHost -qopt-report=1 -qopt-report-phase=vec

will show you which loops were vectorized.

I could not see that the SVML has reduced accuracy; however, you can set the performance/accuracy level in the VML. What you can also do is set a threshold for the loop size (similar to unroll); this might need a short study of the manual.

I could not find a threshold for the loops (size of the arrays) being set in W2kinit.F; only the precision is set there for INTEL_VML. However, I guess that Laurence used it only where large arrays appear.

NB: I enjoy more questions about how to increase the speed or how to improve the code.

Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."

====================================
Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden
________________________________________
From: Pavel Ondračka [pavel.ondra...@email.cz]
Sent: Wednesday, 2 May 2018 12:05
To: Fecher, Gerhard
Subject: Re: [Wien] Installation with MPI and GNU compilers

I'm replying privately since this might be getting too technical for the list and is in fact not interesting for the majority of users...

Fecher, Gerhard wrote on Wed, 02 May 2018 at 09:00 +0000:
> I never checked that: does the -DINTEL_VML switch correspond to the
> VML library routines of MKL
> or to the
> SVML library routines of the compiler

lapw1 calls the VML library directly, for example the vdcos and vdsin functions, but I have not checked the rest of Wien2k.

> this makes a difference, the svml routines are automatically invoked
> by the INTEL compiler if one uses -O2 optimization or higher.
> (check also the usage of the switches -vec, -no-vec, -vec-report)
>
> The VML routines of the MKL make only sense for appropriate sizes of
> the vectors, otherwise, they may even slow down the program (how much
> might also depend on threads etc.).

The common usage of the VML in Wien2k is to call the VML functions with a _large_ array as an argument. So, if I understand it correctly, the vectorization is done inside the VML, and the VML chooses the best intrinsic. Since the arrays are large, there is a speedup in all cases.

BTW, are you sure the -O2 switch alone will give you the svml intrinsics? IMO the svml intrinsics have different accuracy (they might not be strictly IEEE compliant compared to the scalar variants), so I would expect that you need to state explicitly, with some additional flag, that you are OK with this (e.g. for GCC you need the -ffast-math switch to get the vectorized SSE/AVX trigonometric functions from libmvec).

> A note (for the INTEL Fortran):
> I vaguely remember that the -DINTEL_VML switch did not bring any
> better performance, at that time one needed to give the -lsvml (with
> path to the compiler libs) explicitely.
>
> Ciao
> Gerhard
>

Best regards
Pavel