Hi,

This is what I usually compile with for performance on the Intel compilers:

FFLAGS=-O3 -ipo -xHost -ip -prec-div -prec-sqrt -opt-prefetch -mkl=parallel

Do not use -g (you can't debug without it, but if you haven't had a use for it, then why use it? :) ).

Do not use -mp; it is a deprecated option and there are other flags that should perform better, for instance -prec-div -prec-sqrt.

-ipo makes the final link step take much longer, and I have seen little performance gain from it.

-xHost should be used if the machine you compile on is the same as the execution nodes. Otherwise you should use -xSSE3 (the E5000 has this).

Finally, -mkl=parallel lets the MKL library use parallel execution in some of its routines (note that this option does not affect BLACS, as it is already parallelised). I have had better runs with it! But be aware that you have to set OMP_NUM_THREADS=x, otherwise it will use all available cores on the machine. Try testing -mkl=parallel against -mkl=sequential to see the results on your setup.
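To make the two choices above concrete (-xHost versus -xSSE3, and -mkl=parallel versus -mkl=sequential), here is a minimal shell sketch. The function name siesta_fflags, its two arguments and the thread count of 4 are made up for illustration and are not part of Siesta or the Intel tools. It only prints the FFLAGS line you would paste into arch.make; it does not build anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Siesta or the Intel tools):
# prints the FFLAGS value discussed above.
#   $1: "same" if the build host has the same CPU as the execution nodes,
#       anything else otherwise
#   $2: MKL threading mode, "parallel" or "sequential"
siesta_fflags() {
    same_cpu=$1
    mkl_mode=$2
    if [ "$same_cpu" = "same" ]; then
        arch_flag="-xHost"   # tune for the CPU of the build machine
    else
        arch_flag="-xSSE3"   # fallback instruction set named in the mail
    fi
    echo "-O3 -ipo $arch_flag -ip -prec-div -prec-sqrt -opt-prefetch -mkl=$mkl_mode"
}

# Cap the number of threads MKL may use; 4 is only an example value.
OMP_NUM_THREADS=4
export OMP_NUM_THREADS

# The two builds worth timing against each other:
echo "FFLAGS=$(siesta_fflags same parallel)"
echo "FFLAGS=$(siesta_fflags same sequential)"
```

Build once with each line and run the same fdf input with both executables; the "timer: siesta" line in the output then gives a direct comparison on your machine.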

Kind regards
Nick

2011/10/1 Tarek Tawalbeh <[email protected]>

> Dear Siesta Community
>
> It works now. I went back to searching the mailing list archive (more
> creatively this time) and found Jesús Carrete
> Montaña<http://www.mail-archive.com/[email protected]/msg03349.html>'s
> very helpful post on compiling siesta with ifort12.
>
> After filling in the references my arch.make was missing, the compiled
> executable seems to work just fine now.
>
> I ran a few tests, and compared the output of siesta-3.1 with that of
> siesta-2.0.1 (which was compiled with scalapack and blacs from netlib.org)
> for a non-optimized C4 graphene unit cell, using the same fdf file and
> pseudopotential (and the same number of threads, of course). The output was
> a very near match, for most values an exact match.
> However, Siesta-3.1 was notably slower:
>
> 2.0.1:
> timer: CPU execution times:
> timer:  Routine       Calls   Time/call    Tot.time        %
> timer:  siesta            1    2362.448    2362.448   100.00
>
> 3.1:
> timer: CPU execution times:
> timer:  Routine       Calls   Time/call    Tot.time        %
> timer:  siesta            1    3321.408    3321.408   100.00
>
>
> Now that I have a working executable, I will tinker a little with the
> optimizations. I also suspect that throughout its different iterations my
> arch.make file acquired some dead weight.
> I will post my results if anyone is interested.
>
> The attached arch.make file works on Intel Xeon clusters (E5000 series)
> using only mkl libs:
> Intel fce 10.1.017
> Intel mkl 10.01.014
> OpenMPI 1.2.7
>
> I would appreciate any comments on the arch.make file, as I don't have much
> experience with Intel's mkl and math libraries, or compilation in general
> for that matter.
>
