Hi, these are the flags I usually compile with for performance on the Intel
compilers:

FFLAGS=-O3 -ipo -xHost -ip -prec-div -prec-sqrt -opt-prefetch -mkl=parallel
Do not use -g (you can't debug without it, but if you haven't had a use for
it so far, why use it? :) )
Do not use -mp; it is a deprecated option and there are other flags that
should perform better, for instance -prec-div and -prec-sqrt.
-ipo makes the final link step take much longer, and I have seen little
performance gain from it.
-xHost should be used if the machine you compile on is the same kind of
machine as the execution nodes. Otherwise you should use -xSSE3 (the E5000
series has this).
The last flag, -mkl=parallel, lets the MKL library use parallel execution in
some of its routines (note that this option will not affect BLACS, as it is
already parallelised). I have had better runs with it! But be aware that
you have to set OMP_NUM_THREADS=x, otherwise it will use all available
cores on the machine.
Try and test -mkl=parallel against -mkl=sequential to see the results on
your setup.
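
One way to run that comparison, as a rough shell sketch. The helper name
siesta_total_time, the binary names, the input file and the process count are
all made up for illustration; the only thing it relies on is the
"timer:  siesta" summary line, in the format shown in the quoted output below.

```shell
#!/bin/sh
# Hypothetical helper (not part of SIESTA): print the Tot.time column of the
# "timer:  siesta" summary line found in a SIESTA output file.
siesta_total_time() {
    awk '$1 == "timer:" && $2 == "siesta" { print $5; exit }' "$1"
}

# Assumed workflow: build once with -mkl=parallel and once with
# -mkl=sequential, then run both on the same input. All names are examples.
#
#   export OMP_NUM_THREADS=1   # cap the MKL threads per MPI process
#   mpirun -np 4 ./siesta-mkl-parallel   < graphene.fdf > parallel.out
#   mpirun -np 4 ./siesta-mkl-sequential < graphene.fdf > sequential.out
#
#   siesta_total_time parallel.out
#   siesta_total_time sequential.out
```

Repeat the parallel run with a few different OMP_NUM_THREADS values as well,
since the best setting depends on how many MPI processes share a node.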

Kind regards, Nick


2011/10/1 Tarek Tawalbeh <[email protected]>

>  Dear Siesta Community
>
> It works now. I went back to searching the mailing list archive (more
> creatively this time) and found Jesús Carrete Montaña's very helpful post
> on compiling siesta with ifort12:
> http://www.mail-archive.com/[email protected]/msg03349.html
>
> After filling in the references my arch.make was missing, the compiled
> executable seems to work just fine now.
>
> I ran a few tests and compared the output of siesta-3.1 with that of
> siesta-2.0.1 (which was compiled with scalapack and blacs from netlib.org)
> for a non-optimized C4 graphene unit cell, using the same fdf file and
> pseudopotential (and the same number of threads, of course). The output was
> a very near match, and for most values an exact match.
> However, Siesta-3.1 was notably slower:
>
> 2.0.1:
> timer: CPU execution times:
> timer:  Routine       Calls   Time/call    Tot.time        %
> timer:  siesta            1    2362.448    2362.448   100.00
>
> 3.1:
> timer: CPU execution times:
> timer:  Routine       Calls   Time/call    Tot.time        %
> timer:  siesta            1    3321.408    3321.408   100.00
>
>
> Now that I have a working executable, I will tinker a little with the
> optimizations. I also suspect that throughout its different iterations my
> arch.make file acquired some dead weight.
> I will post my results if anyone is interested.
>
> The attached arch.make file works on Intel Xeon clusters (E5000 series)
> using only mkl libs:
> Intel fce 10.1.017
> Intel mkl 10.01.014
> OpenMPI 1.2.7
>
> I would appreciate any comments on the arch.make file as I don't have much
> experience with intel's mkl and math libraries or compilation in general for
> that matter.
>
