Hi Ludovic,

I am supposing, then, that you used gfortran as the fortran compiler. Your optimization level seems fine, O2 is not too aggressive. I cannot think of any tweaks you could do except adding the flag -DGRID_DP to FPPFLAGS. This will make some important variables in Siesta (mainly those related to the grid, I think) be compiled in double precision - if you don't specify it, they are compiled in single precision to save memory. I am not sure this will make any difference in the number of scf steps for different values of the blocksize - as a matter of fact, I am almost sure it won't - but it's worth giving it a try, you never know.
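Concretely, that just means changing the FPPFLAGS line in your arch.make (taking the one you posted below as the starting point) to something like

FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DCDF -DGRID_DP

and doing a make clean (or wiping the object files) before rebuilding, so the new flag is picked up everywhere.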
If it doesn't work, then lower the level of optimization to O1, or even to O0 if the problem persists. Optimizations can be nice, but they can hide pretty nasty bugs. I can also think of more esoteric problems, such as differences between the system libraries present on your compiling and running nodes, although I wouldn't think of this in the first place. If -DGRID_DP and the change in optimization level do not fix your problem, you could simply try completely static linking, just to make sure - the default is to link dynamically, which does depend on the libraries present on the running nodes and slows your code down a bit, in exchange for a binary that is physically smaller on your HD. (I have put a sketch of the corresponding arch.make lines at the bottom of this message.)

If none of the above works, you can sit down and cry :) or try to be brave and go for another compiler. Ifort sure can be buggy, but I don't remember having noticed - or heard of - anyone complaining about anything similar with it. Pathscale is also a good option, if you can buy it. Both will render your calculation much faster, but be careful about optimization. Oh - and I don't have shares in either company :)

Best of luck,

Marcos

On Tue, Sep 7, 2010 at 10:56 AM, Ludovic Briquet <[email protected]> wrote:

> Hi Marcos,
>
> The compiler was gcc 4.3.4
> The mpi was openmpi 1.3.3 for the gcc compiler and 64 bits
> Flags were:
> FFLAGS=-g -O2
> FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DCDF
>
> We had to add an additional flag
> INCFLAGS=-I/cm/shared/apps/netcdf/gcc/64/4.0.1/include/
> because for some reason the compiler could not find the netcdf
> location by itself
>
> The arch.make file is attached
>
> Thanks!
> Ludo
>
> *(See attached file: arch.make)*
>
> Marcos Veríssimo Alves <[email protected]> wrote on
> 07/09/2010 10:40:57:
>
> > Hi Ludovic,
> >
> > This is in principle weird. The # of steps for SCF convergence
> > should not depend on the blocksize. Could you provide some info on
> > your compilation? Compiler, optimization flags, mpi and compilation
> > flags for mpi compilation?
> >
> > Marcos
> >
> > On 7 Sep 2010, 10:23 a.m., "Ludovic Briquet" <[email protected]>
> > wrote:
> >
> > Dear SIESTA users,
> >
> > I am in the process of assessing how well the parallel version of SIESTA
> > 2.0.2 performs on our new cluster. To do so, I'm running test jobs
> > and monitoring the timing of the jobs with different SIESTA and
> > cluster-related inputs.
> > The test job I'm running is a (1x2) Si(100) slab geometry
> > optimisation. The system contains 64 atoms. Functional is PBE, mesh
> > cutoff is 150 Ry, Kpoint mesh is 4x2x1. The basis set is set as
> >
> > %Block PAO.Basis
> > Si 3 -0.46385
> > n=3 0 2 E 15.42551 4.96988
> > 7.00000 4.37722
> > 1.00000 1.00000
> > n=3 1 2 E 4.69636 3.83128
> > 7.00000 4.09123
> > 1.00000 1.00000
> > n=3 2 1 E 11.96912 0.03131
> > 4.55426
> > 1.00000
> > %EndBlock PAO.Basis
> >
> > 1- When I don't specify any BlockSize, SIESTA uses a default value of
> > 24 and not 8 as stated in the user guide. Does anyone know the
> > reason for that?
> >
> > 2- When trying different BlockSize options, I see that the scf
> > convergence is not the same for all jobs.
> >
> > For example, for a 4 cpu job:
> > BlockSize 8: total run time is 15h54 - first optimization step is
> > completed with 205 scf iterations
> > BlockSize 16: total run time is 16h22 - first optimization step is
> > completed with 222 scf iterations
> > BlockSize 24: total run time is 13h15 - first optimization step is completed
> > with 74 scf iterations
> > BlockSize 32: total run time is 19h09 - first optimization step is completed
> > with 399 scf iterations
> > It should be noted that the energies at the end of the scf processes
> > are all similar and the optimisation terminates in 34 steps for all jobs.
> >
> > I understood from the tutorial session in Santander last June that
> > the compilation of SIESTA in parallel can be a very tricky business.
> > So I was wondering if that scf behaviour could in any way be related
> > to the compilation? Or is it just a normal behaviour of SIESTA?
> >
> > Cheers
> > Ludovic
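As promised above, here is a rough sketch of the arch.make lines for the lower-optimization and static-linking suggestions. This assumes your arch.make uses the usual FFLAGS and LDFLAGS variables - adjust the names to whatever yours actually defines:

FFLAGS= -g -O1       # drop to -O0 if the problem persists at -O1
LDFLAGS= -static     # fully static binary; needs static versions of the openmpi and netcdf libraries on the compiling node

If static versions of those libraries are not installed on your compiling node, the linker will complain at build time, so you will know right away.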
