Hi Ludovic,

I am supposing, then, that you used gfortran as the fortran compiler. Your optimization level seems fine, O2 is not too aggressive. I cannot think of any tweaks you could do except adding the flag -DGRID_DP to FPPFLAGS. This will make some important variables in Siesta (mainly those related to the grid, I think) be compiled in double precision - if you don't specify it, they are compiled in single precision to save memory. I am not sure this will make any difference in the number of scf steps for different values of the blocksize - as a matter of fact, I am almost sure it won't - but it's worth giving it a try, you never know.
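Concretely, that just means changing the FPPFLAGS line in your arch.make (taking the one you posted below as the starting point) to something like

FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DCDF -DGRID_DP

and doing a make clean (or wiping the object files) before rebuilding, so the new flag is picked up everywhere.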
If it doesn't work, then lower the level of optimization to O1, or even to O0 if the problem persists. Optimizations can be nice, but they can hide pretty nasty bugs. I can also think of more esoteric problems, such as differences between the system libraries present on your compiling and running nodes, although I wouldn't think of this in the first place. If -DGRID_DP and the change in optimization level do not fix your problem, you could simply try completely static linking, just to make sure - the default is to link dynamically, which does depend on the libraries present on the running nodes and slows your code down a bit, in exchange for a binary that is physically smaller on your HD. (I have put a sketch of the corresponding arch.make lines at the bottom of this message.)

If none of the above works, you can sit down and cry :) or try to be brave and go for another compiler. Ifort sure can be buggy, but I don't remember having noticed - or heard of - anyone complaining about anything similar with it. Pathscale is also a good option, if you can buy it. Both will render your calculation much faster, but be careful about optimization. Oh - and I don't have shares in either company :)

Best of luck,

Marcos

On Tue, Sep 7, 2010 at 10:56 AM, Ludovic Briquet <[email protected]> wrote:

> Hi Marcos,
>
> The compiler was gcc 4.3.4
> The mpi was openmpi 1.3.3 for the gcc compiler and 64 bits
> Flags were:
> FFLAGS=-g -O2
> FPPFLAGS= -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DCDF
>
> We had to add an additional flag
> INCFLAGS=-I/cm/shared/apps/netcdf/gcc/64/4.0.1/include/
> because for some reason the compiler could not find the netcdf
> location by itself
>
> The arch.make file is attached
>
> Thanks!
> Ludo
>
> *(See attached file: arch.make)*
>
> Marcos Veríssimo Alves <[email protected]> wrote on
> 07/09/2010 10:40:57:
>
> > Hi Ludovic,
> >
> > This is in principle weird. The # of steps for SCF convergence
> > should not depend on the blocksize. Could you provide some info on
> > your compilation? Compiler, optimization flags, mpi and compilation
> > flags for mpi compilation?
> >
> > Marcos
> >
> > On 7 Sep 2010, 10:23 a.m., "Ludovic Briquet" <[email protected]>
> > wrote:
> >
> > Dear SIESTA users,
> >
> > I am in the process of assessing how well the parallel version of SIESTA
> > 2.0.2 performs on our new cluster. To do so, I'm running test jobs
> > and monitoring the timing of the jobs with different SIESTA and
> > cluster-related inputs.
> > The test job I'm running is a (1x2) Si(100) slab geometry
> > optimisation. The system contains 64 atoms. Functional is PBE, mesh
> > cutoff is 150 Ry, Kpoint mesh is 4x2x1. The basis set is set as
> >
> > %Block PAO.Basis
> > Si 3 -0.46385
> > n=3 0 2 E 15.42551 4.96988
> > 7.00000 4.37722
> > 1.00000 1.00000
> > n=3 1 2 E 4.69636 3.83128
> > 7.00000 4.09123
> > 1.00000 1.00000
> > n=3 2 1 E 11.96912 0.03131
> > 4.55426
> > 1.00000
> > %EndBlock PAO.Basis
> >
> > 1- When I don't specify any BlockSize, SIESTA uses a default value of
> > 24 and not 8 as stated in the user guide. Does anyone know the
> > reason for that?
> >
> > 2- When trying different BlockSize options, I see that the scf
> > convergence is not the same for all jobs.
> >
> > For example, for a 4 cpu job:
> > BlockSize 8: total run time is 15h54 - first optimization step is
> > completed with 205 scf iterations
> > BlockSize 16: total run time is 16h22 - first optimization step is
> > completed with 222 scf iterations
> > BlockSize 24: total run time is 13h15 - first optimization step is completed
> > with 74 scf iterations
> > BlockSize 32: total run time is 19h09 - first optimization step is completed
> > with 399 scf iterations
> > It should be noted that the energies at the end of the scf processes
> > are all similar and the optimisation terminates in 34 steps for all jobs.
> >
> > I understood from the tutorial session in Santander last June that
> > the compilation of SIESTA in parallel can be a very tricky business.
> > So I was wondering if that scf behaviour could in any way be related
> > to the compilation? Or is it just a normal behaviour of SIESTA?
> >
> > Cheers
> > Ludovic
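As promised above, here is a rough sketch of the arch.make lines for the lower-optimization and static-linking suggestions. This assumes your arch.make uses the usual FFLAGS and LDFLAGS variables - adjust the names to whatever yours actually defines:

FFLAGS= -g -O1       # drop to -O0 if the problem persists at -O1
LDFLAGS= -static     # fully static binary; needs static versions of the openmpi and netcdf libraries on the compiling node

If static versions of those libraries are not installed on your compiling node, the linker will complain at build time, so you will know right away.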
