Re: [SIESTA-L] Changing number of SCF iterations
2014-09-21 5:15 GMT+00:00 Jan Fredin:

> Nick,
>
> Thank you so much for the specific compile options that you are currently
> using for Siesta in parallel with Intel compilers and MKL. I did look
> through the email archive and did not find this kind of specific
> information.
>
> I have rebuilt Siesta using your compile and link options explicitly.
> Unfortunately my user problem did not become numerically stable. There are
> still run-to-run variations in the IterSCF.
>
> This challenged me to check my build on a similar Siesta Test dataset and
> see what I get. I want to check with you on the expected numerical
> stability of my results. I am using an Intel E5-2697v2 based cluster with
> Intel 14.0.3 compilers and MKL with SGI/MPT. I chose the sic-slab test
> from the Tests directory. The tests marked serial and parallel are from
> executables built with the compile parameters you specified. The SGI orig
> results are from my original build with slightly different compile
> parameters. I get the following results:
>
> Run           copy  nodes  ppn  IterSCF  Siesta elapsed  FreeEng
> Reference                  4    71       785.629         -8721.570740
> Serial build  1     1      1    42       652.612         -8722.207218
>               2     1      1    42       652.844         -8722.207218
>               3     1      1    42       653.683         -8722.207218
> Parallel      1     1      4    42       323.415         -8722.207555
>               2     1      4    42       324.656         -8722.207555
>               3     1      4    42       324.678         -8722.207555
> Parallel      1     2      20   42       64.514          -8722.206894
>               2     2      20   42       64.489          -8722.206894
>               3     2      20   42       64.548          -8722.206894
> SGI orig      1     2      20   42       63.950          -8722.207380
>               2     2      20   42       63.950          -8722.207380
>               3     2      20   42       63.950          -8722.207380
>
> As you can see, the code looks numerically stable: the same number of
> IterSCF happens no matter how the problem is laid out on the machine, the
> FreeEng is constant for each layout, and even though the SCF cycles
> converge to 1.0E-05, the different layouts give the same energy to 1.0E-03.
>
> Is this the level of precision you expect on the FreeEng?

I would suspect that to be fine.

> All of my calculations have a different energy and IterSCF than the
> Reference values.
>
> Can you tell me if you consider the results I got correct?

I would trust them; you can check whether the reference has been updated (see the header).

> The data for sic-slab make me believe that my Siesta build is numerically
> stable. Is there any way you can explain the run-to-run variation I see on
> the user problem?

No, other than: it is not the same simulation. Siesta has many options... Look in the manual if needed.

> Thanks for all your support,
>
> Jan
>
> From: Nick Papior Andersen [mailto:nickpap...@gmail.com]
> Sent: Saturday, September 20, 2014 3:34 PM
> To: Jan Fredin; siesta-l@uam.es
> Subject: Re: [SIESTA-L] Changing number of SCF iterations
>
> 2014-09-20 19:57 GMT+00:00 Jan Fredin:
>
> Hi Nick,
>
> Thanks for your feedback. Unfortunately I am running a benchmark with a
> customer test. We understand that the test does not scale to the level
> requested and, as you suggest, is fastest time to solution at about 10-12
> cores on 1 node of Ivy Bridge or Haswell, but this is not the request.
>
> If you easily want to "scale the problem" just add this to the fdf file:
>
> %block SuperCell
>   2 0 0
>   0 2 0
>   0 0 2
> %endblock
>
> which will make your system 8 times larger (in this case go to ~100 atoms).
>
> I see the same run-to-run variation of the SCF path so that I get
> different IterSCF and Etot for every run, whether it is on just 1 node with
> a few cores, or on 2 nodes. I clearly have a numerically unstable build of
> Siesta. I would like to use Intel compilers and MKL.
>
> Do you have an arch.make that you can forward that is successful for
> building numerically stable Siesta on Intel Xeon Ivy Bridge nodes?
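The layout-to-layout agreement discussed above can be checked directly. The sketch below (plain Python, hypothetical labels for the layouts) uses the FreeEng values copied from the table and confirms the spread is below 1.0E-03 even though the SCF tolerance is 1.0E-05:

```python
# Final free energies per process layout, copied from the table above.
free_eng = {
    "serial (1 node, 1 ppn)":    -8722.207218,
    "parallel (1 node, 4 ppn)":  -8722.207555,
    "parallel (2 nodes, 20 ppn)": -8722.206894,
    "SGI orig (2 nodes, 20 ppn)": -8722.207380,
}

spread = max(free_eng.values()) - min(free_eng.values())
print(f"spread across layouts: {spread:.6f}")

# Layouts agree to better than 1e-3 even with an SCF density-matrix
# tolerance of 1e-5; the residual spread typically reflects different
# parallel summation orders, not a convergence failure.
assert spread < 1e-3
```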
RE: [SIESTA-L] Changing number of SCF iterations
Nick,

Thank you so much for the specific compile options that you are currently using for Siesta in parallel with Intel compilers and MKL. I did look through the email archive and did not find this kind of specific information.

I have rebuilt Siesta using your compile and link options explicitly. Unfortunately my user problem did not become numerically stable. There are still run-to-run variations in the IterSCF.

This challenged me to check my build on a similar Siesta Test dataset and see what I get. I want to check with you on the expected numerical stability of my results. I am using an Intel E5-2697v2 based cluster with Intel 14.0.3 compilers and MKL with SGI/MPT. I chose the sic-slab test from the Tests directory. The tests marked serial and parallel are from executables built with the compile parameters you specified. The SGI orig results are from my original build with slightly different compile parameters. I get the following results:

Run           copy  nodes  ppn  IterSCF  Siesta elapsed  FreeEng
Reference                  4    71       785.629         -8721.570740
Serial build  1     1      1    42       652.612         -8722.207218
              2     1      1    42       652.844         -8722.207218
              3     1      1    42       653.683         -8722.207218
Parallel      1     1      4    42       323.415         -8722.207555
              2     1      4    42       324.656         -8722.207555
              3     1      4    42       324.678         -8722.207555
Parallel      1     2      20   42       64.514          -8722.206894
              2     2      20   42       64.489          -8722.206894
              3     2      20   42       64.548          -8722.206894
SGI orig      1     2      20   42       63.950          -8722.207380
              2     2      20   42       63.950          -8722.207380
              3     2      20   42       63.950          -8722.207380

As you can see, the code looks numerically stable: the same number of IterSCF happens no matter how the problem is laid out on the machine, the FreeEng is constant for each layout, and even though the SCF cycles converge to 1.0E-05, the different layouts give the same energy to 1.0E-03.

Is this the level of precision you expect on the FreeEng?

All of my calculations have a different energy and IterSCF than the Reference values. Can you tell me if you consider the results I got correct?

The data for sic-slab make me believe that my Siesta build is numerically stable. Is there any way you can explain the run-to-run variation I see on the user problem?

Thanks for all your support,

Jan

From: Nick Papior Andersen [mailto:nickpap...@gmail.com]
Sent: Saturday, September 20, 2014 3:34 PM
To: Jan Fredin; siesta-l@uam.es
Subject: Re: [SIESTA-L] Changing number of SCF iterations

2014-09-20 19:57 GMT+00:00 Jan Fredin:

Hi Nick,

Thanks for your feedback. Unfortunately I am running a benchmark with a customer test. We understand that the test does not scale to the level requested and, as you suggest, is fastest time to solution at about 10-12 cores on 1 node of Ivy Bridge or Haswell, but this is not the request.

If you easily want to "scale the problem" just add this to the fdf file:

%block SuperCell
  2 0 0
  0 2 0
  0 0 2
%endblock

which will make your system 8 times larger (in this case go to ~100 atoms).

I see the same run-to-run variation of the SCF path so that I get different IterSCF and Etot for every run, whether it is on just 1 node with a few cores, or on 2 nodes. I clearly have a numerically unstable build of Siesta. I would like to use Intel compilers and MKL.

Do you have an arch.make that you can forward that is successful for building numerically stable Siesta on Intel Xeon Ivy Bridge nodes?

You should be able to compile a stable version on Ivy Bridge. I have replied previously on the mailing list about how to use MKL and Intel in Siesta; try and search. Basically you "could" use these flags (adapt to your system!):

FFLAGS = -O3 -ip -xHost -fPIC -m64 -prec-div -prec-sqrt -opt-prefetch -I$(INTEL_PATH)/mkl/include -I$(INTEL_PATH)/mkl/include/intel64/lp64 -mkl=sequential $(INC_PATH)
BLAS_LIBS = -lmkl_blas95_lp64
LAPACK_LIBS = -lmkl_lapack95_lp64
# Remember to adapt this if you use Intel MPI and not OpenMPI
BLACS_LIBS = -lmkl_blacs_openmpi_lp64
SCALAPACK_LIBS = -lmkl_scalapack_lp64
LIBS = $(ADDLIB) $(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS) $(NETCDF_LIBS)
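The SuperCell suggestion quoted above scales the system by the determinant of the supercell matrix. A quick sketch of that bookkeeping (plain Python, not SIESTA code) for the diagonal 2,2,2 block and the 13-atom test:

```python
# The %block SuperCell matrix from the message above.
supercell = [
    [2, 0, 0],
    [0, 2, 0],
    [0, 0, 2],
]

def det3(m):
    """Determinant of a 3x3 matrix (expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

atoms = 13                       # atoms in the original fdf
scaled = atoms * abs(det3(supercell))
print(scaled)                    # 104, i.e. the "~100 atoms" mentioned above
```

The atom count multiplies by |det(S)| = 8, turning the 13-atom slab into a 104-atom system.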
Re: [SIESTA-L] Changing number of SCF iterations
2014-09-20 19:57 GMT+00:00 Jan Fredin:

> Hi Nick,
>
> Thanks for your feedback. Unfortunately I am running a benchmark with a
> customer test. We understand that the test does not scale to the level
> requested and, as you suggest, is fastest time to solution at about 10-12
> cores on 1 node of Ivy Bridge or Haswell, but this is not the request.

If you easily want to "scale the problem" just add this to the fdf file:

%block SuperCell
  2 0 0
  0 2 0
  0 0 2
%endblock

which will make your system 8 times larger (in this case go to ~100 atoms).

> I see the same run-to-run variation of the SCF path so that I get
> different IterSCF and Etot for every run, whether it is on just 1 node with
> a few cores, or on 2 nodes. I clearly have a numerically unstable build of
> Siesta. I would like to use Intel compilers and MKL.
>
> Do you have an arch.make that you can forward that is successful for
> building numerically stable Siesta on Intel Xeon Ivy Bridge nodes?

You should be able to compile a stable version on Ivy Bridge. I have replied previously on the mailing list about how to use MKL and Intel in Siesta; try and search. Basically you "could" use these flags (adapt to your system!):

FFLAGS = -O3 -ip -xHost -fPIC -m64 -prec-div -prec-sqrt -opt-prefetch -I$(INTEL_PATH)/mkl/include -I$(INTEL_PATH)/mkl/include/intel64/lp64 -mkl=sequential $(INC_PATH)
BLAS_LIBS = -lmkl_blas95_lp64
LAPACK_LIBS = -lmkl_lapack95_lp64
# Remember to adapt this if you use Intel MPI and not OpenMPI
BLACS_LIBS = -lmkl_blacs_openmpi_lp64
SCALAPACK_LIBS = -lmkl_scalapack_lp64
LIBS = $(ADDLIB) $(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS) $(NETCDF_LIBS)

Maybe you also need to add this:

LDFLAGS = -L$(INTEL_PATH)/mkl/lib/intel64 $(LIB_PATH) -mkl=sequential -opt-matmul

If you really want to test, I would suggest you also try with OpenBLAS; just change:

BLAS_LIBS = -lopenblas

and add the appropriate path. :)

> What Intel compiler and MKL release do you know works? What are the exact
> compile options you use?

See above. I use 14.0.3 or 13.1.1.

> I would like to leave netCDF out to eliminate issues that might be coming
> from netCDF code releases or build issues. Does this impact the
> performance very much for small problems?

No. For small systems this has little to no effect; however, every IO routine works from a master node, hence "a lot" of communication takes place in the IO routines. For small systems they take a fraction of a second (dependent on your IO speed).

> Thank you for your support. I will definitely forward your
> recommendations for effectively running Siesta to the customer.
>
> Jan
>
> From: Nick Papior Andersen [mailto:nickpap...@gmail.com]
> Sent: Friday, September 19, 2014 11:54 PM
> To: Jan Fredin
> Cc: siesta-l@uam.es
> Subject: Re: [SIESTA-L] Changing number of SCF iterations
>
> Throwing 40 processors at 13 atoms is not going to yield a good
> performance test.
>
> My rule of thumb is "never go NP > NA", where NP is the number of
> processors and NA is the number of atoms.
>
> If you want to test using 40 processors, increase your number of atoms to,
> say, 60...
>
> Your simulation is probably restrained by communication...
>
> 2014-09-19 22:49 GMT+00:00 Jan Fredin:
>
> I am running a user's test, so until I get permission to release the
> input, I can describe the input fdf and output.
>
> I think the test is a single point SCF using the following input settings:
>
> NumberOfAtoms    13
> NumberOfSpecies  1
>
> Then it defines the periodic cell and atomic coordinates.
>
> DZP basis set with
>
> PAO.EnergyShift  150 meV
> MeshCutoff       200 Ry
>
> ---SCF
> SolutionMethod   diagon
> XC.functional    LDA
> XC.authors       CA
> DM.NumberPulay   10
> DM.MixingWeight  0.02
> MaxSCFIterations 500
> SpinPolarized    True
>
> It is the output that made me think it does one MD step although there is
> no input about MD. Maybe this is printed in all cases, even when you are
> just doing a single point SCF.
>
> ....out
>
> siesta: System type = slab
> initatomlists: Number of atoms, orbitals, and projectors: 13 195 208
>
> siesta: Simulation parameters
> siesta:
> siesta: The following are some of the parameters of the simulation.
> siesta: A complete list of the parameters used, including default values,
> siesta: can be found in file out.fdf
> siesta:
> redata: Non-Collinear-spin run           = F
> redata: SpinPolarized (Up/Down) run      = T
Re: [SIESTA-L] Changing number of SCF iterations
Throwing 40 processors at 13 atoms is not going to yield a good performance test.

My rule of thumb is "never go NP > NA", where NP is the number of processors and NA is the number of atoms.

If you want to test using 40 processors, increase your number of atoms to, say, 60...

Your simulation is probably restrained by communication...

2014-09-19 22:49 GMT+00:00 Jan Fredin:

> I am running a user's test, so until I get permission to release the
> input, I can describe the input fdf and output.
>
> I think the test is a single point SCF using the following input settings:
>
> NumberOfAtoms    13
> NumberOfSpecies  1
>
> Then it defines the periodic cell and atomic coordinates.
>
> DZP basis set with
>
> PAO.EnergyShift  150 meV
> MeshCutoff       200 Ry
>
> ---SCF
> SolutionMethod   diagon
> XC.functional    LDA
> XC.authors       CA
> DM.NumberPulay   10
> DM.MixingWeight  0.02
> MaxSCFIterations 500
> SpinPolarized    True
>
> It is the output that made me think it does one MD step although there is
> no input about MD. Maybe this is printed in all cases, even when you are
> just doing a single point SCF.
>
> ....out
>
> siesta: System type = slab
> initatomlists: Number of atoms, orbitals, and projectors: 13 195 208
>
> siesta: Simulation parameters
> siesta:
> siesta: The following are some of the parameters of the simulation.
> siesta: A complete list of the parameters used, including default values,
> siesta: can be found in file out.fdf
> siesta:
> redata: Non-Collinear-spin run               = F
> redata: SpinPolarized (Up/Down) run          = T
> redata: Number of spin components            = 2
> redata: Long output                          = F
> redata: Number of Atomic Species             = 1
> redata: Charge density info will appear in .RHO file
> redata: Write Mulliken Pop.                  = NO
> redata: Mesh Cutoff                          = 200. Ry
> redata: Net charge of the system             = 0. |e|
> redata: Max. number of SCF Iter              = 500
> redata: Performing Pulay mixing using        = 10 iterations
> redata: Mix DM in first SCF step ?           = F
> redata: Write Pulay info on disk?            = F
> redata: Discard 1st Pulay DM after kick      = F
> redata: New DM Mixing Weight                 = 0.0200
> redata: New DM Occupancy tolerance           = 0.0001
> redata: No kicks to SCF
> redata: DM Mixing Weight for Kicks           = 0.5000
> redata: DM Tolerance for SCF                 = 0.000100
> redata: Require Energy convergence for SCF   = F
> redata: DM Energy tolerance for SCF          = 0.000100 eV
> redata: Require Harris convergence for SCF   = F
> redata: DM Harris energy tolerance for SCF   = 0.000100 eV
> redata: Antiferro initial spin density       = F
> redata: Using Saved Data (generic)           = F
> redata: Use continuation files for DM        = F
> redata: Neglect nonoverlap interactions      = F
> redata: Method of Calculation                = Diagonalization
> redata: Divide and Conquer                   = T
> redata: Electronic Temperature               = 0.0019 Ry
> redata: Fix the spin of the system           = F
> redata: Dynamics option                      = Verlet MD run
> redata: Initial MD time step                 = 1
> redata: Final MD time step                   = 1
> redata: Length of MD time step               = 1. fs
> redata: Initial Temperature of MD run        = 0. K
> redata: Perform a MD quench                  = F
> ..........
> siesta: ==================
>           Begin MD step = 1
>         ==================
>
> Thanks for any insight you can give me on why the IterSCF would change
> from run-to-run.
>
> Jan
>
> From: Jan Fredin
> Sent: Friday, September 19, 2014 11:27 AM
> To: Nick Papior Andersen; siesta-l@uam.es
> Cc: Jan Fredin
> Subject: RE: [SIESTA-L] Changing number of SCF iterations
>
> Nick,
>
> I'm sorry I was not clear enough. I ran 5 clean directory copies of the
> user's test problem and I was surprised by the run-to-run variations. I
> wonder if I should expect this run-to-run variation or if I should think
> something is wrong with my build. All of the test data passed.
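The DM.MixingWeight 0.02 in the quoted fdf means only 2% of the new density matrix is blended in at each SCF step. A toy scalar fixed-point iteration (an illustration only, not SIESTA's actual Pulay mixer) shows the usual trade-off: a small linear-mixing weight converges robustly but needs many more iterations:

```python
def iterations_to_converge(alpha, tol=1e-4, max_iter=500):
    """Iterate x <- (1-alpha)*x + alpha*g(x) until |g(x) - x| < tol.

    g is a toy self-consistency map with fixed point sqrt(2); alpha
    plays the role of a linear mixing weight like DM.MixingWeight.
    """
    g = lambda x: 0.5 * (x + 2.0 / x)
    x = 1.0
    for n in range(1, max_iter + 1):
        if abs(g(x) - x) < tol:
            return n
        x = (1 - alpha) * x + alpha * g(x)
    return max_iter

fast = iterations_to_converge(0.5)
slow = iterations_to_converge(0.02)   # like DM.MixingWeight 0.02
print(fast, slow)
assert slow > fast  # smaller mixing weight -> many more iterations
```

Near the fixed point the error shrinks by a factor of roughly (1 - alpha) per step, which is why a weight of 0.02 can legitimately require hundreds of SCF iterations.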
RE: [SIESTA-L] Changing number of SCF iterations
I am running a user's test, so until I get permission to release the input, I can describe the input fdf and output.

I think the test is a single point SCF using the following input settings:

NumberOfAtoms    13
NumberOfSpecies  1

Then it defines the periodic cell and atomic coordinates.

DZP basis set with

PAO.EnergyShift  150 meV
MeshCutoff       200 Ry

---SCF
SolutionMethod   diagon
XC.functional    LDA
XC.authors       CA
DM.NumberPulay   10
DM.MixingWeight  0.02
MaxSCFIterations 500
SpinPolarized    True

It is the output that made me think it does one MD step although there is no input about MD. Maybe this is printed in all cases, even when you are just doing a single point SCF.

....out

siesta: System type = slab
initatomlists: Number of atoms, orbitals, and projectors: 13 195 208

siesta: Simulation parameters
siesta:
siesta: The following are some of the parameters of the simulation.
siesta: A complete list of the parameters used, including default values,
siesta: can be found in file out.fdf
siesta:
redata: Non-Collinear-spin run               = F
redata: SpinPolarized (Up/Down) run          = T
redata: Number of spin components            = 2
redata: Long output                          = F
redata: Number of Atomic Species             = 1
redata: Charge density info will appear in .RHO file
redata: Write Mulliken Pop.                  = NO
redata: Mesh Cutoff                          = 200. Ry
redata: Net charge of the system             = 0. |e|
redata: Max. number of SCF Iter              = 500
redata: Performing Pulay mixing using        = 10 iterations
redata: Mix DM in first SCF step ?           = F
redata: Write Pulay info on disk?            = F
redata: Discard 1st Pulay DM after kick      = F
redata: New DM Mixing Weight                 = 0.0200
redata: New DM Occupancy tolerance           = 0.0001
redata: No kicks to SCF
redata: DM Mixing Weight for Kicks           = 0.5000
redata: DM Tolerance for SCF                 = 0.000100
redata: Require Energy convergence for SCF   = F
redata: DM Energy tolerance for SCF          = 0.000100 eV
redata: Require Harris convergence for SCF   = F
redata: DM Harris energy tolerance for SCF   = 0.000100 eV
redata: Antiferro initial spin density       = F
redata: Using Saved Data (generic)           = F
redata: Use continuation files for DM        = F
redata: Neglect nonoverlap interactions      = F
redata: Method of Calculation                = Diagonalization
redata: Divide and Conquer                   = T
redata: Electronic Temperature               = 0.0019 Ry
redata: Fix the spin of the system           = F
redata: Dynamics option                      = Verlet MD run
redata: Initial MD time step                 = 1
redata: Final MD time step                   = 1
redata: Length of MD time step               = 1. fs
redata: Initial Temperature of MD run        = 0. K
redata: Perform a MD quench                  = F
..........
siesta: ==================
          Begin MD step = 1
        ==================

Thanks for any insight you can give me on why the IterSCF would change from run-to-run.

Jan

From: Jan Fredin
Sent: Friday, September 19, 2014 11:27 AM
To: Nick Papior Andersen; siesta-l@uam.es
Cc: Jan Fredin
Subject: RE: [SIESTA-L] Changing number of SCF iterations

Nick,

I'm sorry I was not clear enough. I ran 5 clean directory copies of the user's test problem and I was surprised by the run-to-run variations. I wonder if I should expect this run-to-run variation or if I should think something is wrong with my build. All of the test data passed.

2 nodes, 20 cores/node, 40 MPI ranks total

Run#  CLOCK End  IterSCF  sec/Iter  Final energy Total
1     559.368    171      3.27      -12349.946166
2     529.102    162      3.27      -12349.948178
3     494.420    151      3.27      -12349.946742
4     539.789    165      3.27      -12349.948130
5     560.794    172      3.26      -12349.948101

Siesta Version: siesta-3.2-pl-5
Architecture  : sgi-ice_mpt
Compiler flags: ifort -g -O3 -xAVX -fno-alias -ftz -ip -traceback
PARALLEL version

* Running on 40 nodes in parallel
>> Start of run: 16-SEP-2014 12:22:42

* WELCOME TO SIESTA *

Thank you for any insight you can give me.

Jan

From: Nick Papior Andersen [mailto:nickpap...@gmail.com]
Sent: Friday, September 19, 2014 12:33 AM
To: siesta-l@uam.es
Cc: Jan Fredin
Subject: Re: [SIESTA-L] Changing number of SCF iterations

"Changes" could be 1ms? How much are we talking about, and are you running in the same folder as previously?
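The run-to-run table above is internally consistent, which points at the iteration count rather than the per-iteration work as the varying quantity. A short check (values copied verbatim from the table) confirms sec/Iter matches CLOCK End / IterSCF for every run:

```python
# (CLOCK End, IterSCF, sec/Iter, Final energy Total) for the 5 runs.
runs = [
    (559.368, 171, 3.27, -12349.946166),
    (529.102, 162, 3.27, -12349.948178),
    (494.420, 151, 3.27, -12349.946742),
    (539.789, 165, 3.27, -12349.948130),
    (560.794, 172, 3.26, -12349.948101),
]

for clock, iters, per_iter, _energy in runs:
    # Agreement to the rounding of the reported sec/Iter column.
    assert abs(clock / iters - per_iter) < 0.01

# Per-iteration cost is stable; only the iteration count (and with it
# the total wall time and the last digits of the energy) varies.
energies = [run[-1] for run in runs]
print(f"energy spread over 5 runs: {max(energies) - min(energies):.6f}")
```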
RE: [SIESTA-L] Changing number of SCF iterations
Nick,

I'm sorry I was not clear enough. I ran 5 clean directory copies of the user's test problem and I was surprised by the run-to-run variations. I wonder if I should expect this run-to-run variation or if I should think something is wrong with my build. All of the test data passed.

2 nodes, 20 cores/node, 40 MPI ranks total

Run#  CLOCK End  IterSCF  sec/Iter  Final energy Total
1     559.368    171      3.27      -12349.946166
2     529.102    162      3.27      -12349.948178
3     494.420    151      3.27      -12349.946742
4     539.789    165      3.27      -12349.948130
5     560.794    172      3.26      -12349.948101

Siesta Version: siesta-3.2-pl-5
Architecture  : sgi-ice_mpt
Compiler flags: ifort -g -O3 -xAVX -fno-alias -ftz -ip -traceback
PARALLEL version

* Running on 40 nodes in parallel
>> Start of run: 16-SEP-2014 12:22:42

* WELCOME TO SIESTA *

Thank you for any insight you can give me.

Jan

From: Nick Papior Andersen [mailto:nickpap...@gmail.com]
Sent: Friday, September 19, 2014 12:33 AM
To: siesta-l@uam.es
Cc: Jan Fredin
Subject: Re: [SIESTA-L] Changing number of SCF iterations

"Changes" could be 1ms? How much are we talking about, and are you running in the same folder as previously? In that case it could read in the DM from the previous initialization, which then is not the same... Also, size of system, unit-cell size, meshcutoff?

2014-09-19 0:43 GMT+02:00 Jan Fredin:

Hi,

I am a new Siesta user and am running a small 2 node problem with 1 SCF cycle and 1 MD cycle. I find that run-to-run the IterSCF changes, even with a fixed node count and cores-per-node count. It also changes for different core counts on both 1 and 2 nodes.

I have built Siesta using Intel 14 compilers and MKL for ScaLAPACK, BLACS, LAPACK and BLAS, SGI MPT, and netCDF 4.3.0.

Is it expected that Siesta is non-deterministic in the SCF iterations?

Do you have any suggestions on what can be causing the IterSCF to vary from run-to-run?

Thank You,

Jan

--
Dr. Jan Fredin
SGI Computational Chemistry Apps
Sr. Member of Technical Staff – Technical Lead
Austin, TX
jfre...@sgi.com
512-331-2860

--
Kind regards Nick
Re: [SIESTA-L] Changing number of SCF iterations
"Changes" could be 1ms? How much are we talking about, and are you running in the same folder as previously? In that case it could read in the DM from the previous initialization, which then is not the same... Also, size of system, unit-cell size, meshcutoff?

2014-09-19 0:43 GMT+02:00 Jan Fredin:

> Hi,
>
> I am a new Siesta user and am running a small 2 node problem with 1 SCF
> cycle and 1 MD cycle. I find that run-to-run the IterSCF changes, even
> with a fixed node count and cores-per-node count. It also changes for
> different core counts on both 1 and 2 nodes.
>
> I have built Siesta using Intel 14 compilers and MKL for ScaLAPACK,
> BLACS, LAPACK and BLAS, SGI MPT, and netCDF 4.3.0.
>
> Is it expected that Siesta is non-deterministic in the SCF iterations?
>
> Do you have any suggestions on what can be causing the IterSCF to vary
> from run-to-run?
>
> Thank You,
>
> Jan
>
> --
> Dr. Jan Fredin
> SGI Computational Chemistry Apps
> Sr. Member of Technical Staff – Technical Lead
> Austin, TX
> jfre...@sgi.com
> 512-331-2860

--
Kind regards Nick
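One plausible mechanism behind run-to-run IterSCF variation, offered as a general illustration rather than a diagnosis of this particular build: floating-point addition is not associative, so a parallel reduction that combines per-rank partial sums in a different order can return a slightly different total, and near a tight SCF threshold such last-bit differences can change the iteration count. A minimal, deterministic Python demonstration:

```python
# Floating-point addition is not associative: summation order matters.
# 1.0 is smaller than the spacing between doubles near 1e16, so it is
# lost if added before the large terms cancel.
vals = [1e16, 1.0, -1e16]

left_to_right = (vals[0] + vals[1]) + vals[2]  # 1.0 absorbed into 1e16
reordered = (vals[0] + vals[2]) + vals[1]      # large terms cancel first

print(left_to_right, reordered)  # 0.0 1.0
assert left_to_right != reordered
```

In a real MPI run the reduction order can depend on rank count and message timing, which is consistent with the observation that IterSCF changes with layout and, on some networks, from run to run.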