Re: [SIESTA-L] About wxml
If I'm not mistaken, the * appears where m should be -1; for p orbitals the quantum number m runs over -1, 0, 1, and it has nothing to do with spin. Basically, these are the populations of the basis orbitals used in your calculation. To get a meaningful PDOS you have to sum over all m and over all zetas (otherwise you may notice that some of those values are negative).
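If it helps, here is a minimal Python sketch of that summation. It assumes the usual SystemLabel.PDOS XML layout (an <energy_values> element plus one <orbital> element per basis function, carrying n, l, m, z attributes and a <data> child with one value per energy point and spin); the function name and the species/l arguments are only illustrative.

# Hedged sketch: sum the projected DOS over all m and all zetas for one
# species and one angular-momentum channel, assuming the standard .PDOS
# XML layout described above.
import numpy as np
import xml.etree.ElementTree as ET

def pdos_sum(fname, species="Si", l=1):
    root = ET.parse(fname).getroot()
    nspin = int(root.findtext("nspin"))
    energies = np.array(root.findtext("energy_values").split(), dtype=float)
    total = np.zeros((len(energies), nspin))
    for orb in root.iter("orbital"):
        if orb.get("species") != species or int(orb.get("l")) != l:
            continue
        # m may be printed as "*" (the glitch discussed in this thread);
        # since we sum over all m and all zetas anyway, its value never matters.
        values = np.array(orb.findtext("data").split(), dtype=float)
        total += values.reshape(len(energies), nspin)
    return energies, total

For example, pdos_sum("si64.PDOS", species="Si", l=1) would return the energy grid and the summed p-channel PDOS, one column per spin.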
[SIESTA-L] Obviously poor PARALLEL performance compared to VASP
Hi, I'm an experienced user of VASP and am now trying to learn SIESTA. During my simple tests I have noticed that SIESTA's parallelization is clearly poorer than VASP's. I'm using the latest Intel ifort (11) and MKL 10.1. My system consists of quad-core Xeon 2.33 GHz nodes (4 cores per node) connected by InfiniBand. I chose siesta-2.0.2/Tests/si64/ as input; to make the test more realistic I increased the k-points, displaced one atom and increased the mesh cutoff. With 2, 4 and 8 CPUs the parallelization of VASP is nearly linear, but SIESTA's performance is poor. The CLOCK results are below.

2 CPU: Start of run 0.000 -- scf steps at 65.702, 123.005, 179.978, 236.833, 293.613, 350.316, 407.082, 463.780, 520.457, 577.072, 633.713, 690.372, 747.105, 803.680, 860.304, 916.886, 920.865 -- end of geometry step 920.887
4 CPU: Start of run 0.000 -- scf steps at 52.757, 99.481, 145.754, 191.974, 238.180, 284.612, 330.736, 377.200, 423.579, 469.623, 515.901, 561.912, 608.275, 654.488, 700.990, 747.432, 749.578 -- end of geometry step 749.604 -- End of run 749.843
8 CPU: Start of run 0.000 -- scf steps at 57.490, 106.014, 154.452, 202.971, 251.328, 299.604, 348.336, 396.550, 445.203, 493.459, 541.900, 590.203, 638.980, 687.550, 735.906, 784.315, 785.593 -- end of geometry step 785.667 -- End of run 786.080

I've done more tests and observed no difference; even the serial version is faster than some parallel jobs :(. Am I doing something WRONG? Thanks. I'm sending my input and output file below.

# -
# FDF for a cubic c-Si supercell with 64 atoms
#
# E. Artacho, April 1999
# -
SystemName          64-atom silicon
SystemLabel         si64
NumberOfAtoms       64
NumberOfSpecies     1
%block ChemicalSpeciesLabel
 1  14  Si
%endblock ChemicalSpeciesLabel
PAO.BasisSize       SZ
PAO.EnergyShift     300 meV
LatticeConstant     5.430 Ang
%block LatticeVectors
  2.000  0.000  0.000
  0.000  2.000  0.000
  0.000  0.000  2.000
%endblock LatticeVectors
%block kgrid_Monkhorst_Pack
  7 0 0  0.0
  0 7 0  0.0
  0 0 7  0.0
%endblock kgrid_Monkhorst_Pack
MeshCutoff          100.0 Ry
MaxSCFIterations    50
DM.MixingWeight     0.3
DM.NumberPulay      3
DM.Tolerance        1.d-4
DM.UseSaveDM
SolutionMethod      diagon
ElectronicTemperature  25 meV
MD.TypeOfRun        cg
MD.NumCGsteps       0
MD.MaxCGDispl       0.1 Ang
MD.MaxForceTol      0.04 eV/Ang
AtomicCoordinatesFormat  ScaledCartesian
%block AtomicCoordinatesAndAtomicSpecies
  0.1000  0.1000  0.1000   1  # Si 1
  0.250   0.250   0.250    1  # Si 2
  0.000   0.500   0.500    1  # Si 3
  0.250   0.750   0.750    1  # Si 4
  0.500   0.000   0.500    1  # Si 5
  0.750   0.250   0.750    1  # Si 6
  0.500
Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP
Hi, use the keyword ParallelOverK and see. The default parallelization is over orbitals, which is less efficient. Regards, Roberto

On Wed, 18 Feb 2009, Mehmet Topsakal wrote: [...]
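For reference, the switch would look roughly like this in the FDF input; this is a minimal fragment based on the keyword quoted later in this thread (Diag.ParallelOverK), so check the manual of your SIESTA version for the exact spelling:

# Distribute the diagonalization over k-points instead of over orbitals
# (keyword as used later in this thread; verify against your manual)
Diag.ParallelOverK   T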
Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP
Did you try diag.paralleloverk T ?

Regards, Marcel

Marcel Mohr
Institut für Festkörperphysik, TU Berlin
marcel(at)physik.tu-berlin.de   Sekr. EW 5-4
TEL: +49-30-314 24442           Hardenbergstr. 36
FAX: +49-30-314 27705           10623 Berlin

On Wed, 18 Feb 2009, Mehmet Topsakal wrote: [...]
Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP
Dear all,

Look at the input: your system is too small to see any benefit from parallelizing over orbitals (the default behaviour):

initatomlists: Number of atoms, orbitals, and projectors: 64 256 576
* Maximum dynamic memory allocated = 16 MB

(this is per node, so probably 8*16 = 128 MB or something similar). If you use ParallelOverK, the results should be fine. If you want to test the domain-decomposition performance, a couple of suggestions (see the FDF fragment below):
1) Increase the mesh cutoff (to 200-400 Ry)
2) Increase the basis size to DZP
3) If you are still not satisfied, create a supercell, e.g. 2 x 2 x 2 of the original cell (thus 512 atoms).
Check your results; you should get a good speedup from 2 to 4 and probably even to 8 processors.

Regards, Lucas

On Wed, Feb 18, 2009 at 11:26 PM, Marcel Mohr mar...@physik.tu-berlin.de wrote: [...]
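Roughly, these suggestions would translate into FDF changes like the following. This is only an illustrative fragment: the 300 Ry cutoff is an arbitrary value inside the suggested 200-400 Ry range, and the %block SuperCell syntax should be checked against the manual of your SIESTA version.

MeshCutoff        300.0 Ry     # suggested range 200-400 Ry, instead of 100 Ry
PAO.BasisSize     DZP          # instead of SZ
# Optionally replicate the 64-atom cell 2 x 2 x 2 (512 atoms);
# check the SuperCell block syntax in your version's manual.
%block SuperCell
  2  0  0
  0  2  0
  0  0  2
%endblock SuperCell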
Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP
Thank you very much. diag.paralleloverk T worked. The new, good timings are as follows:

2 CPU:  Start of run 0.000 -- scf steps at 35.661, 64.727, 93.779, 122.809, 151.815, 180.833, 209.842, 238.843, 267.856, 296.861, 325.862, 354.869, 358.900 -- end of geometry step 358.910 -- End of run 359.189
4 CPU:  Start of run 0.000 -- scf steps at 19.362, 33.702, 47.954, 62.193, 76.570, 90.962, 105.142, 119.510, 133.742, 147.781, 162.087, 176.279, 178.423 -- end of geometry step 178.436 -- End of run 178.689
8 CPU:  Start of run 0.000 -- scf steps at 10.202, 18.363, 26.259, 34.359, 42.234, 50.079, 57.931, 65.881, 73.824, 81.722, 89.570, 97.569, 98.847 -- end of geometry step 99.129 -- End of run 100.093
16 CPU: Start of run 0.000 -- scf steps at 9.298, 13.831, 18.210, 22.586, 26.970, 31.506, 35.914, 40.253, 44.620, 49.005, 53.384, 57.777, 58.435 -- end of geometry step 58.505 -- End of run 58.906

Marcel Mohr wrote: did you try diag.paralleloverk T ? [...]
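As a quick sanity check of the scaling, the following small Python snippet computes the relative speedup and parallel efficiency from the "End of run" times quoted above, taking the 2-CPU run as reference since no serial time was reported for this setup.

# Parallel scaling from the wall times above, relative to the 2-CPU run.
wall_times = {2: 359.189, 4: 178.689, 8: 100.093, 16: 58.906}   # seconds
ref_cores = 2
ref_time = wall_times[ref_cores]
for cores in sorted(wall_times):
    speedup = ref_time / wall_times[cores]
    efficiency = speedup / (cores / ref_cores)   # 1.0 = ideal scaling
    print(f"{cores:2d} cores: {wall_times[cores]:8.3f} s  "
          f"speedup {speedup:.2f}  efficiency {efficiency:.2f}")

This gives roughly 2.0x at 4 cores, 3.6x at 8 and 6.1x at 16, relative to the 2-CPU run.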
Re: [SIESTA-L] About wxml
I have recently performed a calculation of the PDOS for a system with 323 atoms. Actually, not only for this one, but for 30 other systems as well (thank the heavens the other 30 systems have 90 atoms at most...). However, when I tried to extract the PDOS with the pdoswxml utility, I noticed that I had compiled SIESTA without the -DWXML_INIT_FLAG. The thing is, all my calculations were non-spin-polarized, and still the PDOS shows an '*' (asterisk) in the field of the quantum number 'm'. Why is that so? Putting the question more clearly: why is it that, in a non-spin-polarized calculation, we still have the field 'm' appearing in the PDOS file?

Dear Marcos: this is so because m, even though it is called the magnetic quantum number in textbooks, has nothing to do with spin and always runs from -l to l (and not only in SIESTA). In a spin-polarized calculation the spin-up / spin-down values are shown in the PDOS in two columns for each (n, l, m, zeta) entry.

On a more practical note, if I specify that I want only the PDOS corresponding to a given angular momentum channel (say p) of a certain species, does pdoswxml disregard the information on m, even if it is wrong but in the right format? For example, suppose that I replaced all the * by 0 or 1 and asked pdoswxml for the PDOS of a given angular momentum: would it still give me the PDOS correctly?

There is no point in replacing them by 0 or 1, because in fact they stand for -1, -2, etc. In your system with 323 atoms, I guess most atoms do not have l=2. Use a simple script,

sed s/'m=\*'/'m=-1'/g PDOS > PDOS_corrected

to fix the problem for all atoms except those with l=2. For those, after applying the above script you will have the sequence

m=-1 m=-1 m=0 m=1 m=2

in the PDOS, instead of the correct one,

m=-2 m=-1 m=0 m=1 m=2

The rest you correct by hand, or invent a more sophisticated script (similarly if you have l=3 atoms); a sketch of one is given below. Good luck, Andrei Postnikov
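In case it is useful, here is one way such a "more sophisticated script" could look: a hedged Python sketch that walks through the PDOS file line by line and renumbers each run of consecutive m=* entries as -k, ..., -1 (k being the length of the run). It assumes the broken token is literally m=*, exactly as in the sed command above, that each orbital's m appears on its own line, and that the asterisks of one shell appear consecutively in the order -l, ..., -1 before that shell's m=0 entry; please check these assumptions against your own file before trusting the result.

# Hedged sketch: renumber m=* entries in a SIESTA PDOS file, shell by shell.
import re
import sys

def fix_pdos_m(lines):
    m_pat = re.compile(r"\bm=(\*|-?\d+)")
    out = list(lines)
    run = []                              # line indices of the current run of m=*

    def flush(run):
        k = len(run)
        for j, idx in enumerate(run):     # a run of k asterisks becomes -k, ..., -1
            out[idx] = m_pat.sub("m=%d" % (j - k), out[idx], count=1)

    for i, line in enumerate(lines):
        match = m_pat.search(line)
        if not match:
            continue
        if match.group(1) == "*":
            run.append(i)
        else:                             # a numeric m ends the current shell's run
            flush(run)
            run = []
    flush(run)
    return out

if __name__ == "__main__":                # usage: python fix_pdos_m.py PDOS > PDOS_corrected
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(fix_pdos_m(f.readlines()))

Unlike the sed one-liner, this handles p, d and f shells in one pass, at the price of the ordering assumption stated above.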