Re: [SIESTA-L] About wxml

2009-02-18 Thread Oleksandr Voznyy

If I'm not mistaken, * appears where m should be -1, i.e. for p orbitals
the quantum number m runs over -1, 0, 1, and it has nothing to do with
spin; basically, these are the populations of the basis orbitals used in
your calculation.
To get a meaningful PDOS you have to sum over all m and over all
zetas (otherwise you may notice that some of those values are negative).
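
For illustration, a minimal Python sketch of that summation. The data
layout here is hypothetical: it assumes the orbital-resolved curves have
already been read from the PDOS file into simple records, which is not
shown.

import numpy as np

def sum_over_m_and_zeta(orbitals):
    """orbitals: iterable of dicts with keys 'atom', 'n', 'l', 'm', 'zeta', 'dos'."""
    shells = {}
    for orb in orbitals:
        key = (orb['atom'], orb['n'], orb['l'])     # m and zeta are summed out
        shells[key] = shells.get(key, 0.0) + np.asarray(orb['dos'])
    return shells

# Toy example: the three m components of one p shell add up point by point.
p_shell = [{'atom': 1, 'n': 3, 'l': 1, 'm': m, 'zeta': 1, 'dos': [0.1, 0.2, 0.3]}
           for m in (-1, 0, 1)]
print(sum_over_m_and_zeta(p_shell)[(1, 3, 1)])     # -> roughly [0.3 0.6 0.9]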


[SIESTA-L] Obviously poor PARALLEL performance compared to VASP

2009-02-18 Thread Mehmet Topsakal

Hi,

I'm an experienced user of VASP, and these days I'm trying to learn Siesta.
During my simple tests I have noticed that Siesta's parallelization is
clearly poorer than VASP's. I'm using the latest Intel ifort (11) and
MKL 10.1. My system is a quad-core Xeon 2.33 GHz with InfiniBand
(4 cores per node).


I've chosen siesta-2.0.2/Tests/si64/ as the input. To make the test more
realistic, I have increased the k-points, distorted one atom and increased
the mesh cut-off.

With 2, 4 and 8 CPUs the parallelization of VASP is nearly linear;
Siesta's performance, however, is poor.


The CLOCK results are as follows.

2CPU -
Start of run 0.000
-- end of scf step    65.702
-- end of scf step   123.005
-- end of scf step   179.978
-- end of scf step   236.833
-- end of scf step   293.613
-- end of scf step   350.316
-- end of scf step   407.082
-- end of scf step   463.780
-- end of scf step   520.457
-- end of scf step   577.072
-- end of scf step   633.713
-- end of scf step   690.372
-- end of scf step   747.105
-- end of scf step   803.680
-- end of scf step   860.304
-- end of scf step   916.886
-- end of scf step   920.865
--- end of geometry step   920.887

4CPU -
Start of run 0.000
-- end of scf step    52.757
-- end of scf step    99.481
-- end of scf step   145.754
-- end of scf step   191.974
-- end of scf step   238.180
-- end of scf step   284.612
-- end of scf step   330.736
-- end of scf step   377.200
-- end of scf step   423.579
-- end of scf step   469.623
-- end of scf step   515.901
-- end of scf step   561.912
-- end of scf step   608.275
-- end of scf step   654.488
-- end of scf step   700.990
-- end of scf step   747.432
-- end of scf step   749.578
--- end of geometry step   749.604
End of run   749.843

8CPU -
Start of run 0.000
-- end of scf step    57.490
-- end of scf step   106.014
-- end of scf step   154.452
-- end of scf step   202.971
-- end of scf step   251.328
-- end of scf step   299.604
-- end of scf step   348.336
-- end of scf step   396.550
-- end of scf step   445.203
-- end of scf step   493.459
-- end of scf step   541.900
-- end of scf step   590.203
-- end of scf step   638.980
-- end of scf step   687.550
-- end of scf step   735.906
-- end of scf step   784.315
-- end of scf step   785.593
--- end of geometry step   785.667
End of run   786.080

I've done more tests and observed no difference. Even the serial version
is faster than some parallel jobs :(.

Am I doing something wrong?

Thanks.

I'm sending my input and output file below.

# -------------------------------------------------------------

# FDF for a cubic c-Si supercell with 64 atoms
#
# E. Artacho, April 1999
# -------------------------------------------------------------


SystemName  64-atom silicon
SystemLabel si64

NumberOfAtoms   64
NumberOfSpecies 1

%block ChemicalSpeciesLabel
1  14  Si
%endblock ChemicalSpeciesLabel

PAO.BasisSize   SZ
PAO.EnergyShift 300 meV

LatticeConstant 5.430 Ang
%block LatticeVectors
 2.000  0.000  0.000
 0.000  2.000  0.000
 0.000  0.000  2.000
%endblock LatticeVectors

%block kgrid_Monkhorst_Pack
7 0 0 0.0
0 7 0 0.0
0 0 7 0.0
%endblock kgrid_Monkhorst_Pack

MeshCutoff  100.0 Ry

MaxSCFIterations     50
DM.MixingWeight  0.3
DM.NumberPulay   3 
DM.Tolerance 1.d-4
DM.UseSaveDM

SolutionMethod   diagon   
ElectronicTemperature  25 meV 


MD.TypeOfRun cg
MD.NumCGsteps        0
MD.MaxCGDispl        0.1  Ang
MD.MaxForceTol       0.04 eV/Ang

AtomicCoordinatesFormat  ScaledCartesian
%block AtomicCoordinatesAndAtomicSpecies
   0.1000   0.1000   0.1000   1 #  Si  1
   0.250    0.250    0.250    1 #  Si  2
   0.000    0.500    0.500    1 #  Si  3
   0.250    0.750    0.750    1 #  Si  4
   0.500    0.000    0.500    1 #  Si  5
   0.750    0.250    0.750    1 #  Si  6
   0.500

Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP

2009-02-18 Thread R.C.Pasianot

 Hi,

 Use the keyword ParallelOverK and see. The default parallelization is
 over orbitals, which is less efficient.
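
 For reference, in the fdf input this is a single flag (fdf keywords are
 case-insensitive; Marcel and Mehmet write it as diag.paralleloverk
 further down the thread):

 Diag.ParallelOverK   T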

 Regards,

 Roberto



On Wed, 18 Feb 2009, Mehmet Topsakal wrote:

Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP

2009-02-18 Thread Marcel Mohr

did you try

diag.paralleloverk T  ?

Regards
Marcel


Marcel Mohr Institut für Festkörperphysik, TU Berlin
marcel(at)physik.tu-berlin.de   Sekr. EW 5-4
TEL: +49-30-314 24442   Hardenbergstr. 36
FAX: +49-30-314 27705   10623 Berlin


On Wed, 18 Feb 2009, Mehmet Topsakal wrote:

Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP

2009-02-18 Thread Lucas Fernandez Seivane
Dear All

Look at the input. Your system is too small to see any benefit from
parallelizing over orbitals (default behaviour):
initatomlists: Number of atoms, orbitals, and projectors: 64   256   576
* Maximum dynamic memory allocated = 16 MB (this is per node, so in total
probably 8*16 = 128 MB or something similar)

If you use ParallelOverK, the results should be OK. If you want to test
the performance of the orbital (domain) decomposition itself, a couple of
suggestions (a minimal fdf sketch is given below):

1) Increase the real-space grid cutoff (to 200-400 Ry)
2) Increase the basis size to DZP
3) If you are still not satisfied, build a supercell, e.g. a 2 x 2 x 2
repetition of the original cell (thus 512 atoms).

Check your results; you should get a good speedup from 2 to 4 and
probably even to 8 processors.
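
A minimal fdf sketch of those changes (the values are only illustrative,
and the 512-atom coordinate list is omitted):

MeshCutoff        300.0 Ry      # suggestion 1: denser real-space grid
PAO.BasisSize     DZP           # suggestion 2: richer basis
# suggestion 3: a 2 x 2 x 2 repetition of the 64-atom cell
NumberOfAtoms     512
LatticeConstant   5.430 Ang
%block LatticeVectors
 4.000  0.000  0.000
 0.000  4.000  0.000
 0.000  0.000  4.000
%endblock LatticeVectors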

Regards

Lucas

On Wed, Feb 18, 2009 at 11:26 PM, Marcel Mohr
mar...@physik.tu-berlin.de wrote:
 did you try

 diag.paralleloverk T  ?

 Regards
 Marcel

 
Re: [SIESTA-L] Obviously poor PARALLEL performance compared to VASP

2009-02-18 Thread Mehmet Topsakal

Thank you very much.

diag.paralleloverk T worked.

The new, much better timings are as follows:

2cpu---
Start of run 0.000
-- end of scf step    35.661
-- end of scf step    64.727
-- end of scf step    93.779
-- end of scf step   122.809
-- end of scf step   151.815
-- end of scf step   180.833
-- end of scf step   209.842
-- end of scf step   238.843
-- end of scf step   267.856
-- end of scf step   296.861
-- end of scf step   325.862
-- end of scf step   354.869
-- end of scf step   358.900
--- end of geometry step   358.910
End of run   359.189

4cpu---
Start of run 0.000
-- end of scf step    19.362
-- end of scf step    33.702
-- end of scf step    47.954
-- end of scf step    62.193
-- end of scf step    76.570
-- end of scf step    90.962
-- end of scf step   105.142
-- end of scf step   119.510
-- end of scf step   133.742
-- end of scf step   147.781
-- end of scf step   162.087
-- end of scf step   176.279
-- end of scf step   178.423
--- end of geometry step   178.436
End of run   178.689

8cpu---
Start of run 0.000
-- end of scf step    10.202
-- end of scf step    18.363
-- end of scf step    26.259
-- end of scf step    34.359
-- end of scf step    42.234
-- end of scf step    50.079
-- end of scf step    57.931
-- end of scf step    65.881
-- end of scf step    73.824
-- end of scf step    81.722
-- end of scf step    89.570
-- end of scf step    97.569
-- end of scf step    98.847
--- end of geometry step    99.129
End of run   100.093

16cpu---
Start of run 0.000
-- end of scf step     9.298
-- end of scf step    13.831
-- end of scf step    18.210
-- end of scf step    22.586
-- end of scf step    26.970
-- end of scf step    31.506
-- end of scf step    35.914
-- end of scf step    40.253
-- end of scf step    44.620
-- end of scf step    49.005
-- end of scf step    53.384
-- end of scf step    57.777
-- end of scf step    58.435
--- end of geometry step    58.505
End of run    58.906
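
For the record, a small Python sketch that turns the "End of run" times
above into speedups, taking the 2-CPU run as the reference:

# Speedup and parallel efficiency from the "End of run" wall-clock times
# above, using the 2-CPU run as the reference point.
times = {2: 359.189, 4: 178.689, 8: 100.093, 16: 58.906}
ref_cores, ref_time = 2, times[2]
for cores in sorted(times):
    speedup = ref_time / times[cores]
    efficiency = speedup * ref_cores / cores
    print("%2d cores: %8.3f s   speedup %.2fx   efficiency %3.0f%%"
          % (cores, times[cores], speedup, efficiency * 100))

Relative to 2 CPUs this gives roughly 2.0x, 3.6x and 6.1x on 4, 8 and 16
cores, so the k-point parallelization scales nearly ideally up to 4 cores
and still well at 8.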

Marcel Mohr wrote:

did you try

diag.paralleloverk T  ?

Regards
Marcel



Re: [SIESTA-L] About wxml

2009-02-18 Thread apostnik
 I have recently performed a calculation of the PDOS for a system with 323
 atoms. Actually, not only for this one, but for 30 other systems as well
 (thank the heavens the other 30 systems have 90 atoms at most...).
 However, I noticed that I had compiled siesta without the -DWXML_INIT_FLAG
 when I tried to extract the PDOS with the pdoswxml utility. The thing is,
 all my calculations were non-spin-polarized and still the PDOS shows
 an '*' (asterisk) in the field of the quantum number 'm'.
 Why is that so? Putting the question more clearly, why is it that, in a
 non-spin-polarized calculation, we still have the field 'm' appearing in
 the PDOS file?

Dear Marcos:
this is so because m,
even if it is called the magnetic quantum number in textbooks,
has nothing to do with spin and always runs from -l to l
(and this is not only in Siesta).
The spin-up / spin-down values are shown in the PDOS
in two columns (in a spin-polarized calculation) for each
(n, l, m, zeta) entry.

 On a more practical note, if I specify that I want only the PDOS
 corresponding to the angular momentum channel (say) p of a certain
 species, does pdoswxml disregard the information on m, even if it is
 wrong but in the right format? For example, suppose that I replaced all
 the * with 0 or 1 and asked pdoswxml to extract only the PDOS pertaining
 to a given angular momentum; would it still give me the PDOS correctly?

There is no point in replacing them with 0 or 1, because they in fact
stand for -1, -2, etc. In your system with 323 atoms, I guess most atoms
do not have l=2. A simple script,
  sed s/'m=\*'/'m=-1'/g < PDOS > PDOS_corrected
fixes the problem for all atoms except those with l=2. For those you will
have, after applying the script above, the sequence
  m=-1  m=-1  m=0  m=1  m=2
in the PDOS, instead of the correct one,
  m=-2  m=-1  m=0  m=1  m=2
The rest you can correct by hand, or invent a more sophisticated script.
(Similar if you have l=3 atoms.)
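
If a script along those lines helps, here is a rough Python sketch,
untested against a real PDOS file, so treat it only as a starting point.
It assumes that within each shell the orbitals are written in the order
m = -l, ..., +l, as described above, and that each orbital's l value
appears on or before the line that carries its m value. The ASTERISK and
FIXED tokens match the plain 'm=*' form used in the sed line; adjust them
if your file writes the attributes differently (e.g. with quotes).

import re
import sys

ASTERISK = 'm=*'            # how a negative m shows up in the file
FIXED    = 'm=%d'           # how the corrected value should be written
L_PATTERN = re.compile(r'(?<![A-Za-z_])l\s*=\s*"?(\d+)')
M_PATTERN = re.compile(r'(?<![A-Za-z_])m\s*=\s*"?-?\d')

def fix(lines):
    current_l, seen = 0, 0      # l of the current shell, asterisks seen in it
    for line in lines:
        l_match = L_PATTERN.search(line)
        if l_match:
            current_l = int(l_match.group(1))
        if ASTERISK in line:
            # the i-th asterisk of a shell with angular momentum l is m = -l + i
            line = line.replace(ASTERISK, FIXED % (-current_l + seen), 1)
            seen += 1
        elif M_PATTERN.search(line):
            seen = 0            # a real (non-negative) m closes the run
        yield line

if __name__ == '__main__':
    sys.stdout.writelines(fix(sys.stdin.readlines()))

Run it as: python fix_pdos_m.py < PDOS > PDOS_corrected (the file name
fix_pdos_m.py is of course arbitrary).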

Good luck

Andrei Postnikov