Dear Paolo,
Thank you so much for your reply.
Sorry that my previous post was unclear. I will try to state the problem clearly here.
My scf.in file is attached at the end of this post.
First, I ran the scf calculation with different numbers of MPI processes, like this:
mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out
Then I collected the timing results from the end of scf.out for each number of MPI processes:
1-> PWSCF : 3m47.62s CPU 3m54.05s WALL
4-> PWSCF : 56.51s CPU 57.83s WALL
8-> PWSCF : 31.30s CPU 32.78s WALL
12-> PWSCF : 24.21s CPU 25.06s WALL
16-> PWSCF : 17.67s CPU 18.60s WALL
20-> PWSCF : 14.03s CPU 15.26s WALL
24-> PWSCF : 13.53s CPU 14.44s WALL
25-> PWSCF : 12.13s CPU 14.05s WALL
28-> PWSCF : 11.80s CPU 12.69s WALL
32-> PWSCF : 13.45s CPU 16.12s WALL
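Since the timing lines mix formats such as "3m47.62s" and "56.51s", a small Python sketch like the one below could convert them to seconds before plotting. The function names are hypothetical helpers, not part of Quantum ESPRESSO, and the "h" unit is an assumption for longer runs (only "m" and "s" appear in the output above).

```python
import re

# Seconds per unit letter. Only "m" and "s" appear in the timings above;
# "h" is an assumption for longer runs.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def to_seconds(duration):
    """Convert a string such as '3m47.62s' or '1m 8.13s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*([hms])", duration)
    if not parts:
        raise ValueError(f"no duration found in {duration!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def parse_pwscf_timing(line):
    """Return (cpu_seconds, wall_seconds) from a 'PWSCF : ... CPU ... WALL' line."""
    match = re.search(r"PWSCF\s*:\s*(.*?)\s*CPU\s*(.*?)\s*WALL", line)
    if match is None:
        raise ValueError(f"not a PWSCF timing line: {line!r}")
    return to_seconds(match.group(1)), to_seconds(match.group(2))


# Example with two of the lines from this post.
for text in ("PWSCF : 3m47.62s CPU 3m54.05s WALL",
             "PWSCF : 56.51s CPU 57.83s WALL"):
    cpu, wall = parse_pwscf_timing(text)
    print(f"CPU {cpu:.2f} s, WALL {wall:.2f} s")
```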
The plot of CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUXhL4.png
Then I define total CPU time = cpu_time x mpi_num. For example, for the run with 32 MPI processes, the total CPU time is 32 x 13.45s = 430.4s.
The plot of total CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUYkD4.png
We can see that the scaling is not good. With perfect linear scaling, the total CPU time would be a horizontal line, am I right?
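For reference, the total CPU time defined above, together with the usual speedup T(1)/T(N) and parallel efficiency T(1)/(N x T(N)), can be tabulated with a short Python sketch. The CPU times are copied from the list above (3m47.62s taken as 227.62 s); the function name is only illustrative.

```python
# CPU times in seconds, keyed by number of MPI processes (from the list above).
cpu_time = {
    1: 227.62, 4: 56.51, 8: 31.30, 12: 24.21, 16: 17.67,
    20: 14.03, 24: 13.53, 25: 12.13, 28: 11.80, 32: 13.45,
}


def scaling_table(times):
    """Return rows of (nproc, cpu_time, total_cpu_time, speedup, efficiency)."""
    t1 = times[1]  # single-process run is the reference
    rows = []
    for nproc in sorted(times):
        t = times[nproc]
        total = nproc * t            # total CPU time = cpu_time x mpi_num
        speedup = t1 / t             # ideal value: nproc
        efficiency = speedup / nproc  # ideal value: 1.0
        rows.append((nproc, t, total, speedup, efficiency))
    return rows


print(f"{'nproc':>5} {'cpu(s)':>8} {'total(s)':>9} {'speedup':>8} {'eff':>6}")
for nproc, t, total, speedup, efficiency in scaling_table(cpu_time):
    print(f"{nproc:5d} {t:8.2f} {total:9.2f} {speedup:8.2f} {efficiency:6.2f}")
```

With perfect linear scaling, the total CPU time column would stay constant and the efficiency column would stay at 1.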
So I thought that adding k-point parallelization might give better scaling. Since there are 10 k-points, I tried the three cases below:
mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out
mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out
mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out
The timing results are:
-npool 2 -> PWSCF : 14.89s CPU 15.88s WALL
-npool 5 -> PWSCF : 27.45s CPU 28.95s WALL
-npool 10 -> PWSCF : 0m53.52s CPU 1m 8.13s WALL
Clearly, the scaling is much worse with npool parallelization. So what is wrong?
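To make the three npool cases easier to compare, the sketch below works out how 30 MPI processes and 10 k-points would be divided. It assumes that pw.x splits the processes evenly between pools (so the process count must be divisible by npool) and spreads the k-points over the pools as evenly as possible; the function name is hypothetical.

```python
def pool_layout(nproc, npool, nkpt):
    """Return (processes per pool, max k-points handled by one pool).

    Assumes an even split of processes between pools and k-points
    distributed as evenly as possible over the pools.
    """
    if nproc % npool != 0:
        raise ValueError("number of processes must be divisible by npool")
    procs_per_pool = nproc // npool
    max_kpts_per_pool = -(-nkpt // npool)  # ceiling division
    return procs_per_pool, max_kpts_per_pool


# The three cases tried above: 30 processes, 10 k-points.
for npool in (2, 5, 10):
    procs, kpts = pool_layout(30, npool, 10)
    print(f"-npool {npool:2d}: {procs:2d} processes per pool, "
          f"up to {kpts} k-point(s) per pool")
```

Under these assumptions, each pool in the -npool 10 case has only 3 processes for its single k-point, compared with 15 processes per pool for -npool 2.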
Best regards
-----------------
Below is the scf.in file:
&CONTROL
prefix='bi2se3_mpi',
calculation='scf',
restart_mode='from_scratch',
wf_collect=.true.,
verbosity='high',
tstress=.true.,
tprnfor=.true.,
forc_conv_thr=1d-4,
outdir='./qe_tmpdir',
pseudo_dir = './pseudo',
/
&SYSTEM
ibrav = 5,
celldm(1)=18.59579532204d0,celldm(4)=0.9113725833268d0,
nat = 5,ntyp = 3,
ecutwfc = 40,ecutrho = 433,
/
&ELECTRONS
conv_thr = 1.0d-10,
/
&IONS
/
&CELL
press_conv_thr=0.1d0
cell_dofree='all',
/
ATOMIC_SPECIES
Bi 208.98040 Bi.pbe-dn-kjpaw_psl.0.2.2.UPF
Se1 78.971 Se.pbe-n-kjpaw_psl.0.2.UPF
Se2 78.971 Se.pbe-n-kjpaw_psl.0.2.UPF
ATOMIC_POSITIONS crystal
Bi 0.4008d0 0.4008d0 0.4008d0
Bi 0.5992d0 0.5992d0 0.5992d0
Se2 0.2117d0 0.2117d0 0.2117d0
Se2 0.7883d0 0.7883d0 0.7883d0
Se1 0.d0 0.d0 0.d0
K_POINTS automatic
4 4 4 1 1 1
