Dear Paolo,
    Thank you very much for your reply.
    Sorry that my previous post was unclear; I will try to state the problem more clearly here.
    My scf.in file is attached at the end of this post.
    
    First, I ran the SCF calculation with different numbers of MPI processes, like this:
    mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out
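To repeat this run for every process count without retyping the command, a small driver script can loop over the counts. This is a minimal Python sketch that only prints the commands by default. The per-run output names scf_<N>.out and the helper names pw_command and run_sweep are hypothetical and not part of Quantum ESPRESSO; the original runs all wrote to scf.out.

```python
import subprocess

# MPI process counts used in the runs below.
MPI_COUNTS = [1, 4, 8, 12, 16, 20, 24, 25, 28, 32]


def pw_command(nproc, npool=None, infile="scf.in"):
    """Build the mpiexec.hydra / pw.x command line for one run (hypothetical helper)."""
    cmd = ["mpiexec.hydra", "-n", str(nproc), "pw.x"]
    if npool is not None:
        cmd += ["-npool", str(npool)]  # k-point pools, as in the runs further below
    cmd += ["-in", infile]
    return cmd


def run_sweep(counts, dry_run=True):
    """Run pw.x once per process count, one output file per run (hypothetical naming)."""
    for n in counts:
        cmd = pw_command(n)
        outfile = f"scf_{n}.out"
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd), ">", outfile)
            continue
        with open(outfile, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)


# Prints the commands only; call run_sweep(MPI_COUNTS, dry_run=False) to execute them.
run_sweep(MPI_COUNTS)
```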

    Then I collected the timing results (the PWSCF CPU and WALL times printed at the end of scf.out) for each number of MPI processes:

MPI procs   CPU time    WALL time
        1   3m47.62s     3m54.05s
        4     56.51s       57.83s
        8     31.30s       32.78s
       12     24.21s       25.06s
       16     17.67s       18.60s
       20     14.03s       15.26s
       24     13.53s       14.44s
       25     12.13s       14.05s
       28     11.80s       12.69s
       32     13.45s       16.12s
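The two times on each PWSCF line can also be pulled out of scf.out automatically. This is a minimal sketch, assuming the final timing line has the form "PWSCF : <cpu> CPU <wall> WALL" with times written as "Xm Y.YYs" or "Y.YYs", as in the outputs above; hour fields are not handled, and the function names are hypothetical.

```python
import re

# Matches the final timing line of a pw.x output, e.g.
#   "     PWSCF        :  3m47.62s CPU     3m54.05s WALL"
TIMING_RE = re.compile(r"PWSCF\s*:\s*(.+?)\s*CPU\s+(.+?)\s*WALL")


def to_seconds(t):
    """Convert '3m47.62s', '56.51s' or '1m 8.13s' to seconds (hours not handled)."""
    t = t.replace(" ", "").rstrip("s")
    if "m" in t:
        minutes, seconds = t.split("m")
        return 60.0 * float(minutes) + float(seconds)
    return float(t)


def parse_pwscf_timing(text):
    """Return (cpu_seconds, wall_seconds) from the last PWSCF timing line in text."""
    matches = TIMING_RE.findall(text)
    if not matches:
        raise ValueError("no PWSCF timing line found")
    cpu, wall = matches[-1]
    return to_seconds(cpu), to_seconds(wall)


# Usage (not executed here):
#   with open("scf.out") as f:
#       cpu, wall = parse_pwscf_timing(f.read())
```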

A plot of CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUXhL4.png
I then define total CPU time = (CPU time) x (number of MPI processes); for example, for the 32-process run the total CPU time is 32 x 13.45 s = 430.4 s.
A plot of total CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUYkD4.png
The scaling is clearly not good. With perfect linear scaling this plot should be a horizontal line, am I right?
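The "horizontal line" criterion is equivalent to a parallel efficiency t_1 / (N t_N) equal to 1, where t_N is the CPU time with N processes. A short sketch computing the total CPU time, speedup and efficiency from the CPU times in the table above (converted to seconds by hand, e.g. 3m47.62s = 227.62 s); the function name is hypothetical.

```python
# CPU times in seconds from the PWSCF timing lines above.
CPU_SECONDS = {1: 227.62, 4: 56.51, 8: 31.30, 12: 24.21, 16: 17.67,
               20: 14.03, 24: 13.53, 25: 12.13, 28: 11.80, 32: 13.45}


def scaling(times):
    """Return {N: (total CPU time N*t_N, speedup t_1/t_N, efficiency t_1/(N*t_N))}."""
    t1 = times[1]
    result = {}
    for n, tn in sorted(times.items()):
        total = n * tn  # constant total => perfect linear scaling
        result[n] = (total, t1 / tn, t1 / total)
    return result


for n, (total, speedup, eff) in scaling(CPU_SECONDS).items():
    print(f"N={n:2d}  N*t_N={total:7.2f} s  speedup={speedup:5.2f}  efficiency={eff:4.2f}")
```

With these numbers the efficiency is about 0.53 at 32 processes.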

So I thought that adding k-point (pool) parallelization might improve the scaling. Since there are 10 k-points, I tried the three cases below, all with 30 MPI processes:

mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out
mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out
mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out

The timing results are:
npool   CPU time    WALL time
    2     14.89s       15.88s
    5     27.45s       28.95s
   10   0m53.52s     1m 8.13s
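For the pool runs it helps to write down how the 30 processes and 10 k-points are divided. In pw.x each pool handles a subset of the k-points and distributes plane waves and FFTs among its own processes, so with npool=10 each pool has only 3 processes. The sketch below assumes the k-points are split as evenly as possible (the exact assignment is done inside pw.x) and uses the serial CPU time of 227.62 s for the efficiency; the helper names are hypothetical.

```python
NPROC, NKPOINTS = 30, 10
T1 = 227.62  # serial CPU time in seconds (3m47.62s)
POOL_CPU_SECONDS = {2: 14.89, 5: 27.45, 10: 53.52}  # 0m53.52s = 53.52 s


def kpoints_per_pool(nk, npool):
    """Split nk k-points as evenly as possible over npool pools (assumed scheme)."""
    base, extra = divmod(nk, npool)
    return [base + 1 if i < extra else base for i in range(npool)]


def pool_layout(nproc, nk, npool):
    """Return (processes per pool, k-points per pool) for equal-sized pools."""
    if nproc % npool != 0:
        raise ValueError("number of processes must be a multiple of npool")
    return nproc // npool, kpoints_per_pool(nk, npool)


for npool, t in sorted(POOL_CPU_SECONDS.items()):
    procs, kpts = pool_layout(NPROC, NKPOINTS, npool)
    total = NPROC * t
    print(f"npool={npool:2d}: {procs:2d} procs/pool, k-points per pool {kpts}, "
          f"N*t_N={total:7.1f} s, efficiency={T1 / total:4.2f}")
```

For npool=2 this gives 15 processes per pool, 5 k-points per pool, and an efficiency of about 0.51.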

Clearly, the scaling is much worse with pool parallelization, and it gets worse as npool increases. So what is wrong?

Best regards

-----------------
Below is my scf.in file:

&CONTROL
prefix='bi2se3_mpi',
calculation='scf',
restart_mode='from_scratch',
wf_collect=.true.,
verbosity='high',
tstress=.true.,
tprnfor=.true.,
forc_conv_thr=1d-4,
outdir='./qe_tmpdir',
pseudo_dir = './pseudo', 
/
&SYSTEM
ibrav = 5,
celldm(1)=18.59579532204d0,celldm(4)=0.9113725833268d0,
nat = 5,ntyp = 3,
ecutwfc = 40,ecutrho = 433,
/
&ELECTRONS
conv_thr = 1.0d-10,  
/
&IONS
/
&CELL
press_conv_thr=0.1d0
cell_dofree='all',
/
ATOMIC_SPECIES
Bi 208.98040   Bi.pbe-dn-kjpaw_psl.0.2.2.UPF
Se1 78.971 Se.pbe-n-kjpaw_psl.0.2.UPF
Se2 78.971 Se.pbe-n-kjpaw_psl.0.2.UPF
ATOMIC_POSITIONS crystal
Bi 0.4008d0 0.4008d0 0.4008d0
Bi 0.5992d0 0.5992d0 0.5992d0
Se2 0.2117d0 0.2117d0 0.2117d0
Se2 0.7883d0 0.7883d0 0.7883d0
Se1 0.d0 0.d0 0.d0
K_POINTS automatic 
4 4 4 1 1 1
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
