It seems to me that scaling is quite good up to 16-20 processors for plane-wave parallelization. It is not easy to obtain better results. The effectiveness of k-point parallelization depends a lot on how much the k-point-independent parts of the code weigh on the overall performance. In this specific case, k-point parallelization is not as good as it could be; improving it would require work on the code.
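Why k-point-independent work matters can be illustrated with a toy cost model. This is a hypothetical sketch, not a description of what pw.x actually does. It makes three assumptions. First, each pool gets n_procs // n_pools processes and an equal share of the k-points. Second, k-point-independent work is repeated in every pool, so it is shared only among the processes of one pool. Third, plane-wave parallelization inside a pool of m processes loses efficiency as m grows, through a made-up parameter `alpha`. The function names and all work units are invented for illustration.

```python
import math


def toy_wall_time(n_procs, n_pools, n_kpoints, work_per_k, work_k_indep, alpha=0.02):
    """Hypothetical cost model for one SCF run; NOT a description of pw.x internals.

    Assumptions (illustrative only):
    - each pool gets n_procs // n_pools processes and
      ceil(n_kpoints / n_pools) k-points;
    - k-point-independent work is repeated in every pool, so it is only
      shared among the processes of a single pool;
    - inside a pool of m processes, one unit of work costs
      (1 + alpha * (m - 1)) / m, i.e. efficiency drops as m grows.
    """
    m = n_procs // n_pools
    k_per_pool = math.ceil(n_kpoints / n_pools)
    work = k_per_pool * work_per_k + work_k_indep
    return work * (1.0 + alpha * (m - 1)) / m


POOLS = (1, 2, 5, 10)


def times_for(work_k_indep):
    # 30 processes and 10 k-points, as in the runs quoted below; work units are made up.
    return {p: toy_wall_time(30, p, 10, work_per_k=20.0, work_k_indep=work_k_indep)
            for p in POOLS}


if __name__ == "__main__":
    for w in (0.0, 50.0):
        print(f"k-independent work = {w:5.1f}:",
              {p: round(t, 2) for p, t in times_for(w).items()})
```

With no k-point-independent work, the modelled time keeps dropping as pools are added, because each pool runs its plane-wave part on fewer, more efficient processes. With a sizeable k-point-independent term, the time improves slightly at 2 pools and then grows. That is qualitatively the pattern in the npool timings below, although the model's parameters are not fitted to them.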
Paolo

On Mon, Sep 18, 2017 at 12:02 PM, balabi wrote:
> Dear Paolo,
> Thank you so much for your reply.
> Sorry for my previous unclear post. I will try to state my question more
> clearly in this post.
> At the end of this post, I have attached my scf.in file.
>
> First, I ran scf for different numbers of MPI processes like this:
> mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out
>
> Then I collected the timing reported at the end of scf.out for each
> number of MPI processes:
>
> 1  -> PWSCF : 3m47.62s CPU 3m54.05s WALL
> 4  -> PWSCF : 56.51s CPU 57.83s WALL
> 8  -> PWSCF : 31.30s CPU 32.78s WALL
> 12 -> PWSCF : 24.21s CPU 25.06s WALL
> 16 -> PWSCF : 17.67s CPU 18.60s WALL
> 20 -> PWSCF : 14.03s CPU 15.26s WALL
> 24 -> PWSCF : 13.53s CPU 14.44s WALL
> 25 -> PWSCF : 12.13s CPU 14.05s WALL
> 28 -> PWSCF : 11.80s CPU 12.69s WALL
> 32 -> PWSCF : 13.45s CPU 16.12s WALL
>
> The plot of CPU time vs. number of MPI processes is here:
> https://pasteboard.co/GKUXhL4.png
> Then I define total CPU time = cpu_time x mpi_num. For example, for the
> 32-process run, the total CPU time is 32 x 13.45 s = 430.4 s.
> The plot of total CPU time vs. number of MPI processes is here:
> https://pasteboard.co/GKUYkD4.png
> We can see that the scaling is not good. Perfect linear scaling should
> be a horizontal line, am I right?
>
> So I thought that adding k-point parallelization might give better
> scaling. Since there are 10 k-points, I tried the three cases below:
>
> mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out
> mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out
> mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out
>
> The timing results are:
> -npool 2  -> PWSCF : 14.89s CPU 15.88s WALL
> -npool 5  -> PWSCF : 27.45s CPU 28.95s WALL
> -npool 10 -> PWSCF : 0m53.52s CPU 1m 8.13s WALL
>
> Clearly, the scaling is much worse with pool parallelization. So
> what is wrong?
>
> best regards
>
> -----------------
> Below is the scf.in file:
>
> &CONTROL
>     prefix='bi2se3_mpi',
>     calculation='scf',
>     restart_mode='from_scratch',
>     wf_collect=.true.,
>     verbosity='high',
>     tstress=.true.,
>     tprnfor=.true.,
>     forc_conv_thr=1d-4,
>     outdir='./qe_tmpdir',
>     pseudo_dir = './pseudo',
> /
> &SYSTEM
>     ibrav = 5,
>     celldm(1)=18.59579532204d0,celldm(4)=0.9113725833268d0,
>     nat = 5,ntyp = 3,
>     ecutwfc = 40,ecutrho = 433,
> /
> &ELECTRONS
>     conv_thr = 1.0d-10,
> /
> &IONS
> /
> &CELL
>     press_conv_thr=0.1d0
>     cell_dofree='all',
> /
> ATOMIC_SPECIES
> Bi  208.98040 Bi.pbe-dn-kjpaw_psl.0.2.2.UPF
> Se1 78.971    Se.pbe-n-kjpaw_psl.0.2.UPF
> Se2 78.971    Se.pbe-n-kjpaw_psl.0.2.UPF
> ATOMIC_POSITIONS crystal
> Bi  0.4008d0 0.4008d0 0.4008d0
> Bi  0.5992d0 0.5992d0 0.5992d0
> Se2 0.2117d0 0.2117d0 0.2117d0
> Se2 0.7883d0 0.7883d0 0.7883d0
> Se1 0.d0 0.d0 0.d0
> K_POINTS automatic
> 4 4 4 1 1 1
>
> _______________________________________________
> Pw_forum mailing list
> http://pwscf.org/mailman/listinfo/pw_forum

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
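The WALL times quoted in balabi's message can be compared with the speedup S(n) = T(1)/T(n) and the parallel efficiency E(n) = S(n)/n. The "total CPU time" defined in the message is n x T(n), so a horizontal line there corresponds to constant efficiency. The sketch below only reuses the numbers reported in the message. It uses wall time rather than CPU time, which is a choice made here, not in the original. The helper names are hypothetical.

```python
def to_seconds(s):
    """Convert a PWSCF timing string such as '3m54.05s', '1m 8.13s' or '57.83s' to seconds.

    Assumption: only minutes/seconds formats appear (no hours).
    """
    s = s.replace(" ", "").rstrip("s")
    minutes, sep, seconds = s.partition("m")
    if not sep:  # no 'm' present: the string is seconds only
        return float(minutes)
    return 60.0 * float(minutes) + float(seconds)


# WALL times copied from the quoted message (number of MPI processes -> time).
WALL = {1: "3m54.05s", 4: "57.83s", 8: "32.78s", 12: "25.06s", 16: "18.60s",
        20: "15.26s", 24: "14.44s", 25: "14.05s", 28: "12.69s", 32: "16.12s"}
# 30 MPI processes split into -npool 2, 5, 10 (also from the quoted message).
POOL_WALL = {2: "15.88s", 5: "28.95s", 10: "1m 8.13s"}
N_POOL_RUN = 30


def efficiency(t1, tn, n):
    """Parallel efficiency E(n) = T(1) / (n * T(n))."""
    return t1 / (n * tn)


def plain_efficiencies():
    t1 = to_seconds(WALL[1])
    return {n: efficiency(t1, to_seconds(t), n) for n, t in WALL.items()}


def pool_efficiencies():
    t1 = to_seconds(WALL[1])
    return {p: efficiency(t1, to_seconds(t), N_POOL_RUN) for p, t in POOL_WALL.items()}


if __name__ == "__main__":
    for n, e in plain_efficiencies().items():
        print(f"n={n:2d}  speedup={e * n:5.2f}  efficiency={e:4.2f}")
    for p, e in pool_efficiencies().items():
        print(f"n={N_POOL_RUN} npool={p:2d}  efficiency={e:4.2f}")
```

With these numbers the efficiency stays around 0.77-0.79 from 12 to 20 processes and drops below 0.5 at 32. The 30-process runs with -npool 2, 5 and 10 come out at roughly 0.49, 0.27 and 0.11.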
