So you mean it's not normal that bands.x takes more than 7 hours? What's suspicious is that the reported actual CPU time is much less, only 16 minutes. What could be the problem? Here's the output of a bands.x calculation:
     Program BANDS v.5.1.2 starts on  5Dec2015 at  9:15:18

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
          URL http://www.quantum-espresso.org",
     in publications or presentations arising from this work. More details at
     http://www.quantum-espresso.org/quote

     Parallel version (MPI), running on    64 processors
     R & G space division:  proc/nbgrp/npool/nimage =      64

     Reading data from directory:
     ./tmp/Ni3HTP2.save

     Info: using nr1, nr2, nr3 values from input
     Info: using nr1s, nr2s, nr3s values from input

     IMPORTANT: XC functional enforced from input :
     Exchange-correlation      =  SLA  PW   PBE  PBE ( 1  4  3  4 0 0)
     Any further DFT definition will be discarded
     Please, verify this is what you really want

     file H.pbe-rrkjus.UPF: wavefunction(s)  1S renormalized
     file C.pbe-rrkjus.UPF: wavefunction(s)  2S 2P renormalized
     file N.pbe-rrkjus.UPF: wavefunction(s)  2S renormalized
     file Ni.pbe-nd-rrkjus.UPF: wavefunction(s)  4S renormalized

     Parallelization info
     --------------------
     sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
     Min         588     588    151               92668    92668   12083
     Max         590     590    152               92671    92671   12086
     Sum       37643   37643   9677             5930831  5930831  773403

     Check: negative/imaginary core charge=   -0.000004    0.000000
     negative rho (up, down):  2.225E-03 0.000E+00

     high-symmetry point:  0.0000 0.0000 0.4981   x coordinate   0.0000
     high-symmetry point:  0.3332 0.5780 0.4981   x coordinate   0.6672
     high-symmetry point:  0.5000 0.2890 0.4981   x coordinate   1.0009
     high-symmetry point:  0.0000 0.0000 0.4981   x coordinate   1.5784

     Plottable bands written to file bands.out.gnu
     Bands written to file bands.out

     BANDS        :     0h16m CPU        7h38m WALL

   This run was terminated on:  16:53:49   5Dec2015

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=

On Saturday, 05
December 2015 21:03 CET, stefano de gironcoli <[email protected]> wrote:

The only parallelization that I see in bands.x is the basic one over R & G. If it is different from the parallelization used previously, you should use wf_collect. The code computes the overlap between the orbitals at k and k+dk in order to decide how to connect them. It's an nbnd^2 operation done band by band; not very efficient, evidently, but it should not take hours. You can set wf_collect=.true. and increase the number of processors.

stefano

On 05/12/2015 12:57, Maxim Skripnik wrote:

Thank you for the information. Yes, at the beginning of the pw.x output it says:

    Parallel version (MPI), running on 64 processors
    R & G space division: proc/nbgrp/npool/nimage = 64

Is bands.x parallelized at all? If so, where can I find information on that? There's nothing mentioned in the documentation:

http://www.quantum-espresso.org/wp-content/uploads/Doc/pp_user_guide.pdf
http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_BANDS.html

What could be the reason for bands.x taking many hours to calculate the bands? The preceding pw.x calculation has already determined the energy for each k-point along the path (Gamma -> K -> M -> Gamma). There are 61 k-points and 129 bands. So what is bands.x actually doing besides reformatting that data? The input file job.bands looks like this:

&bands
   prefix = 'st1'
   outdir = './tmp'
/

The calculation is initiated by

    mpirun -np 64 bands.x < job.bands

Maxim Skripnik
Department of Physics
University of Konstanz

On Saturday, 05 December 2015 02:37 CET, stefano de gironcoli <[email protected]> wrote:

On 04/12/2015 22:53, Maxim Skripnik wrote:

> Hello, I'm a bit confused by the parallelization scheme of QE. First of
> all, I run calculations on a cluster with usually 1 to 8 nodes, each of
> which has 16 cores. There is very good scaling of pw.x, e.g. for
> structural relaxation jobs. I do not specify any particular
> parallelization scheme as mentioned in the documentation, i.e.
> I start the calculations with
>
>     mpirun -np 128 pw.x < job.pw
>
> on 8 nodes, 16 cores each. According to the documentation ni=1, nk=1
> and nt=1. So in which respect are the calculations parallelized by
> default? Why do the calculations scale so well without specifying ni,
> nk, nt, nd?

R and G parallelization is performed: the wavefunctions' plane waves, the density's plane waves, and slices of real-space objects are distributed across the 128 processors. A report of how this is done is given at the beginning of the output. Did you have a look at it?

> Second question is whether one can speed up bands.x calculations. Up to
> now I start these this way:
>
>     mpirun -np 64 bands.x < job.bands
>
> on 4 nodes, 16 cores each. Does it make sense to define nb for bands.x?
> If yes, what would be reasonable values?

Expect no gain; band parallelization is not implemented in bands.

stefano

> The systems of interest consist of typically ~50 atoms with periodic
> boundaries.
>
> Maxim Skripnik
> Department of Physics
> University of Konstanz

_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
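[Editor's note] Stefano's description of what bands.x is doing (forming the overlap <psi_n(k)|psi_m(k+dk)> for every pair of bands at neighbouring k-points, an nbnd^2 operation, to decide how to connect the bands) can be sketched in a few lines. This is a rough NumPy illustration with an invented in-memory array layout, not QE's actual distributed Fortran code:

```python
import numpy as np

def connect_bands(c_k, c_kdk):
    """Match bands at k to bands at k + dk by largest overlap.

    c_k, c_kdk: complex (npw, nbnd) arrays holding the plane-wave
    coefficients of the nbnd orbitals at two neighbouring k-points
    (a hypothetical layout; QE keeps these distributed over MPI tasks).
    Forming the full overlap matrix is the O(nbnd^2) step mentioned
    in the thread.
    """
    overlap = np.abs(c_k.conj().T @ c_kdk)  # |<psi_n(k)|psi_m(k+dk)>|, shape (nbnd, nbnd)
    return overlap.argmax(axis=1)           # for each band n at k: best match at k + dk

# Example: if the bands at k + dk are just a reshuffling of those at k,
# the overlap criterion recovers the shuffle exactly.
rng = np.random.default_rng(0)
c = rng.standard_normal((50, 8)) + 1j * rng.standard_normal((50, 8))
c /= np.linalg.norm(c, axis=0)   # normalized mock "orbitals"
perm = rng.permutation(8)
match = connect_bands(c, c[:, perm])
```

With 129 bands and 61 k-points this amounts to only sixty 129x129 overlap matrices, which supports Stefano's point that the operation itself should not take hours; the wall time must be going elsewhere.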
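[Editor's note] A footnote on the symptom that opened the thread: 0h16m CPU against 7h38m WALL means the job spent almost all its time waiting rather than computing, typically on disk I/O (e.g. each MPI task re-reading wavefunction files from ./tmp) or on synchronisation. The distinction between the two clocks is easy to demonstrate; a generic Python sketch, nothing QE-specific:

```python
import time

start_wall = time.perf_counter()   # wall-clock time
start_cpu = time.process_time()    # CPU time consumed by this process

time.sleep(0.5)                    # waiting (stands in for blocking I/O):
                                   # wall time advances, CPU time does not
total = sum(i * i for i in range(10**6))   # real computation: both clocks advance

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
# wall includes the full 0.5 s of sleeping; cpu covers only the summation,
# so cpu is far smaller than wall, the same signature as 0h16m CPU vs 7h38m WALL
```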
