Dear Prof. Paolo and Prof. Lorenzo,

Thank you for your thoughtful replies. I have carefully examined the program and have also run it using processor pools. It turns out that memory swapping was the culprit: only 32 GB of memory were available to me, while the program was using over 90 GB of virtual memory, and this was the bottleneck.

Thankfully, I was able to reduce the program's memory usage by setting disk_io='high' and reducing mixing_ndim to 4, and the program now seems to run fine.
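For reference, these two flags sit in the &CONTROL and &ELECTRONS namelists of the pw.x input; a minimal sketch (all other settings from my input are omitted here):

    &CONTROL
      disk_io = 'high'    ! keep wavefunctions on disk rather than in RAM
    /
    &ELECTRONS
      mixing_ndim = 4     ! keep fewer charge-mixing history vectors (default is 8)
    /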
I would like to thank you both for your help in this matter.

Yours sincerely,
M Harshavardhan
Fourth Year Undergraduate, Engineering Physics
IIT Madras

On Thu, Jun 29, 2017 at 3:22 PM, Paolo Giannozzi <[email protected]> wrote:
> Not sure it is an MPI problem: "fft_scatter" is where most of the
> communication takes place, but its wall and CPU times are not so
> different. I think it is a problem of "swapping": the code requires
> (much) more memory than is available, and spends most of its time
> reading from disk the arrays it needs and writing to disk those it
> no longer needs. If disk_io='high', it might also be an I/O problem.
>
> Paolo
>
> On Thu, Jun 29, 2017 at 10:48 AM, Lorenzo Paulatto
> <[email protected]> wrote:
> > [re-sending to the mailing list, as I answered privately by mistake]
> >
> > Hello,
> >
> > On 29/06/17 09:57, Harsha Vardhan wrote:
> >> I have observed that the c_bands and sum_bands routines are taking up
> >> a huge amount of wall time compared to CPU time. I am attaching the
> >> time report for the completed calculation below:
> >>
> >
> > The high wall times indicate a lot of MPI communication, which means
> > that your simulation will probably run faster with fewer CPUs. Are you
> > using as many pools as possible? Pool parallelism requires less
> > communication. Here is an example of the syntax:
> >
> > mpirun -np 16 pw.x -npool 16 -in input
> >
> > The number of pools must be no larger than the number of CPUs and the
> > number of k-points.
> >
> > Also, setting npool = n_kpoints - small_number is not a good idea: most
> > CPUs will hold one k-point while only small_number of them will hold
> > two, slowing everyone down (it would be more efficient to use fewer
> > CPUs, i.e. npool = ncpus = n_kpoints/2).
> >
> > If you are already at the maximum number of pools, you can try reducing
> > the number of MPI processes and using OpenMP instead. Make sure the
> > code was compiled with the --enable-openmp option, and set the variable
> > OMP_NUM_THREADS to the ratio ncpus/n_mpi_processes, e.g. with 16 CPUs:
> >
> > export OMP_NUM_THREADS=4
> > mpirun -x OMP_NUM_THREADS -np 4 pw.x -npool 4 -in input
> >
> > Finally, are you sure you need a 9x9x1 grid of k-points for an 8x8x1
> > supercell of graphene? Since the supercell is 8 times larger than the
> > unit cell in each in-plane direction, this is equivalent to a
> > (9*8)x(9*8)x1 = 72x72x1 grid in the unit cell, which is enormous.
> >
> > hth
> >
> > --
> > Dr. Lorenzo Paulatto
> > IdR @ IMPMC -- CNRS & Université Paris 6
> > phone: +33 (0)1 442 79822 / skype: paulatz
> > www: http://www-int.impmc.upmc.fr/~paulatto/
> > mail: 23-24/423 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
