[re-sending to mailing list as I answered privately by mistake] Hello,
On 29/06/17 09:57, Harsha Vardhan wrote:
> I have observed that the c_bands and sum_bands routines are taking up
> a huge amount of wall time, as compared to the CPU time. I am
> attaching the time report for the completed calculation below:

The high wall times indicate a lot of MPI communication, which means that your simulation will probably run faster with fewer CPUs. Are you already using as many pools as possible? Pool parallelism requires less communication. Here is an example of the syntax:

  mpirun -np 16 pw.x -npool 16 -in input

The number of pools must not exceed the number of CPUs or the number of k-points. Also, having npool = n_kpoints - small_number is not a good idea: most pools will then hold one k-point while small_number of them hold two, slowing everyone down. (It would be more efficient to use fewer CPUs, i.e. npool = ncpus = n_kpoints/2.)

If you are already at the maximum number of pools, you can try reducing the number of MPI processes and using OpenMP instead. Be sure that the code is compiled with the --enable-openmp option, and set the variable OMP_NUM_THREADS to the ratio ncpus/n_mpi_processes, e.g. with 16 CPUs:

  export OMP_NUM_THREADS=4
  mpirun -x OMP_NUM_THREADS -np 4 pw.x -npool 4 -in input

Finally, are you sure that you need a 9x9x1 grid of k-points for an 8x8x1 supercell of graphene? This would be equivalent to using a 72x72x1 grid in the unit cell, which is quite enormous.

hth
--
Dr. Lorenzo Paulatto
IdR @ IMPMC -- CNRS & Université Paris 6
phone: +33 (0)1 442 79822 / skype: paulatz
www:   http://www-int.impmc.upmc.fr/~paulatto/
mail:  23-24/423 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05

_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
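P.S. To make the load-balancing argument concrete, here is a minimal sketch (my own illustration, assuming k-points are spread over pools as evenly as possible):

```python
# Illustration (not from pw.x itself): wall time in pool parallelism
# scales with the most-loaded pool, assuming an even distribution of
# k-points over pools.
from math import ceil

def max_kpoints_per_pool(n_kpoints, npool):
    """Largest number of k-points any single pool has to process."""
    return ceil(n_kpoints / npool)

# 20 k-points on 19 pools: 18 pools hold 1 k-point, 1 pool holds 2,
# so everyone waits for the pool that holds 2.
print(max_kpoints_per_pool(20, 19))  # -> 2

# 20 k-points on 10 pools: every pool holds 2 -- same wall time,
# with roughly half the CPUs.
print(max_kpoints_per_pool(20, 10))  # -> 2
```

The same kind of arithmetic underlies the k-grid remark: an NxNx1 grid on an MxMx1 supercell samples the Brillouin zone like an (N*M)x(N*M)x1 grid of the unit cell, hence 9*8 = 72.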
