On Apr 5, 2011, at 19:54, Markus Meinert wrote:

> I used an _unshifted_ k-mesh

It doesn't matter whether it is shifted or unshifted: only the number of
k-points matters for k-point parallelization.

> The slab has 20 k points.

20 k-points on 3 processors = 7+7+6: load balancing is not ideal.
This is likely to be a minor factor, though.

> But, since a single iteration takes about 100 seconds, I do not
> see where the time is being spent, when the k points are independent.

You do not see because you do not know where to look. Not that it is
explained anywhere... Have a look at the final report:

* the time spent in "c_bands" and the routines it calls is proportional
  to the number of k-points, so it will scale linearly with the number
  of "k-point pools"
* the time spent in "sum_band" is only in part proportional to the
  number of k-points and will scale only partially
* the time spent in "v_of_rho", "newd", "mix_rho" is independent of the
  number of k-points and will not scale at all
* k-point parallelization does not reduce memory
* the rest is usually irrelevant

Also note that

* FFT parallelization distributes most memory
* FFT parallelization speeds up (with varying efficiency) almost all
  routines, with the exception of "cdiaghg" or "rdiaghg"
* linear-algebra parallelization (that you are not using) will (not
  always) speed up "cdiaghg" or "rdiaghg" and distribute more memory

All clear?

P.
---
Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
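
To make the 7+7+6 arithmetic and the scaling argument above concrete, here is a minimal Python sketch. It is not taken from the code itself: the function names are made up for illustration, the rule that the first pools receive the leftover k-points is an assumption, and the 60 s / 40 s split of the 100-second iteration is purely hypothetical (the real numbers have to be read from the final timing report).

```python
def pool_loads(n_kpoints, n_pools):
    """Split n_kpoints over n_pools as evenly as possible.

    Assumption: the first `extra` pools take one additional k-point.
    """
    base, extra = divmod(n_kpoints, n_pools)
    return [base + 1 if i < extra else base for i in range(n_pools)]


def balance_efficiency(n_kpoints, n_pools):
    """Fraction of the ideal speedup; the busiest pool sets the wall time."""
    return n_kpoints / (n_pools * max(pool_loads(n_kpoints, n_pools)))


def estimated_time(t_scaling, t_fixed, n_kpoints, n_pools):
    """Wall time when only t_scaling is proportional to the k-points per pool.

    t_scaling: single-pool time of the k-point-proportional part (e.g. c_bands)
    t_fixed:   time of the part that does not depend on the number of k-points
    """
    busiest = max(pool_loads(n_kpoints, n_pools))
    return t_scaling * busiest / n_kpoints + t_fixed


print(pool_loads(20, 3))                    # [7, 7, 6]
print(round(balance_efficiency(20, 3), 3))  # 0.952
# Hypothetical split of a 100 s iteration: 60 s scaling, 40 s fixed.
print(estimated_time(60.0, 40.0, 20, 3))    # 61.0
```

With 20 k-points on 3 pools the imbalance costs only about 5% of the ideal speedup, which is why it is a minor factor. Under the hypothetical split, the part that does not scale is what limits the gain.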
