Dear everyone,

I seem to have a misunderstanding about how the Diag.ParallelOverK
feature works; any comment would be much appreciated.

I've got a large metallic cell, though still with only 9 k-points, that
runs on a quad-core PC; moreover, routine diagkp shows that k-points are
distributed round robin among processes. Thus I was expecting
"mpirun -np 5 ..." to run significantly faster than "mpirun -np 4 ...",
as judged from the elapsed time of individual SCF steps.
Clearly, in the latter case, the 9th k-point would be taken by
process 0 while the other three would remain waiting, right?
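
To spell out the arithmetic behind that expectation, here is a minimal
Python sketch. It is only my reading of "round robin": it assumes k-point
ik goes to process ik mod nprocs and that every k-point costs the same.
The function name is made up for illustration and is not taken from
diagkp; the sketch counts k-points only and models nothing about the
hardware.

```python
def round_robin_counts(n_kpoints, n_procs):
    """Number of k-points each process gets under a simple round robin.

    Assumption (not checked against diagkp): k-point ik is handled by
    process ik % n_procs.
    """
    counts = [0] * n_procs
    for ik in range(n_kpoints):
        counts[ik % n_procs] += 1
    return counts


if __name__ == "__main__":
    for n_procs in (4, 5):
        counts = round_robin_counts(9, n_procs)
        # The busiest process sets the wall time of the diagonalization
        # step if all k-points cost the same.
        print(f"-np {n_procs}: per-process k-points {counts}, "
              f"busiest process handles {max(counts)}")
```

Under these assumptions, -np 4 gives 3, 2, 2, 2 k-points per process and
-np 5 gives 2, 2, 2, 2, 1, so the busiest process would handle 3 versus 2
k-points per SCF step.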

However, my expectations turned out to be wrong; in fact the
second alternative (-np 4) appears to be a tiny bit faster.
Why?

Thanks in advance,

Roberto P.
