Also notice that parallelization on k-points gives (in principle) a linear speedup, depending on the number of k-points, for the diagonalization of H and related operations, but not for other operations that depend upon the charge density, such as the calculation of V[n(r)]. The latter are typically cheap in comparison with the former, but this depends a lot upon the specific system. FFT parallelization distributes both kinds of calculation (and yes, it distributes most memory, I stand by my statement).
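
To put rough numbers on the k-point case, here is a minimal sketch in Python of an Amdahl-type estimate. It assumes that only the k-point-dependent fraction of the run time scales with the number of k-point groups, and that the charge-density part does not scale at all. The function name and the 95% fraction are illustrative assumptions, not measurements; the real fraction depends on the system.

```python
def kpoint_speedup(n_groups, frac_k):
    """Ideal speedup from k-point parallelization alone.

    n_groups: number of k-point groups (hypothetical; at most the number of k-points)
    frac_k:   assumed fraction of serial run time spent in k-point-dependent work
              (diagonalization of H and related operations)
    """
    # The charge-density-dependent part (1 - frac_k) is not sped up here.
    return 1.0 / ((1.0 - frac_k) + frac_k / n_groups)


if __name__ == "__main__":
    # Assumed example: 95% of the time is k-point-dependent work.
    for n_groups in (1, 2, 4, 8):
        print(n_groups, round(kpoint_speedup(n_groups, 0.95), 2))
```

With these assumed numbers the speedup on 8 groups is about 5.9 rather than 8, which is why the relative weight of the two parts matters.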
P.
--
Paolo Giannozzi, Democritos and University of Udine, Italy
