Sorry, I should go into a little more detail here. I used an _unshifted_ k-mesh. With an 8x8x8 mesh, which yields 58 k points, I get a speedup of about 2 on three machines. With a 12x12x12 mesh (144 k points) the speedup improves: a factor of 2.6 in CPU time and 2.3 in wall time. I had attributed the poor scaling to communication overhead, because in the 8x8x8 case an iteration takes less than a second. With more k points, the speedup slowly converges towards 3.
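To put the k-point counts above in perspective, here is a minimal sketch of a simple Amdahl-style model. It assumes every k point costs the same time and that k points are handed out in whole units, so the busiest machine handles ceil(k/n) of them. `ideal_speedup` is the load-balance upper bound with no overhead at all. `modeled_speedup` adds a hypothetical fixed per-iteration overhead (`overhead`) that does not shrink with more machines. Both function names and the parameters `t_k` and `overhead` are illustrative assumptions, not measured values.

```python
import math


def ideal_speedup(n_kpoints: int, n_machines: int) -> float:
    """Upper bound on k-point-parallel speedup from load balance alone.

    Assumes uniform cost per k point and zero overhead: wall time is set by
    the busiest machine, which handles ceil(n_kpoints / n_machines) k points.
    """
    per_machine = math.ceil(n_kpoints / n_machines)
    return n_kpoints / per_machine


def modeled_speedup(n_kpoints: int, n_machines: int,
                    t_k: float, overhead: float) -> float:
    """Speedup when each iteration also pays a fixed, non-parallel overhead.

    t_k: time per k point (assumed uniform, hypothetical).
    overhead: per-iteration time that does not shrink with more machines
    (e.g. serial setup or data transfer); a hypothetical parameter.
    """
    serial = overhead + n_kpoints * t_k
    parallel = overhead + math.ceil(n_kpoints / n_machines) * t_k
    return serial / parallel


# Load-balance bounds for the two meshes on three machines.
for k in (58, 144):
    print(k, round(ideal_speedup(k, 3), 3))
```

Running it prints a bound of 2.9 for 58 k points and 3.0 for 144 k points. With any nonzero `overhead`, `modeled_speedup` stays below that bound and approaches it as the per-iteration k-point work grows relative to the overhead.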
The slab, however, has 20 k points, and a single iteration takes about 100 seconds. Since the k points are independent, I do not see where the time is being spent.

Regards,
Markus

--
Dipl.-Phys. Markus Meinert
Thin Films and Physics of Nanostructures
Department of Physics
Bielefeld University
Universitätsstraße 25
33615 Bielefeld
Room D2-118
e-mail: meinert at physik.uni-bielefeld.de
Phone: +49 521 106 2661
