Dear all, I have a basic question about parallelization and scaling.
I am running calculations on a cubic system with 58 atoms (alat = 19.5652 a.u.), 540 electrons (324 KS states) and few k-points (4x4x4 grid = 4 k-points). I currently use 32 cores (4 nodes), but I can submit on many more. The nodes are connected by an InfiniBand network.

I guess the best thing to do is to parallelize the calculation over the bands, and maybe also over the FFTs. What values would you suggest for image/pools/ntg/bands?

I have run an SCF test calculation on 16 and on 32 cores. For the SCF cycle (13 steps) I get the following timings:

For 16 cores: total cpu time spent up to now is 22362.4 secs
For 32 cores: total cpu time spent up to now is 17932.6 secs

The speedup is "only" 25%. I would have expected a better speedup for such a small number of cores. Am I wrong? What is your experience?

(For additional information, if helpful: QE 5.0.1 has been compiled with Open MPI, Intel 12.1 and FFTW 3.2.2.)

Thank you for your answers.

Regards
Pascal
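
For reference, the scaling figures quoted above can be worked out as follows. This is only a minimal Python sketch, assuming the two "total cpu time" values are directly comparable timings of the same 13-step SCF run; the function names are illustrative and are not part of QE.

```python
# Sketch: speedup and relative parallel efficiency from two timings.
# Assumption: both timings refer to the same SCF run (13 steps),
# differing only in the number of cores used.

def speedup(t_ref, t_new):
    """Ratio of the reference run time to the new run time."""
    return t_ref / t_new

def relative_efficiency(t_ref, n_ref, t_new, n_new):
    """Measured speedup divided by the ideal speedup n_new / n_ref."""
    return speedup(t_ref, t_new) / (n_new / n_ref)

T16 = 22362.4  # secs, 16 cores (from the timings above)
T32 = 17932.6  # secs, 32 cores (from the timings above)

s = speedup(T16, T32)
e = relative_efficiency(T16, 16, T32, 32)

print(f"speedup 16 -> 32 cores: {s:.3f}x")
print(f"relative parallel efficiency: {e:.1%}")
```

Doubling the cores gives a speedup of about 1.25 instead of the ideal 2, i.e. a parallel efficiency of roughly 62% relative to the 16-core run.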
