On 02/14/2011 04:21 PM, Davide Sangalli wrote:
> OK. I think it could be a "memory - cache" related problem.
>
> I did the same test with a lower cut-off (still 6 CPUs).
> Now my serial run used 0.2% of the memory and my parallel runs around 1.3%.
>
> The parallelization over the FFT grid is still faster, but the k-points
> parallelization is now faster than the serial run.
>
>   Serial:                    PWSCF : 1m 3.14s CPU time, 1m 4.05s WALL time
>   fft grid parallelization:  PWSCF :   15.46s CPU time,   16.36s WALL time
>   kpts parallelization:      PWSCF :   35.98s CPU time,   36.47s WALL time
>
> Thank you and best regards,
> Davide
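[For reference, the wall-time speedups implied by the figures quoted above can be worked out directly; this is a minimal sketch, using only the timings reported in the thread (6 CPUs in both parallel runs):]

```python
# Wall times (seconds) reported in the thread for the low-cutoff test.
timings = {
    "serial": 64.05,                    # 1m 4.05s
    "fft grid parallelization": 16.36,
    "kpts parallelization": 36.47,
}

serial = timings["serial"]
for run, wall in timings.items():
    # Speedup relative to the serial run.
    print(f"{run:>26}: {wall:7.2f} s wall, speedup {serial / wall:.2f}x")
```

[So on 6 CPUs the FFT-grid parallelization reaches about 3.9x, while the k-point parallelization reaches only about 1.8x, consistent with Davide's observation that both now beat the serial run but FFT parallelization remains faster.]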
Maybe I didn't explain myself. I was referring to the complete timings,
including the separate contributions from all the subroutines.

GS


On 02/14/2011 02:49 PM, Gabriele Sclauzero wrote:
>> Dear Davide,
>>
>> it might be a memory-contention problem, since CPU cache sizes are
>> of the order of a few MB. Please provide the detailed timings at the
>> end of the runs (they are the first thing one should look at in order
>> to interpret this kind of speedup test).
>>
>> Next time please take a few seconds to sign your post with your full name
>> and affiliation.
>>
>> Regards,
>>
>> GS
>>
>>
>> On 14 Feb 2011, at 12:18, Davide Sangalli wrote:
>>
>>> Thank you for the answer.
>>>
>>> I did a check to be sure, but these jobs use only a few MB of memory.
>>> The serial run uses just 2.5% of my node's memory (so around 15% for the
>>> run on 6 CPUs).
>>> It does not seem to me that this could be the problem.
>>> Moreover, with the FFT parallelization the memory was not distributed
>>> either.
>>>
>>> Is it possible that pwscf is not properly compiled?
>>> Is there any other check that you would suggest?
>>>
>>> Best regards,
>>> Davide
>>>
>>> *************************************
>>>    Largest allocated arrays       est. size (Mb)   dimensions
>>>       Kohn-Sham Wavefunctions          5.68 Mb     (   6422,  58)
>>>       NL pseudopotentials             13.33 Mb     (   6422, 136)
>>>       Each V/rho on FFT grid           7.81 Mb     ( 512000)
>>>       Each G-vector array              1.76 Mb     ( 230753)
>>>       G-vector shells                  0.09 Mb     (  12319)
>>>    Largest temporary arrays       est. size (Mb)   dimensions
>>>       Auxiliary wavefunctions         22.73 Mb     (   6422, 232)
>>>       Each subspace H/S matrix         0.82 Mb     (    232, 232)
>>>       Each <psi_i|beta_j> matrix       0.12 Mb     (    136,  58)
>>>       Arrays for rho mixing           62.50 Mb     ( 512000,   8)
>>> writing wfc files to a dedicated directory
>>>
>>>
>>> On 02/14/2011 11:34 AM, Paolo Giannozzi wrote:
>>>> Davide Sangalli wrote:
>>>>
>>>>> What could my problem be?
>>>> the only reason I can think of is that k-point parallelization doesn't
>>>> (and cannot) distribute memory, so the total memory requirement will
>>>> be npools*(size of serial execution). If you run six instances of a
>>>> large executable on the same node, memory conflicts may slow things
>>>> down more than parallelization can speed them up.
>>>>
>>>> P.
>>> _______________________________________________
>>> Pw_forum mailing list
>>> Pw_forum at pwscf.org
>>> http://www.democritos.it/mailman/listinfo/pw_forum
>>
>>
>> Gabriele Sclauzero, EPFL SB ITP CSEA
>> PH H2 462, Station 3, CH-1015 Lausanne
>
>
> Davide Sangalli
> MDM labs, IMM, CNR.
> Agrate (MI) Italy

--
Gabriele Sclauzero, EPFL SB ITP CSEA
PH H2 462, Station 3, CH-1015 Lausanne
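[Paolo's npools*(size of serial execution) point can be illustrated with the array sizes Davide posted. A rough sketch (it sums only the largest arrays listed by pw.x, so it is a lower bound on the real per-process footprint, and it assumes one pool per CPU, i.e. npools = 6):]

```python
# Per-process array sizes (MB) from the pw.x output quoted in the thread.
largest_allocated = [5.68, 13.33, 7.81, 1.76, 0.09]   # wavefunctions, PPs, V/rho, G-vectors, shells
largest_temporary = [22.73, 0.82, 0.12, 62.50]        # aux. wfcs, H/S matrix, <psi|beta>, rho mixing

per_process_mb = sum(largest_allocated) + sum(largest_temporary)

# k-point (pool) parallelization replicates this on every pool,
# so the aggregate requirement scales with the number of pools.
npools = 6
total_mb = npools * per_process_mb
print(f"per process: ~{per_process_mb:.1f} MB, total for {npools} pools: ~{total_mb:.0f} MB")
```

[That gives roughly 115 MB per process and ~690 MB aggregate with 6 pools; small in absolute terms, which is why the discussion moved from total memory to cache contention between the six replicated images.]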
