Hi Kostya,

I am not sure this is the cause, but I noticed similar problems in the past. My impression then was that part of the difference arises because the reported wall time, unlike the reported CPU time, includes the time it takes to read the initial restart file and write the final one to disk. If the cluster network is heavily loaded, it may take several minutes (!) to write big files (hundreds of MB). In CP the restart file is not partitioned as it is in PW, so there is a lot of traffic in collecting data from all the nodes and then actually writing it to disk. For long runs you don't see the effect of the final disk write, but with only 20 MD steps it may become dominant.

Were you also writing intermediate restart files during the 20 steps of the benchmark?
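To make the comparison concrete, here is a minimal sketch (not part of CP; the helper names `to_seconds` and `summarize` are made up for illustration) that takes the timings from the table in the quoted message and prints, for each run, the gap between wall time and CPU time and the speedup measured on wall time. A roughly constant gap would be consistent with a fixed restart-file cost; a gap that grows with the number of CPUs would suggest that something else contributes as well.

```python
import re

# Timings copied from the quoted benchmark table: (Ncpu, CPU time, wall time).
RUNS = [
    ("1", "1h22m", "1h24m"),
    ("2", "45m33.41s", "57m13s"),
    ("4", "27m30.80s", "44m21s"),
    ("6", "18m22.71s", "43m18s"),
    ("8", "14m53.91s", "45m56s"),
    ("4(quad)", "37m18.56s", "45m32s"),
]


def to_seconds(text):
    """Convert a string such as '1h22m' or '45m33.41s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:([\d.]+)s)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized time string: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def summarize(runs):
    """Return the wall-minus-CPU gap and wall-time speedup for each run.

    The first entry of `runs` is taken as the single-CPU reference.
    """
    reference_wall = to_seconds(runs[0][2])
    rows = []
    for ncpu, cpu, wall in runs:
        cpu_s, wall_s = to_seconds(cpu), to_seconds(wall)
        rows.append({
            "ncpu": ncpu,
            "gap_min": (wall_s - cpu_s) / 60.0,       # wall time not seen as CPU time
            "wall_speedup": reference_wall / wall_s,  # speedup judged by wall clock
        })
    return rows


for row in summarize(RUNS):
    print(f"{row['ncpu']:>8}  gap = {row['gap_min']:5.1f} min  "
          f"wall speedup = {row['wall_speedup']:4.2f}x")
```

Rerunning the same benchmark with more MD steps and comparing the gaps would show how much of it is a one-off cost per run.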
Silviu.

Konstantin Kudin wrote:

> Hi,
>
> I've done some parallel benchmarks for the CP code so I thought I'd
> share them with the rest of the group. The system we have is a cluster
> of dual Opterons 2.0 Ghz with 1Gbit ethernet.
>
> I looked at 2 different measures of time, CPU time, and wall time
> computed as the difference between "This run was started" and "This run
> was terminated". By the way, such wall time could probably be printed
> by the code directly to be readily available.
>
> The system is a reasonably sized simulation cell with 20 CP
> (electronic+ionic) steps total.
>
> The compiler is IFC 9.0, GOTO library is for BLAS, and mpich 1.2.6
> used for the MPI. The CP version is the CVS from Aug. 20, 2005.
>
> What is crazy is that even for 2 cpus sitting in the same box there is
> lots of cpu time just lost somewhere. The strange thing is that the
> quad we have at 2.2 Ghz seems to lose just as much wall time as 2 duals
> talking across the network. And note how 4 cpus are barely better than
> 2x compared to single cpu performance if the wall clock time is
> considered.
>
> I know Nicola Marzari has done some parallel benchmarks, but I do not
> think that wall times were being paid attention to ...
>
> Kostya
>
> P.S. Any suggestions what might be going on here?
>
>
> Ncpu      CPU time     Wall time
> 1         1h22m        1h24m
> 2         45m33.41s    57m13s
> 4         27m30.80s    44m21s
> 6         18m22.71s    43m18s
> 8         14m53.91s    45m56s
>
> 4(quad)   37m18.56s    45m32s
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum

--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Zilberman Silviu
213 Frick Laboratory, Department of Chemistry
Princeton University
Princeton, NJ 08544
phone: 609-258-1834
fax: 609-258-6746
silviu at Princeton.EDU
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
