[Pw_forum] abysmal parallel performance of the CP code

Konstantin Kudin Wed, 21 Sep 2005 10:48:32 -0700 (PDT)

 Hi,

 I've done some parallel benchmarks for the CP code, so I thought I'd
share them with the rest of the group. The hardware is a cluster of
dual 2.0 GHz Opterons connected by 1 Gbit Ethernet.


 I looked at two measures of time: CPU time, and wall time computed as
the difference between the "This run was started" and "This run was
terminated" lines of the output. By the way, the code could probably
print this wall time directly so that it is readily available.

 The test case is a reasonably sized simulation cell, run for 20 CP
(electronic+ionic) steps in total.

 The compiler is IFC 9.0, the GOTO library provides BLAS, and MPICH
1.2.6 is used for MPI. The CP version is from CVS as of Aug. 20, 2005.

 What is crazy is that even for 2 cpus sitting in the same box there is
lots of cpu time just lost somewhere. The strange thing is that the
quad we have at 2.2 GHz seems to lose just as much wall time as 2 duals
talking across the network. And note that 4 cpus give only about a 2x
speedup over a single cpu when wall clock time is considered.

 I know Nicola Marzari has done some parallel benchmarks, but I do not
think that wall times were being paid attention to ...

 Kostya

P.S. Any suggestions as to what might be going on here?


Ncpu    CPU time        Wall time
1       1h22m           1h24m
2       45m33.41s       57m13s
4       27m30.80s       44m21s
6       18m22.71s       43m18s
8       14m53.91s       45m56s

4(quad) 37m18.56s       45m32s
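
For anyone who wants to turn the table above into speedup numbers, here
is a rough Python sketch. It is not part of the CP code; the names
to_seconds and scaling are made up for illustration, and it assumes the
CPU time column can be compared directly with the wall time of the same
run. Speedup is wall time on 1 cpu divided by wall time on N cpus, and
efficiency is speedup divided by N.

```python
def to_seconds(h=0, m=0, s=0.0):
    """Convert hours/minutes/seconds to seconds."""
    return 3600 * h + 60 * m + s


# (ncpu, cpu_seconds, wall_seconds), copied from the table above.
# The 4(quad) row is left out because it ran on different hardware.
runs = [
    (1, to_seconds(h=1, m=22), to_seconds(h=1, m=24)),
    (2, to_seconds(m=45, s=33.41), to_seconds(m=57, s=13)),
    (4, to_seconds(m=27, s=30.80), to_seconds(m=44, s=21)),
    (6, to_seconds(m=18, s=22.71), to_seconds(m=43, s=18)),
    (8, to_seconds(m=14, s=53.91), to_seconds(m=45, s=56)),
]


def scaling(runs):
    """Return (ncpu, speedup, efficiency, cpu/wall ratio) per run.

    Speedup is measured on wall time relative to the first entry,
    which is assumed to be the single-cpu run.
    """
    base_wall = runs[0][2]
    rows = []
    for ncpu, cpu, wall in runs:
        speedup = base_wall / wall
        efficiency = speedup / ncpu
        cpu_over_wall = cpu / wall  # assumption: comparable quantities
        rows.append((ncpu, speedup, efficiency, cpu_over_wall))
    return rows


if __name__ == "__main__":
    print("Ncpu  speedup  efficiency  cpu/wall")
    for ncpu, speedup, eff, ratio in scaling(runs):
        print(f"{ncpu:4d}  {speedup:7.2f}  {eff:10.2f}  {ratio:8.2f}")
```

With these numbers the wall-clock speedup at 4 cpus comes out at about
1.9, and it does not improve at 6 or 8 cpus, while the cpu/wall ratio
keeps dropping as cpus are added.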

 
