Please see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers and note the information about "binding" options for MPICH and OpenMPI. These options can sometimes improve the streams performance, and hence the performance of other algorithms, by a good amount.

  Barry

> On Apr 23, 2015, at 6:02 AM, siddhesh godbole wrote:
>
> Matt
>
> So that means the run on 10 processes is merely 1.8 times faster than on 1
> process?? This is quite difficult to digest! Okay, so if memory bandwidth is the
> controlling factor here, how will forming a cluster with the same machines solve
> this problem?
> My CPU has a max memory bandwidth of 59 GB/s.
>
> Apologies if the questions are too silly!
>
> Siddhesh M Godbole
>
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
>
> On Thu, Apr 23, 2015 at 4:22 PM, Matthew Knepley wrote:
> On Thu, Apr 23, 2015 at 5:47 AM, siddhesh godbole wrote:
> Hello,
>
> I want to know about the test that is run just after PETSc is
> configured on the system to assess the possible speedup from MPI processes. I
> have saved the result file, which says:
> Number of MPI processes 10
> Process 0 iitm
> Process 1 iitm
> Process 2 iitm
> Process 3 iitm
> Process 4 iitm
> Process 5 iitm
> Process 6 iitm
> Process 7 iitm
> Process 8 iitm
> Process 9 iitm
> Function Rate (MB/s)
> Copy: 24186.8271
> Scale: 23914.0401
> Add: 27271.7149
> Triad: 27787.1630
> ------------------------------------------------
> np speedup
> 1 1.0
> 2 1.75
> 3 1.86
> 4 1.84
> 5 1.85
> 6 1.83
> 7 1.76
> 8 1.79
> 9 1.8
> 10 1.8
> Estimation of possible speedup of MPI programs based on Streams benchmark.
>
> 1) What parameters does the speedup depend on?
>
> I am not sure what you are asking here. Speedup is defined as the time T on 1
> process divided by the time T_p on p processes:
>
>   S = T/T_p
>
> 2) What are the hardware requirements for higher speedup? (I was expecting
> at least a 5 times speedup after launching 10 processes.)
>
> STREAMS measures the speedup of vector operations, which are very similar to
> sparse matrix operations. Both are limited by memory bandwidth.
>
> 3) What could possibly be done to improve this?
>
> 1) You could buy more nodes, since each node has its own path to memory.
>
> 2) You could change algorithms, but this has proven very difficult.
>
> Thanks,
>
> Matt
>
> I have an Intel® Core™ i7-4930K CPU @ 3.40GHz × 12 with 32 GB of RAM and 1 TB
> of disk space.
>
> Thanks
> Siddhesh M Godbole
>
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
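
To make the arithmetic behind the bandwidth argument concrete, here is a minimal Python sketch. The function names and the node count k are hypothetical, not part of PETSc. It assumes the kernel is purely memory-bandwidth bound (run time = bytes moved / sustained bandwidth), so S = T/T_p reduces to the ratio of STREAMS rates. It also assumes the reported speedup column is the ratio of Triad rates; the single-process rate is therefore inferred from the quoted output, not measured.

```python
# Minimal sketch: speedup of a memory-bandwidth-bound kernel.
# Assumption: run time = bytes moved / sustained bandwidth (no compute or
# communication cost). Function names are hypothetical, not PETSc APIs.


def speedup(t1: float, tp: float) -> float:
    """S = T / T_p: time on 1 process divided by time on p processes."""
    return t1 / tp


def triad_time(n: int, rate_mb_s: float) -> float:
    """Seconds to run a[i] = b[i] + s*c[i] over n doubles.

    STREAM counts 3 arrays x 8 bytes per element (read b, read c, write a).
    """
    bytes_moved = 3 * 8 * n
    return bytes_moved / (rate_mb_s * 1e6)


# Values from the STREAMS output quoted above (MB/s = 1e6 bytes/s).
triad_10 = 27787.1630      # Triad rate with 10 processes
s_10 = 1.8                 # reported speedup at np = 10 (rounded)
triad_1 = triad_10 / s_10  # inferred 1-process rate (approximate)

# Since T is proportional to 1/bandwidth, S = T_1/T_p equals B_p/B_1.
n = 10_000_000
t1 = triad_time(n, triad_1)
t10 = triad_time(n, triad_10)

# Upper bound if one node could sustain the quoted 59 GB/s peak
# (assumes decimal units, i.e. 59000 MB/s).
peak = 59000.0
ceiling = peak / triad_1

# Idealized cluster of k identical nodes (k is hypothetical): each node adds
# its own memory bus, so aggregate bandwidth grows ~k-fold (ignores network).
k = 4
cluster_rate = k * triad_10

print(f"inferred 1-process Triad rate: {triad_1:.0f} MB/s")
print(f"speedup on 10 processes: {speedup(t1, t10):.2f}")
print(f"single-node ceiling at 59 GB/s: {ceiling:.1f}")
print(f"ideal aggregate Triad rate on {k} nodes: {cluster_rate:.0f} MB/s")
```

With these numbers the inferred single-process Triad rate is about 15.4 GB/s. Even a node that sustained the full 59 GB/s peak would therefore cap a bandwidth-bound kernel at roughly a 3.8x speedup, while adding nodes raises the aggregate bandwidth, which is why buying more nodes helps.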
