Hello,

I'm using the OpenMPI version that is distributed with Fedora 8
(openmpi-1.2.4-1.fc8) on a machine with two Xeon 5335 CPUs (quad-core each),
so I have 8 cores in a shared-memory environment.

AFAIK by default OpenMPI correctly uses shared-memory communication (sm)
without any extra parameters to mpirun. However, the program takes longer
with sm than with tcp and doesn't scale well beyond 4 processes. Here are
some example timings for a simple MPI program (appended to this email):

time mpirun -np N ./mpitest
(the timings are the same for time mpirun --mca btl self,sm -np N ./mpitest)

N     t(s)    t1/t
-------------------------------
1      35.7    1.0
2      18.8    1.9
3      12.7    2.8
4      10.2    3.5
5       8.2    4.4
6       8.0    4.4
7       7.2    5.0
8       6.4    5.6

You can see that processes 5 and up barely speed things up. With tcp,
however, the scaling is nearly perfect:

time mpirun --mca btl self,tcp -np N ./mpitest

N    t(s)    t1/t
-------------------------------
1      34.8    1.0
2      17.7    2.0
3      11.7    3.0
4       8.8    4.0
5       7.0    5.0
6       6.0    5.8
7       5.2    6.8
8       4.5    7.8

Why is this happening? Is this a bug?
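
If it would help with debugging, I can re-run with the BTL selection verbosity
turned up (assuming I'm reading the btl_base_verbose parameter correctly) to
confirm that sm really is the transport being used, e.g.:

mpirun --mca btl self,sm --mca btl_base_verbose 30 -np 2 ./mpitest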

Best regards,
João Silva

P.S. Test program appended:

----------------------------------------------------------
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define N 1000000000L   /* long constant so p*N below doesn't overflow int */

int main(int argc, char* argv[]){
        long i;

        /* Init MPI */
        int np,p;
        MPI_Init(&argc,&argv);
        MPI_Comm_size(MPI_COMM_WORLD,&np);
        MPI_Comm_rank(MPI_COMM_WORLD,&p);

        printf("Process #%d of %d\n", p+1, np);

        /* Each rank does its N/np share of dummy floating-point work */
        for (i = p*N/np; i < (p+1)*N/np; i++) {
                exp((double)i);
        }

        MPI_Finalize();
        return 0;
}
----------------------------------------------------------
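
(For completeness: nothing special is needed to build it; something like

mpicc -O2 mpitest.c -o mpitest

with the mpicc wrapper from the same package should do, and it is run exactly
as in the timing commands above.)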

