Hello, I'm using the OpenMPI version distributed with Fedora 8 (openmpi-1.2.4-1.fc8) on a dual Xeon 5335 machine (each CPU is quad-core), so I have 8 cores in a shared-memory environment.
AFAIK OpenMPI correctly uses shared-memory communication (sm) by default, without any extra parameters to mpirun; however, the programs take longer and don't scale well beyond 4 processes. Here are some example timings for a simple MPI program (appended to this email), run as

  time mpirun -np N ./mpitest

(the timings are the same with time mpirun --mca btl self,sm -np N ./mpitest):

  N    t(s)   t1/t
  -------------------------------
  1    35.7   1.0
  2    18.8   1.9
  3    12.7   2.8
  4    10.2   3.5
  5     8.2   4.4
  6     8.0   4.4
  7     7.2   5.0
  8     6.4   5.6

As you can see, processes 5 and up barely speed things up. With tcp, however, the scaling is nearly perfect:

  time mpirun --mca btl self,tcp -np N ./mpitest

  N    t(s)   t1/t
  -------------------------------
  1    34.8   1.0
  2    17.7   2.0
  3    11.7   3.0
  4     8.8   4.0
  5     7.0   5.0
  6     6.0   5.8
  7     5.2   6.8
  8     4.5   7.8

Why is this happening? Is this a bug?

Best regards,
João Silva

P.S. Test program appended:
----------------------------------------------------------
#include <stdio.h>
#include <math.h>
#include "mpi.h"

#define N 1000000000

int main(int argc, char* argv[]) {
    long long i, start, end;
    int np, p;

    /* Init MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &p);
    printf("Process #%d of %d\n", p + 1, np);

    /* Each rank computes exp() over its own slice of 0..N-1; the bounds
       are computed in 64-bit so that p*N does not overflow an int. */
    start = (long long)p * N / np;
    end   = (long long)(p + 1) * N / np;
    for (i = start; i < end; i++)
        exp((double)i);

    MPI_Finalize();
    return 0;
}
----------------------------------------------------------
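In case it's useful, the test can be built and run along these lines (mpitest.c is just the name I use for the appended file):

  mpicc mpitest.c -o mpitest -lm
  time mpirun --mca btl self,sm -np 8 ./mpitest
  time mpirun --mca btl self,tcp -np 8 ./mpitest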