One thing to look for is the process distribution. Depending on the application's communication pattern, the process distribution can have a tremendous impact on the execution time. Imagine that the application splits the processes into two equal groups based on rank and only communicates within each group. If such a group ends up on the same node, it will use sm (shared memory) for communications. On the other hand, if the groups end up spread across the nodes, they will use TCP (which obviously has higher latency and lower bandwidth) and the overall performance will be greatly impacted.
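Just to make the scenario concrete, here is a minimal sketch of the kind of pattern I mean (plain MPI in C, not GROMACS code; the group split and the Allreduce are purely illustrative):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Split the ranks into two equal groups: lower half and upper half. */
      int color = (rank < size / 2) ? 0 : 1;
      MPI_Comm group_comm;
      MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group_comm);

      /* All further traffic stays inside the group. Whether that traffic
         goes over sm or TCP depends entirely on which nodes the group's
         ranks were placed on. */
      int value = rank, sum = 0;
      MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, group_comm);
      printf("rank %d (group %d): group sum = %d\n", rank, color, sum);

      MPI_Comm_free(&group_comm);
      MPI_Finalize();
      return 0;
  }

With the default placement the lower half of the ranks sits on one node and the upper half on the other, so each group talks over sm; with a round-robin placement every group straddles both nodes and talks over TCP.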

By default, Open MPI uses the following strategy to distribute processes: if a node has several processors, consecutive ranks will be started on the same node. For example, in your case (2 nodes with 4 processors each), ranks 0-3 will be started on the first host and ranks 4-7 on the second one. I don't know what the default distribution is for MPICH2 ...
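With a hostfile like the following (the hostnames are just placeholders), the default byslot mapping fills the first node's slots before moving on:

  node01 slots=4
  node02 slots=4

so ranks 0-3 land on node01 and ranks 4-7 on node02. With --bynode the ranks are instead dealt out round-robin across the nodes (0, 2, 4, 6 on node01 and 1, 3, 5, 7 on node02).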

Anyway, there is an easy way to check whether the process distribution is the root of your problem. Please execute your application twice, once passing mpirun the --bynode argument and once with --byslot.
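Something along these lines (the hostfile and binary names are placeholders for your own):

  mpirun --hostfile myhosts --bynode -np 8 ./my_app
  mpirun --hostfile myhosts --byslot -np 8 ./my_app

If the two runs differ noticeably in execution time, the process placement is what you should be looking at.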

  george.

On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:

Hi All,

I wanted to switch from mpich2/mvapich2 to OpenMPI, as OpenMPI supports both ethernet and infiniband. Before doing that, I tested an application, GROMACS, to compare the performance of MPICH2 and OpenMPI. Both have been compiled with GNU compilers.

From this benchmark, I found that OpenMPI is slower than MPICH2.

This benchmark was run on AMD dual-core, dual-Opteron nodes. Both MPI libraries were built with their default configurations.

The job is run on 2 nodes - 8 cores.

OpenMPI - 25 m 39 s.
MPICH2  -  15 m 53 s.

Any comments ..?

Thanks,
Sangamesh
