Hi Tom,

users-requ...@open-mpi.org wrote:
I am pretty sure that LAM exploits the fact that the virtual processors are all
sharing the same memory,  so communication is via memory and/or the PCI bus
of the system, while my OPENMPI configuration doesn't exploit this.  Is this
a reasonable diagnosis of the dramatic difference in performance?  More

It would be more likely that OpenMPI is using shared memory and polling on it whereas LAM is using sockets, or at least blocking on something.

Polling is a bad thing when oversubscribing a processor. When you block on a socket (or any OS handle), the process immediately yields the CPU and is removed from the scheduler. When you poll waiting for a send or receive to complete, you are burning cycles on the CPU, and the scheduler will not run another process until the current quantum of time expires.

So, if you send a message between 2 processes sharing the same processor, the latency will be on the order of half of the scheduler quantum (10 ms on Linux) if they are both polling. Things are much faster when processes are polling on different CPUs (1-2 us), but when you don't have several processors, the blocking socket overhead (~20 us) is far better than waiting out a quantum of time.
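
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It only restates the figures above, reading the 10 ms as the full scheduler quantum; the function and constant names are made up for illustration and are not part of Open MPI or LAM. Real latencies will depend on the kernel, the scheduler and the load.

```python
# Rough latency model using only the figures quoted in this message.
# All names are illustrative; nothing here is an Open MPI or LAM API.

QUANTUM_US = 10_000.0            # assumed scheduler quantum (10 ms), in microseconds
POLL_OTHER_CPU_US = (1.0, 2.0)   # polling, processes on different CPUs
BLOCKING_SOCKET_US = 20.0        # approximate blocking socket overhead


def polling_same_cpu_latency_us(quantum_us: float = QUANTUM_US) -> float:
    """Expected wait when both processes poll on one CPU: about half a quantum."""
    return quantum_us / 2.0


def slowdown_vs_blocking(latency_us: float) -> float:
    """How many times slower a given latency is than the blocking socket path."""
    return latency_us / BLOCKING_SOCKET_US


if __name__ == "__main__":
    same_cpu = polling_same_cpu_latency_us()
    low, high = POLL_OTHER_CPU_US
    print(f"polling, same CPU:       ~{same_cpu:.0f} us")
    print(f"polling, different CPUs: {low:.0f}-{high:.0f} us")
    print(f"blocking socket:         ~{BLOCKING_SOCKET_US:.0f} us")
    print(f"same-CPU polling vs blocking: ~{slowdown_vs_blocking(same_cpu):.0f}x slower")
```

With these figures, polling on a shared CPU comes out around 5000 us, roughly 250 times the cost of the blocking socket path.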

importantly, how do I reconfigure OPENMPI to match the LAM performance?

Try disabling the shared memory device in OpenMPI. Unfortunately, I have no clue how to do it.

Patrick
--
Patrick Geoffray
Myricom, Inc.
http://www.myri.com
