If I correctly understand how you run your application, I think I know where the problem is coming from. In a few words: you're using buffered send over shared memory.

First, buffered send has only one main "benefit": it doubles the amount of memory required for the communication. A side effect is that it increases the number of memory copies. The original buffer has to be copied into the attached buffer, then from this attached buffer the data is moved into the shared memory region, and from there the receiver can finally copy the data into the receive buffer. In total there are 3 memory copies involved in this operation, which automatically limits the achievable bandwidth to 1/3 of the available memory bandwidth on the architecture. Additionally, if the amount of data involved in this communication is large enough, the cache will be completely thrashed by the end of the communication.
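
For illustration, here is a minimal sketch in C of what a buffered send involves (the buffer sizing, tag, and function name are made up for this example, not taken from your application). The first of the three copies happens inside MPI_Bsend itself:

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical example of a single buffered send. */
    void bsend_example(double *data, int count, int dest, MPI_Comm comm)
    {
        int pack_size, buf_size, old_size;
        void *attached, *old_buf;

        MPI_Pack_size(count, MPI_DOUBLE, comm, &pack_size);
        buf_size = pack_size + MPI_BSEND_OVERHEAD;
        attached = malloc(buf_size);
        MPI_Buffer_attach(attached, buf_size);

        /* Copy #1: user data -> attached buffer (done inside MPI_Bsend).
         * Copy #2: attached buffer -> shared memory region.
         * Copy #3: shared memory region -> receive buffer on the peer. */
        MPI_Bsend(data, count, MPI_DOUBLE, dest, 0, comm);

        /* Detach blocks until everything in the attached buffer has
         * been transmitted. */
        MPI_Buffer_detach(&old_buf, &old_size);
        free(old_buf);
    }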

Second, using buffered send requires asynchronous progress. If your code doesn't call any MPI communication functions, there is no reason for the data transfer to take place, at least not until MPI_Buffer_detach is called (or any other communication-related MPI function).
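
A common workaround, sketched below under the assumption that your long MPI-free stretches are the solver's compute loop (compute_chunk is a hypothetical placeholder for that work), is to re-enter the MPI library from time to time so its progress engine gets a chance to drain the attached buffer; any cheap call such as MPI_Iprobe will do:

    #include <mpi.h>

    /* Hypothetical stand-in for the solver's work between communications. */
    static void compute_chunk(int i) { (void)i; }

    void compute_with_progress(MPI_Comm comm, int nchunks)
    {
        int i, flag;

        for (i = 0; i < nchunks; i++) {
            compute_chunk(i);   /* long computation with no MPI calls */

            /* Entering the MPI library lets Open MPI progress outstanding
             * buffered sends, even though no message is expected here. */
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &flag,
                       MPI_STATUS_IGNORE);
        }
    }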

  george.


On Jun 4, 2008, at 1:55 PM, Jeff Squyres wrote:

Thanks for all the detailed information!

It is quite likely that our bsend performance has never been tuned; we
simply implemented it, verified that it works, and then moved on -- we
hadn't considered that real applications would actually use it.  :-\

But that being said, 60% difference is a bit odd.  Have you tried
running with "--mca mpi_leave_pinned 1"?  If all your sends are
MPI_BSEND, it *may* not make a difference, but it could make a
difference on the receive side.
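
For reference, the flag goes on the mpirun command line; the process
count and executable name below are just placeholders for your own
launch line:

    mpirun --mca mpi_leave_pinned 1 -np 4 ./cfd_solver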

What are the typical communication patterns for your application?



On Jun 2, 2008, at 3:39 PM, Ayer, Timothy C. wrote:



We are performing a comparison of HPMPI versus OpenMPI using
InfiniBand and seeing a performance hit in the vicinity of 60%
(OpenMPI is slower) on controlled benchmarks.  Since everything else
is similar, we suspect a problem with the way we are using or have
installed OpenMPI.

Please find attached the following info as requested from
http://www.open-mpi.org/community/help/

Application:  in-house CFD solver using both point-to-point and
collective operations. Also, for historical reasons it makes extensive
use of BSEND.  We recognize that BSENDs can be inefficient but it is
not practical to change them at this time.  We are trying to
understand why the performance is so significantly different from
HPMPI.  The application is mixed Fortran 90 and C built with Portland
Group compilers.

HPMPI Version info:

mpirun: HP MPI 02.02.05.00 Linux x86-64
major version 202 minor version 5

OpenMPI Version info:

mpirun (Open MPI) 1.2.4
Report bugs to http://www.open-mpi.org/community/help/



Configuration info :

The benchmark was a 4-processor job run on a single dual-socket,
dual-core HP DL140 G3 (Woodcrest 3.0) with 4 GB of memory.  Each rank
requires approximately 250 MB of memory.

1) Output from ompi_info --all

See attached file ompi_info_output.txt
<< File: ompi_info_output.txt >>

Below is the output requested in the FAQ section:

In order for us to help you, it is most helpful if you can run a few
steps before sending an e-mail, to both perform some basic
troubleshooting and provide us with enough information about your
environment to help you. Please include answers to the following
questions in your e-mail:


1.      Which OpenFabrics version are you running? Please specify
where you got the software from (e.g., from the OpenFabrics community
web site, from a vendor, or it was already included in your Linux
distribution).

We obtained the software from www.openfabrics.org.


Output from ofed_info command:

OFED-1.1

openib-1.1 (REV=9905)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git:
ref: refs/heads/ofed_1_1
commit a083ec1174cb4b5a5052ef5de9a8175df82e864a

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm



2.      What distro and version of Linux are you running? What is your
kernel version?

Linux xxxxxxxx 2.6.9-64.EL.IT133935.jbtest.1smp #1 SMP Fri Oct 19 11:28:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


3.      Which subnet manager are you running? (e.g., OpenSM, a
vendor-specific subnet manager, etc.)

We believe this to be HP or Voltaire but we are not certain how to
determine this.
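
(One way to check, assuming the infiniband-diags utilities that ship
with OFED are installed on the node, is the sminfo command:

    sminfo

Its output includes the LID, GUID, and priority of the currently
active subnet manager, which usually makes clear whether it is OpenSM
running on a host or a vendor/switch-embedded SM.)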


4.      What is the output of the ibv_devinfo command on a known
"good" node and a known "bad" node? (NOTE: there must be at least one
port listed as "PORT_ACTIVE" for Open MPI to work. If there is not at
least one PORT_ACTIVE port, something is wrong with your OpenFabrics
environment and Open MPI will not be able to run).

hca_id: mthca0
      fw_ver:                         1.2.0
      node_guid:                      001a:4bff:ff0b:5f9c
      sys_image_guid:                 001a:4bff:ff0b:5f9f
      vendor_id:                      0x08f1
      vendor_part_id:                 25204
      hw_ver:                         0xA0
      board_id:                       VLT0030010001
      phys_port_cnt:                  1
              port:   1
                      state:                  PORT_ACTIVE (4)
                      max_mtu:                2048 (4)
                      active_mtu:             2048 (4)
                      sm_lid:                 1
                      port_lid:               161
                      port_lmc:               0x00


5.      What is the output of the ifconfig command on a known "good"
node and a known "bad" node? (mainly relevant for IPoIB installations)
Note that some Linux distributions do not put ifconfig in the default
path for normal users; look for it in /sbin/ifconfig or
/usr/sbin/ifconfig.

eth0      Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
        inet addr:X.Y.Z.Q  Bcast:X.Y.Z.255  Mask:255.255.255.0
        inet6 addr: X::X:X:X:X/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:1021733054 errors:0 dropped:10717 overruns:0 frame:0
        TX packets:1047320834 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:1035986839096 (964.8 GiB)  TX bytes:1068055599116 (994.7 GiB)
        Interrupt:169

ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
        inet addr:A.B.C.D  Bcast:A.B.C.255  Mask:255.255.255.0
        inet6 addr: X::X:X:X:X/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
        RX packets:137021 errors:0 dropped:0 overruns:0 frame:0
        TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:128
        RX bytes:12570947 (11.9 MiB)  TX bytes:1504 (1.4 KiB)

lo        Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING  MTU:16436  Metric:1
        RX packets:1498664 errors:0 dropped:0 overruns:0 frame:0
        TX packets:1498664 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:1190810468 (1.1 GiB)  TX bytes:1190810468 (1.1 GiB)


6.      If running under Bourne shells, what is the output of the
"ulimit -l" command? If running under C shells, what is the output of
the "limit | grep memorylocked" command?
(NOTE: If the value is not "unlimited", see this FAQ entry
<http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages>
and this FAQ entry
<http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more>.)

memorylocked 3500000 kbytes

Gather up this information and see this page
<http://www.open-mpi.org/community/help/> about how to submit a help
request to the user's mailing list.




--
Jeff Squyres
Cisco Systems

