First, buffered send has one main drawback: it doubles the amount of memory required for the communication. A side effect is that it increases the number of memory copies. The original buffer has to be copied into the attached buffer, then from this attached buffer the data is moved into the shared memory region, and from there the receiver can finally copy the data into the receive buffer. In total there are 3 memory copies involved in this operation, which automatically limits the achievable bandwidth to 1/3 of the memory bandwidth available on the architecture. Additionally, if the amount of data involved in this communication is large enough, the cache will be completely thrashed by the end of the communication.
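To make the copy chain concrete, here is a minimal C sketch (illustrative only, not code from this thread; the message size, tag, and buffer sizing are arbitrary example choices) showing where the attached staging buffer enters the picture:

/*
 * Minimal sketch (illustrative, not from this thread): run with at least
 * two ranks, e.g. "mpirun -np 2 ./bsend_sketch".
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { COUNT = 1024 };
    double payload[COUNT];
    for (int i = 0; i < COUNT; ++i)
        payload[i] = (double)i;

    if (rank == 0) {
        /* Buffered send needs its own staging buffer: room for one message
           plus the MPI-mandated bookkeeping overhead. */
        int size = COUNT * sizeof(double) + MPI_BSEND_OVERHEAD;
        char *staging = malloc(size);
        MPI_Buffer_attach(staging, size);

        /* Copy #1: payload -> attached buffer.  Copies #2 and #3 (into the
           shared memory region and into the receive buffer) happen inside
           the library and on the receiver, as described above. */
        MPI_Bsend(payload, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);

        /* Detach blocks until the buffered message has left the buffer. */
        MPI_Buffer_detach(&staging, &size);
        free(staging);
    } else if (rank == 1) {
        MPI_Recv(payload, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}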
Second, using buffered send requires asynchronous progress. If your code doesn't call any MPI communication functions, there is no reason for the data transfer to take place, at least not until MPI_Buffer_detach (or any other communication-related MPI function) is called.
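As an illustration of that progress point (again an assumed sketch, not code from this thread; fake_work() and the loop count are made up for the example), a long compute phase is typically broken up with cheap MPI calls so the library gets a chance to drain the attached buffer, and MPI_Buffer_detach at the end guarantees completion:

/*
 * Illustrative sketch only: fake_work() and the loop bounds are made up.
 */
#include <mpi.h>

static double fake_work(double x)
{
    return x * 1.000001 + 1.0;   /* stand-in for real computation */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double acc = 0.0;
    for (int step = 0; step < 1000; ++step) {
        acc = fake_work(acc);

        /* Re-entering the MPI library -- even with a cheap probe that may
           match nothing -- gives it an opportunity to push data that is
           still sitting in the attached bsend buffer. */
        int flag;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                   &flag, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return acc > 0.0 ? 0 : 1;
}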
  george.

On Jun 4, 2008, at 1:55 PM, Jeff Squyres wrote:
Thanks for all the detailed information!  It is quite likely that our bsend performance has never been tuned; we simply implemented it, verified that it works, and then moved on -- we hadn't considered that real applications would actually use it.  :-\

But that being said, a 60% difference is a bit odd.  Have you tried running with "--mca mpi_leave_pinned 1"?  If all your sends are MPI_BSEND, it *may* not make a difference, but it could make a difference on the receive side.

What are the typical communication patterns for your application?

On Jun 2, 2008, at 3:39 PM, Ayer, Timothy C. wrote:

We are performing a comparison of HPMPI versus OpenMPI using InfiniBand and seeing a performance hit in the vicinity of 60% (OpenMPI is slower) on controlled benchmarks.  Since everything else is similar, we suspect a problem with the way we are using or have installed OpenMPI.  Please find attached the following info as requested from http://www.open-mpi.org/community/help/

Application: an in-house CFD solver using both point-to-point and collective operations.  For historical reasons it also makes extensive use of BSEND.  We recognize that BSENDs can be inefficient, but it is not practical to change them at this time.  We are trying to understand why the performance is so significantly different from HPMPI.  The application is mixed FORTRAN 90 and C, built with Portland Group compilers.

HPMPI version info:
  mpirun: HP MPI 02.02.05.00 Linux x86-64 major version 202 minor version 5

OpenMPI version info:
  mpirun (Open MPI) 1.2.4
  Report bugs to http://www.open-mpi.org/community/help/

Configuration info: The benchmark was a 4-processor job run on a single dual-socket, dual-core HP DL140G3 (Woodcrest 3.0) with 4 GB of memory.  Each rank requires approximately 250 MB of memory.

1) Output from ompi_info --all: see attached file ompi_info_output.txt
   << File: ompi_info_output.txt >>

Below is the output requested in the FAQ section ("In order for us to help you, it is most helpful if you can run a few steps before sending an e-mail to both perform some basic troubleshooting and provide us with enough information about your environment to help you.  Please include answers to the following questions in your e-mail:"):

1. Which OpenFabrics version are you running?  Please specify where you got the software from (e.g., from the OpenFabrics community web site, from a vendor, or it was already included in your Linux distribution).

   We obtained the software from www.openfabrics.org.  Output from the ofed_info command:

     OFED-1.1
     openib-1.1 (REV=9905)
     # User space
     https://openib.org/svn/gen2/branches/1.1/src/userspace
     Git: ref: refs/heads/ofed_1_1  commit a083ec1174cb4b5a5052ef5de9a8175df82e864a
     # MPI
     mpi_osu-0.9.7-mlx2.2.0.tgz
     openmpi-1.1.1-1.src.rpm
     mpitests-2.0-0.src.rpm

2. What distro and version of Linux are you running?  What is your kernel version?

   Linux xxxxxxxx 2.6.9-64.EL.IT133935.jbtest.1smp #1 SMP Fri Oct 19 11:28:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

3. Which subnet manager are you running? (e.g., OpenSM, a vendor-specific subnet manager, etc.)

   We believe this to be HP or Voltaire, but we are not certain how to determine this.

4. What is the output of the ibv_devinfo command on a known "good" node and a known "bad" node?  (NOTE: there must be at least one port listed as "PORT_ACTIVE" for Open MPI to work.  If there is not at least one PORT_ACTIVE port, something is wrong with your OpenFabrics environment and Open MPI will not be able to run.)

   hca_id: mthca0
     fw_ver:           1.2.0
     node_guid:        001a:4bff:ff0b:5f9c
     sys_image_guid:   001a:4bff:ff0b:5f9f
     vendor_id:        0x08f1
     vendor_part_id:   25204
     hw_ver:           0xA0
     board_id:         VLT0030010001
     phys_port_cnt:    1
     port: 1
       state:        PORT_ACTIVE (4)
       max_mtu:      2048 (4)
       active_mtu:   2048 (4)
       sm_lid:       1
       port_lid:     161
       port_lmc:     0x00

5. What is the output of the ifconfig command on a known "good" node and a known "bad" node?  (Mainly relevant for IPoIB installations.)  Note that some Linux distributions do not put ifconfig in the default path for normal users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.

   eth0  Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
         inet addr:X.Y.Z.Q  Bcast:X.Y.Z.255  Mask:255.255.255.0
         inet6 addr: X::X:X:X:X/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1021733054 errors:0 dropped:10717 overruns:0 frame:0
         TX packets:1047320834 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:1035986839096 (964.8 GiB)  TX bytes:1068055599116 (994.7 GiB)
         Interrupt:169

   ib0   Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
         inet addr:A.B.C.D  Bcast:A.B.C.255  Mask:255.255.255.0
         inet6 addr: X::X:X:X:X/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
         RX packets:137021 errors:0 dropped:0 overruns:0 frame:0
         TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:128
         RX bytes:12570947 (11.9 MiB)  TX bytes:1504 (1.4 KiB)

   lo    Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:1498664 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1498664 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:1190810468 (1.1 GiB)  TX bytes:1190810468 (1.1 GiB)

6. If running under Bourne shells, what is the output of the "ulimit -l" command?  If running under C shells, what is the output of the "limit | grep memorylocked" command?  (NOTE: If the value is not "unlimited", see this FAQ entry <http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages> and this FAQ entry <http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more>.)

   memorylocked 3500000 kbytes

Gather up this information and see this page <http://www.open-mpi.org/community/help/> about how to submit a help request to the user's mailing list.

-- 
Jeff Squyres
Cisco Systems