Well, here's what I see with the IMB PingPong test using two ConnectX DDR cards
in each of 2 machines. I'm only quoting the last line: the 4194304-byte message
at 10 repetitions.
Scali_MPI_Connect-5.6.4-59151 (multi-rail setup in /etc/dat.conf):
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      2198.24      1819.63

mvapich2-1.2rc2 (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1):
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      2427.24      1647.96

OpenMPI SVN 19772:
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      3676.35      1088.03
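The Mbytes/sec column can be cross-checked from t[usec]. A minimal Python sketch, assuming (as the rows above are consistent with) that IMB counts 1 Mbyte as 2^20 bytes and that PingPong bandwidth is the message size divided by the reported t[usec]; the function name is hypothetical:

```python
MBYTE = 2 ** 20  # assumption: IMB "Mbytes" are 2^20 bytes, which matches the rows above


def pingpong_mbytes_per_sec(nbytes: int, t_usec: float) -> float:
    """Bandwidth implied by one IMB PingPong row: nbytes moved in t_usec microseconds."""
    return nbytes / (t_usec * 1e-6) / MBYTE


# name: (t[usec], reported Mbytes/sec) for the 4194304-byte rows quoted above
results = {
    "Scali_MPI_Connect-5.6.4-59151": (2198.24, 1819.63),
    "mvapich2-1.2rc2": (2427.24, 1647.96),
    "OpenMPI SVN 19772": (3676.35, 1088.03),
}

for name, (t_usec, reported) in results.items():
    computed = pingpong_mbytes_per_sec(4194304, t_usec)
    print(f"{name:30s} reported {reported:8.2f}  recomputed {computed:8.2f}")
```

The recomputed values can differ from the reported ones in the last digit because t[usec] is itself rounded in the IMB output.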
These numbers are repeatable within normal run-to-run variation.
This is OFED-1.3.1. On one of the 2 machines, I peered at
/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
and
/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
before and after the runs to see what happened for Scali
and OpenMPI.
Scali packets:
HCA 0 port 1 = 115116625 - 114903198 = 213427
HCA 1 port 1 =  78099566 -  77886143 = 213423
--------------------------------------------
Total                                  426850

OpenMPI packets:
HCA 0 port 1 = 115233624 - 115116625 = 116999
HCA 1 port 1 =  78216425 -  78099566 = 116859
--------------------------------------------
Total                                  233858
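The counter arithmetic above can be scripted. A minimal Python sketch, assuming the same OFED sysfs layout as the paths quoted above and that HCA 0/HCA 1 correspond to mlx4_0/mlx4_1; the helper names are hypothetical. On a live system you would take one snapshot before and one after the benchmark; here the Scali readings quoted above are fed in directly:

```python
from pathlib import Path

SYSFS_ROOT = Path("/sys/class/infiniband")


def read_rcv_packets(device: str, port: int, root: Path = SYSFS_ROOT) -> int:
    """Read port_rcv_packets for one HCA port from sysfs."""
    path = root / device / "ports" / str(port) / "counters" / "port_rcv_packets"
    return int(path.read_text().strip())


def snapshot(ports, root: Path = SYSFS_ROOT) -> dict:
    """Read the counter for every (device, port) pair in `ports`."""
    return {(dev, port): read_rcv_packets(dev, port, root) for dev, port in ports}


def packet_deltas(before: dict, after: dict) -> dict:
    """Packets received on each port between two snapshots."""
    return {key: after[key] - before[key] for key in before}


# Live usage (run IMB PingPong between the two snapshots):
#   ports = [("mlx4_0", 1), ("mlx4_1", 1)]
#   before = snapshot(ports)
#   ...
#   deltas = packet_deltas(before, snapshot(ports))

# Offline check with the Scali readings quoted above:
scali = packet_deltas(
    {("mlx4_0", 1): 114903198, ("mlx4_1", 1): 77886143},
    {("mlx4_0", 1): 115116625, ("mlx4_1", 1): 78099566},
)
print(scali, sum(scali.values()))
```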
Scali is set up so that data larger than 8192 bytes is striped
across the 2 HCAs using 8192 bytes per HCA in a round robin fashion.
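That striping policy can be illustrated with a simplified Python model. This is not Scali's actual code; the function names are hypothetical, and sending messages of 8192 bytes or less on HCA 0 is an assumption, since only the striping of larger messages is described:

```python
def stripe_round_robin(nbytes: int, n_hcas: int = 2, chunk: int = 8192, threshold: int = 8192):
    """Return (hca_index, offset, length) for each piece of an nbytes message.

    Simplified model: messages of at most `threshold` bytes are not striped
    (assumed to go out on HCA 0); larger messages are cut into `chunk`-byte
    pieces that are dealt to the HCAs in round-robin order.
    """
    if nbytes <= threshold:
        return [(0, 0, nbytes)]
    pieces = []
    for i, offset in enumerate(range(0, nbytes, chunk)):
        length = min(chunk, nbytes - offset)  # last piece may be short
        pieces.append((i % n_hcas, offset, length))
    return pieces


def bytes_per_hca(pieces, n_hcas: int = 2):
    """Total bytes assigned to each HCA."""
    totals = [0] * n_hcas
    for hca, _offset, length in pieces:
        totals[hca] += length
    return totals


pieces = stripe_round_robin(4194304)
print(len(pieces), bytes_per_hca(pieces))  # 512 [2097152, 2097152]
```

For the 4194304-byte PingPong message this gives 512 chunks, 256 per HCA, which fits the nearly equal per-HCA packet counts measured for Scali.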
So it seems that OpenMPI is using both ports, but strangely it ends
up with a Mbytes/sec rate that is worse than with a single HCA alone.
If I exclude the second HCA with --mca btl_openib_if_exclude mlx4_1:1, I get
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      3080.59      1298.45
So, what's taking so long? Is this a threading question?
DM
On Sun, 19 Oct 2008, Jeff Squyres wrote:
On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?
Yes. OMPI will automatically (and aggressively) use as many active ports as
you have. So you shouldn't need to list devices+ports -- OMPI will simply
use all ports that it finds in the active state. If your ports are on
physically separate IB networks, then each IB network will require a
different subnet ID so that OMPI can compute reachability properly.
--
Jeff Squyres
Cisco Systems
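Jeff's point about subnet IDs can be illustrated with a simplified reachability model. This Python sketch is not Open MPI's openib BTL code; the port names and non-default subnet prefixes are made up. It treats a local port as able to reach a remote port only when both report the same subnet ID, which is why two physically separate fabrics left on the same ID look like a single network:

```python
from itertools import product

DEFAULT_SUBNET = 0xFE80000000000000  # InfiniBand default subnet prefix


def reachable_pairs(local_ports: dict, remote_ports: dict):
    """Pair up local and remote ports that share a subnet ID.

    Simplified model: equal subnet IDs are assumed to mean the same physical
    fabric; different IDs are assumed to mean no connectivity.
    """
    return [
        (lp, rp)
        for (lp, lsub), (rp, rsub) in product(local_ports.items(), remote_ports.items())
        if lsub == rsub
    ]


# Two separate fabrics, each configured with its own (hypothetical) subnet ID:
local = {"mlx4_0:1": 0xFE80000000000001, "mlx4_1:1": 0xFE80000000000002}
remote = {"mlx4_0:1": 0xFE80000000000001, "mlx4_1:1": 0xFE80000000000002}
print(reachable_pairs(local, remote))

# The same two fabrics both left on the default subnet ID: every port now
# looks reachable from every other, including pairs with no physical link.
same = {"mlx4_0:1": DEFAULT_SUBNET, "mlx4_1:1": DEFAULT_SUBNET}
print(reachable_pairs(same, same))
```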
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users