Well, here's what I see with the IMB PingPong test using two ConnectX DDR cards
in each of 2 machines. I'm quoting only the last line of each run: 10
repetitions at 4194304 bytes.

Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      2198.24      1819.63
mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      2427.24      1647.96
OpenMPI SVN 19772:
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      3676.35      1088.03

Repeatable within bounds.
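
As a sanity check on the table above, here is a small Python sketch that recomputes the Mbytes/sec column from the t[usec] column. It assumes IMB's "Mbytes" means 2**20 bytes, which is consistent with the numbers quoted; the function name is just for illustration.

```python
# Recompute IMB PingPong throughput from message size and time.
# Assumption: 1 Mbyte = 2**20 bytes (matches the figures quoted above).
MIB = 1 << 20

def imb_mbytes_per_sec(nbytes, t_usec):
    """Throughput in Mbytes/sec for nbytes transferred in t_usec microseconds."""
    return (nbytes / MIB) / (t_usec * 1e-6)

# t[usec] values copied from the runs above (4194304-byte messages).
timings = {
    "Scali_MPI_Connect": 2198.24,
    "mvapich2-1.2rc2": 2427.24,
    "OpenMPI SVN 19772": 3676.35,
}

for name, t_usec in timings.items():
    print(f"{name:20s} {imb_mbytes_per_sec(4194304, t_usec):10.2f}")
```

The recomputed values agree with the reported column to within rounding of t[usec].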

This is OFED-1.3.1. I read
/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
and
/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
on one of the 2 machines before and after each run to see what
happened for Scali and OpenMPI.

Scali packets:
HCA 0 port1 = 115116625 - 114903198 = 213427
HCA 1 port1 =  78099566 -  77886143 = 213423
--------------------------------------------
                                      426850
OpenMPI packets:
HCA 0 port1 = 115233624 - 115116625 = 116999
HCA 1 port1 =  78216425 -  78099566 = 116859
--------------------------------------------
                                      233858
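
The bookkeeping above can be reproduced with a short Python sketch. The sysfs path layout is the one quoted earlier; the helper names and the mapping of "HCA 0" / "HCA 1" to mlx4_0 / mlx4_1 are my assumptions for illustration.

```python
# Compute per-HCA port_rcv_packets deltas around a benchmark run.

def read_counter(hca, port=1, name="port_rcv_packets"):
    """Read one InfiniBand port counter from sysfs (needs an IB host)."""
    path = f"/sys/class/infiniband/{hca}/ports/{port}/counters/{name}"
    with open(path) as f:
        return int(f.read().strip())

def deltas(before, after):
    """Per-HCA difference between two counter snapshots."""
    return {hca: after[hca] - before[hca] for hca in before}

# Snapshots copied from the figures above (assumed HCA 0 = mlx4_0, HCA 1 = mlx4_1).
scali = deltas({"mlx4_0": 114903198, "mlx4_1": 77886143},
               {"mlx4_0": 115116625, "mlx4_1": 78099566})
openmpi = deltas({"mlx4_0": 115116625, "mlx4_1": 78099566},
                 {"mlx4_0": 115233624, "mlx4_1": 78216425})

for label, d in (("Scali", scali), ("OpenMPI", openmpi)):
    total = sum(d.values())
    shares = ", ".join(f"{hca}: {n} ({n / total:.1%})" for hca, n in d.items())
    print(f"{label:8s} total={total}  {shares}")
```

On a live system one would call read_counter() for each HCA before and after the run and feed the two snapshots to deltas().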

Scali is set up so that data larger than 8192 bytes is striped
across the 2 HCAs using 8192 bytes per HCA in a round-robin fashion.
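
To make that striping policy concrete, here is a minimal Python sketch of 8192-byte round-robin striping as described. It is only an illustration of the policy, not Scali's implementation; the function name and the choice to send small messages on the first HCA are assumptions.

```python
# Illustrative round-robin striping: messages larger than `chunk` bytes are
# split into `chunk`-sized pieces dealt out to the HCAs in turn.

def stripe_round_robin(nbytes, num_hcas=2, chunk=8192):
    """Return the number of bytes each HCA would carry for one message."""
    per_hca = [0] * num_hcas
    if nbytes <= chunk:
        # Not striped: assume the whole message goes out on the first HCA.
        per_hca[0] = nbytes
        return per_hca
    offset = 0
    index = 0
    while offset < nbytes:
        size = min(chunk, nbytes - offset)
        per_hca[index % num_hcas] += size
        offset += size
        index += 1
    return per_hca

# The 4194304-byte PingPong message splits evenly across the two HCAs.
print(stripe_round_robin(4194304))
```

An even split like this is consistent with the nearly identical packet counts seen on the two HCAs in the Scali run.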

So, it seems that OpenMPI is using both ports but strangely ends
up with a Mbytes/sec rate that is worse than with a single HCA.
If I use --mca btl_openib_if_exclude mlx4_1:1, we get
       #bytes #repetitions      t[usec]   Mbytes/sec
      4194304           10      3080.59      1298.45
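
For a rough comparison, this Python sketch expresses each dual-HCA result relative to the single-HCA OpenMPI figure. Note the assumption: the only single-HCA baseline measured here is OpenMPI's, so the Scali and MVAPICH2 ratios are indicative only.

```python
# Express the dual-HCA rates relative to the single-HCA OpenMPI run.
SINGLE_HCA_OPENMPI = 1298.45  # Mbytes/sec, with one HCA excluded

dual_hca = {
    "Scali_MPI_Connect": 1819.63,
    "mvapich2-1.2rc2": 1647.96,
    "OpenMPI SVN 19772": 1088.03,
}

def relative_to_single(rate, baseline=SINGLE_HCA_OPENMPI):
    """Ratio of a measured rate to the single-HCA baseline."""
    return rate / baseline

for name, rate in dual_hca.items():
    print(f"{name:20s} {relative_to_single(rate):.2f}x")
```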

So, what's taking so long? Is this a threading question?

DM

On Sun, 19 Oct 2008, Jeff Squyres wrote:

On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:

Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?


Yes. OMPI will automatically (and aggressively) use as many active ports as you have. So you shouldn't need to list devices+ports -- OMPI will simply use all ports that it finds in the active state. If your ports are on physically separate IB networks, then each IB network will require a different subnet ID so that OMPI can compute reachability properly.
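
To see which ports the kernel reports as active, a Python sketch like the following can help. It reads the per-port state files under /sys/class/infiniband; it is not how OMPI itself discovers ports, and the helper names are made up for illustration.

```python
# List InfiniBand ports whose sysfs state file reports ACTIVE.
import glob
import os

def is_active(state_text):
    """True for sysfs state strings such as '4: ACTIVE'."""
    return state_text.strip().endswith("ACTIVE")

def active_ports(root="/sys/class/infiniband"):
    """Return (device, port) pairs in the ACTIVE state under `root`."""
    found = []
    pattern = os.path.join(root, "*", "ports", "*", "state")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            if is_active(f.read()):
                parts = path.split(os.sep)
                # Path layout: .../<device>/ports/<port>/state
                found.append((parts[-4], int(parts[-2])))
    return found

if __name__ == "__main__":
    for device, port in active_ports():
        print(f"{device} port {port} is ACTIVE")
```

On a host without InfiniBand devices the list is simply empty.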

--
Jeff Squyres
Cisco Systems

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users