Using 2 HCAs on the same PCI-Express bus (as well as 2 ports from the same HCA) will not improve performance; PCI-Express is the bottleneck.
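A back-of-the-envelope check supports this, assuming the cards sit behind a PCIe Gen1 x8 link (typical for ConnectX DDR systems of that era; the actual slot configuration isn't stated in the thread):

    PCIe Gen1 x8:  8 lanes x 2.5 Gbit/s = 20 Gbit/s raw, ~16 Gbit/s after 8b/10b encoding, i.e. ~2 GB/s per direction
    IB 4X DDR:     4 lanes x 5.0 Gbit/s = 20 Gbit/s raw, ~16 Gbit/s data rate,             i.e. ~2 GB/s per direction

A single DDR HCA can already fill a Gen1 x8 link, so a second HCA (or a second port on the same HCA) behind the same PCI-Express link adds no usable bandwidth; the ~1800 Mbytes/sec that Scali reports below is already close to that ceiling.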
On Mon, Oct 20, 2008 at 2:28 AM, Mostyn Lewis <mostyn.le...@sun.com> wrote:
> Well, here's what I see with the IMB PingPong test using two ConnectX DDR
> cards in each of 2 machines. I'm just quoting the last line at 10
> repetitions of the 4194304 bytes.
>
> Scali_MPI_Connect-5.6.4-59151: (multi-rail setup in /etc/dat.conf)
>    #bytes #repetitions      t[usec]   Mbytes/sec
>   4194304           10      2198.24      1819.63
>
> mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
>    #bytes #repetitions      t[usec]   Mbytes/sec
>   4194304           10      2427.24      1647.96
>
> OpenMPI SVN 19772:
>    #bytes #repetitions      t[usec]   Mbytes/sec
>   4194304           10      3676.35      1088.03
>
> Repeatable within bounds.
>
> This is OFED-1.3.1 and I peered at
> /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
> and
> /sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
> on one of the 2 machines and looked at what happened for Scali
> and OpenMPI.
>
> Scali packets:
> HCA 0 port 1 = 115116625 - 114903198 = 213427
> HCA 1 port 1 =  78099566 -  77886143 = 213423
> ---------------------------------------------
>                                        426850
>
> OpenMPI packets:
> HCA 0 port 1 = 115233624 - 115116625 = 116999
> HCA 1 port 1 =  78216425 -  78099566 = 116859
> ---------------------------------------------
>                                        233858
>
> Scali is set up so that data larger than 8192 bytes is striped
> across the 2 HCAs using 8192 bytes per HCA in a round-robin fashion.
>
> So, it seems that OpenMPI is using both ports but strangely ends
> up with a Mbytes/sec rate which is worse than a single HCA only.
> If I use --mca btl_openib_if_exclude mlx4_1:1, we get
>    #bytes #repetitions      t[usec]   Mbytes/sec
>   4194304           10      3080.59      1298.45
>
> So, what's taking so long? Is this a threading question?
>
> DM
>
> On Sun, 19 Oct 2008, Jeff Squyres wrote:
>> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>>> to approach double the bandwidth on simple tests such as IMB PingPong?
>>
>> Yes. OMPI will automatically (and aggressively) use as many active ports
>> as you have. So you shouldn't need to list devices+ports -- OMPI will
>> simply use all ports that it finds in the active state. If your ports are
>> on physically separate IB networks, then each IB network will require a
>> different subnet ID so that OMPI can compute reachability properly.
>>
>> --
>> Jeff Squyres
>> Cisco Systems
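For anyone wanting to repeat the per-rail counter comparison Mostyn describes, a minimal sketch in Python, assuming the same OFED sysfs layout quoted above (the mlx4_0/mlx4_1 device and port list is specific to his machines and would need adjusting):

    # Snapshot the per-port receive-packet counters before and after a run
    # and print the deltas per HCA.  Paths are the ones quoted in the post.
    counters = [
        "/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets",
        "/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets",
    ]

    def snapshot():
        # each counter file contains a single integer
        return [int(open(path).read()) for path in counters]

    before = snapshot()
    input("run the IMB PingPong job now, then press Enter... ")
    after = snapshot()

    deltas = [a - b for a, b in zip(after, before)]
    for path, d in zip(counters, deltas):
        print(path, d)
    print("total packets received:", sum(deltas))

Roughly equal deltas on both HCAs (as in the Scali and Open MPI runs above) indicate that traffic really is being striped across both rails; the question raised in the thread is why the striping is slower in the Open MPI case, not whether both ports are used.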