Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?
using 2 HCAs on the same PCI-Exp bus (as well as 2 ports from the same HCA) will not improve performance; PCI-Exp is the bottleneck.

On Mon, Oct 20, 2008 at 2:28 AM, Mostyn Lewis wrote:
> Well, here's what I see with the IMB PingPong test using two ConnectX DDR
> cards in each of 2 machines. I'm just quoting the last line at 10
> repetitions of the 4194304 bytes.
>
> Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
>         #bytes #repetitions  t[usec]  Mbytes/sec
>        4194304           10  2198.24     1819.63
>
> mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
>         #bytes #repetitions  t[usec]  Mbytes/sec
>        4194304           10  2427.24     1647.96
>
> OpenMPI SVN 19772:
>         #bytes #repetitions  t[usec]  Mbytes/sec
>        4194304           10  3676.35     1088.03
>
> Repeatable within bounds.
>
> This is OFED-1.3.1 and I peered at
> /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
> and
> /sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
> on one of the 2 machines and looked at what happened for Scali
> and OpenMPI.
>
> Scali packets:
> HCA 0 port1 = 115116625 - 114903198 = 213427
> HCA 1 port1 =  78099566 -  77886143 = 213423
>                                       ------
>                                       426850
> OpenMPI packets:
> HCA 0 port1 = 115233624 - 115116625 = 116999
> HCA 1 port1 =  78216425 -  78099566 = 116859
>                                       ------
>                                       233858
>
> Scali is set up so that data larger than 8192 bytes is striped
> across the 2 HCAs using 8192 bytes per HCA in a round-robin fashion.
>
> So, it seems that OpenMPI is using both ports but strangely ends
> up with a Mbytes/sec rate which is worse than a single HCA only.
> If I use --mca btl_openib_if_exclude mlx4_1:1, we get
>         #bytes #repetitions  t[usec]  Mbytes/sec
>        4194304           10  3080.59     1298.45
>
> So, what's taking so long? Is this a threading question?
>
> DM
>
> On Sun, 19 Oct 2008, Jeff Squyres wrote:
>> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>>> to approach double the bandwidth on simple tests such as IMB PingPong?
>>
>> Yes. OMPI will automatically (and aggressively) use as many active ports
>> as you have. So you shouldn't need to list devices+ports -- OMPI will
>> simply use all ports that it finds in the active state. If your ports are
>> on physically separate IB networks, then each IB network will require a
>> different subnet ID so that OMPI can compute reachability properly.
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
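The counter-sampling method in the quoted message (read `port_rcv_packets` for each HCA before and after the run, then diff) can be sketched as below. This is a minimal illustration assuming the mlx4 sysfs layout quoted above; the helper names are made up for this sketch and come from no MPI or OFED library.

```python
# Sketch: estimate per-rail traffic by sampling the sysfs receive-packet
# counters before and after a benchmark run, as done in the message above.

def read_rcv_packets(hca, port=1):
    """Read the cumulative receive-packet counter for one HCA port."""
    path = f"/sys/class/infiniband/{hca}/ports/{port}/counters/port_rcv_packets"
    with open(path) as f:
        return int(f.read())

def delta(before, after):
    """Per-HCA packet counts attributable to the run between two samples."""
    return {hca: after[hca] - before[hca] for hca in before}

# Worked with the Scali numbers quoted above (no hardware needed):
before = {"mlx4_0": 114903198, "mlx4_1": 77886143}
after  = {"mlx4_0": 115116625, "mlx4_1": 78099566}
d = delta(before, after)
# d == {"mlx4_0": 213427, "mlx4_1": 213423} -> both rails carried
# essentially equal traffic, i.e. the striping was balanced.
```

A near 50/50 split of the delta across the two HCAs is what indicates that both rails are actually being used.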
Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?
Well, here's what I see with the IMB PingPong test using two ConnectX DDR
cards in each of 2 machines. I'm just quoting the last line at 10
repetitions of the 4194304 bytes.

Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
        #bytes #repetitions  t[usec]  Mbytes/sec
       4194304           10  2198.24     1819.63

mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
        #bytes #repetitions  t[usec]  Mbytes/sec
       4194304           10  2427.24     1647.96

OpenMPI SVN 19772:
        #bytes #repetitions  t[usec]  Mbytes/sec
       4194304           10  3676.35     1088.03

Repeatable within bounds.

This is OFED-1.3.1 and I peered at
/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
and
/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
on one of the 2 machines and looked at what happened for Scali
and OpenMPI.

Scali packets:
HCA 0 port1 = 115116625 - 114903198 = 213427
HCA 1 port1 =  78099566 -  77886143 = 213423
                                      ------
                                      426850
OpenMPI packets:
HCA 0 port1 = 115233624 - 115116625 = 116999
HCA 1 port1 =  78216425 -  78099566 = 116859
                                      ------
                                      233858

Scali is set up so that data larger than 8192 bytes is striped
across the 2 HCAs using 8192 bytes per HCA in a round-robin fashion.

So, it seems that OpenMPI is using both ports but strangely ends
up with a Mbytes/sec rate which is worse than a single HCA only.
If I use --mca btl_openib_if_exclude mlx4_1:1, we get
        #bytes #repetitions  t[usec]  Mbytes/sec
       4194304           10  3080.59     1298.45

So, what's taking so long? Is this a threading question?

DM

On Sun, 19 Oct 2008, Jeff Squyres wrote:
> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>> to approach double the bandwidth on simple tests such as IMB PingPong?
>
> Yes. OMPI will automatically (and aggressively) use as many active ports
> as you have. So you shouldn't need to list devices+ports -- OMPI will
> simply use all ports that it finds in the active state. If your ports are
> on physically separate IB networks, then each IB network will require a
> different subnet ID so that OMPI can compute reachability properly.
>
> --
> Jeff Squyres
> Cisco Systems
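The Scali striping policy described above (messages larger than 8192 bytes are cut into 8192-byte chunks dealt to the two HCAs round-robin) can be sketched as follows. This is an illustrative model of the policy as stated in the message, not Scali code; the names and threshold constant are assumptions drawn only from the description.

```python
# Sketch of round-robin multi-rail striping: split a large message into
# fixed-size chunks and assign them to rails in rotation.

STRIPE_THRESHOLD = 8192  # messages at or below this stay on one rail
CHUNK = 8192             # bytes sent per rail per turn, per the message

def stripe(msg_len, num_rails=2):
    """Return a list of (rail_index, chunk_len) send assignments."""
    if msg_len <= STRIPE_THRESHOLD:
        return [(0, msg_len)]
    plan, offset, rail = [], 0, 0
    while offset < msg_len:
        n = min(CHUNK, msg_len - offset)
        plan.append((rail, n))
        offset += n
        rail = (rail + 1) % num_rails
    return plan

# A 4194304-byte PingPong message becomes 512 chunks of 8192 bytes,
# 256 per HCA -- matching the near-equal packet counts observed above.
plan = stripe(4194304)
```

Under this model each rail carries exactly half the payload, which is consistent with the 213427 vs. 213423 packet deltas measured for Scali.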
Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?
Jeff Squyres wrote:
> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>> to approach double the bandwidth on simple tests such as IMB PingPong?
>
> Yes. OMPI will automatically (and aggressively) use as many active ports
> as you have. So you shouldn't need to list devices+ports -- OMPI will
> simply use all ports that it finds in the active state. If your ports are
> on physically separate IB networks, then each IB network will require a
> different subnet ID so that OMPI can compute reachability properly.

Does this apply to all fabrics, or at which level is this implemented in
OMPI? (i.e., multiple GigE NICs... but I doubt it applies, given the
restricted intricacies of the IP implementation.)

Eric
Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?
On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
> to approach double the bandwidth on simple tests such as IMB PingPong?

Yes. OMPI will automatically (and aggressively) use as many active ports
as you have. So you shouldn't need to list devices+ports -- OMPI will
simply use all ports that it finds in the active state. If your ports are
on physically separate IB networks, then each IB network will require a
different subnet ID so that OMPI can compute reachability properly.

--
Jeff Squyres
Cisco Systems
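As a command-line sketch of checking the preconditions above (all ports ACTIVE; distinct subnet prefixes only if the ports sit on physically separate fabrics), assuming standard OFED tooling; exact `ibstat` field names and the benchmark binary name are assumptions, and the subnet prefix is set in the subnet manager's configuration (e.g. opensm), not in Open MPI:

```shell
# Show port state and GUIDs for every HCA; OMPI will use each port
# it finds in the ACTIVE state, with no device list needed.
ibstat | grep -E 'State|Port GUID'

# If both fabrics share the default subnet prefix while being physically
# separate, give each its own prefix in the subnet manager's config so
# OMPI can compute reachability (opensm's subnet_prefix setting).

# Then run the benchmark without naming any devices or ports:
mpirun -np 2 --host nodeA,nodeB ./IMB-MPI1 PingPong
```

Comparing this against a run with one rail excluded (as in the `--mca btl_openib_if_exclude mlx4_1:1` test above) isolates how much the second HCA actually contributes.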
[OMPI users] Can 2 IB HCAs give twice the bandwidth?
Can OpenMPI, like Scali and MVAPICH2, utilize 2 IB HCAs per machine to
approach double the bandwidth on simple tests such as IMB PingPong?

Regards,
DM