Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-22 Thread Mike Dubman
Using 2 HCAs on the same PCI-Express bus (as well as 2 ports of the same HCA)
will not improve performance; PCI-Express is the bottleneck.
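For context, a rough arithmetic sketch of that claim (assuming PCIe 1.1 x8 and IB 4x DDR with 8b/10b encoding; these are nominal signaling rates, not measured throughput):

```python
# Back-of-the-envelope sketch: why one PCIe 1.1 x8 link can cap two DDR HCAs.
# Assumptions (flagged, not from the thread): 8b/10b line coding on both
# PCIe gen1 and IB DDR, nominal rates, no protocol overhead counted.

def gbps(lanes, gt_per_s, coding=8 / 10):
    """Usable Gbit/s per direction for a serial link."""
    return lanes * gt_per_s * coding

pcie_x8_gen1 = gbps(8, 2.5)   # 8 lanes x 2.5 GT/s -> 16.0 Gbit/s per direction
ib_4x_ddr    = gbps(4, 5.0)   # 4 lanes x 5.0 GT/s -> 16.0 Gbit/s per HCA

# Two DDR HCAs could move ~32 Gbit/s, but a single gen1 x8 link supplies
# only ~16 Gbit/s -- so sharing one bus leaves no headroom for a speedup.
```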


On Mon, Oct 20, 2008 at 2:28 AM, Mostyn Lewis  wrote:

> Well, here's what I see with the IMB PingPong test using two ConnectX DDR
> cards
> in each of 2 machines. I'm just quoting the last line at 10 repetitions of
> the 4194304 bytes.
>
> Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
>   #bytes #repetitions  t[usec]   Mbytes/sec
>  4194304   10  2198.24  1819.63
> mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
>   #bytes #repetitions  t[usec]   Mbytes/sec
>  4194304   10  2427.24  1647.96
> OpenMPI SVN 19772:
>   #bytes #repetitions  t[usec]   Mbytes/sec
>  4194304   10  3676.35  1088.03
>
> Repeatable within bounds.
>
> This is OFED-1.3.1 and I peered at
> /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
> and
> /sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
> on one of the 2 machines and looked at what happened for Scali
> and OpenMPI.
>
> Scali packets:
> HCA 0 port1 = 115116625 - 114903198 = 213427
> HCA 1 port1 =  78099566 -  77886143 = 213423
>                                       ------
>                                total  426850
> OpenMPI packets:
> HCA 0 port1 = 115233624 - 115116625 = 116999
> HCA 1 port1 =  78216425 -  78099566 = 116859
>                                       ------
>                                total  233858
>
> Scali is set up so that data larger than 8192 bytes is striped
> across the 2 HCAs using 8192 bytes per HCA in a round robin fashion.
>
> So, it seems that OpenMPI is using both ports but strangely ends
> up with a Mbytes/sec rate that is worse than a single HCA alone.
> If I use --mca btl_openib_if_exclude mlx4_1:1, we get
>   #bytes #repetitions  t[usec]   Mbytes/sec
>  4194304   10  3080.59  1298.45
>
> So, what's taking so long? Is this a threading question?
>
> DM
>
>
> On Sun, 19 Oct 2008, Jeff Squyres wrote:
>
>  On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>>
>>  Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>>> to approach double the bandwidth on simple tests such as IMB PingPong?
>>>
>>
>>
>> Yes.  OMPI will automatically (and aggressively) use as many active ports
>> as you have.  So you shouldn't need to list devices+ports -- OMPI will
>> simply use all ports that it finds in the active state.  If your ports are
>> on physically separate IB networks, then each IB network will require a
>> different subnet ID so that OMPI can compute reachability properly.
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>


Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-19 Thread Mostyn Lewis

Well, here's what I see with the IMB PingPong test using two ConnectX DDR cards
in each of 2 machines. I'm just quoting the last line at 10 repetitions of
the 4194304 bytes.

Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
   #bytes #repetitions  t[usec]   Mbytes/sec
  4194304   10  2198.24  1819.63
mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
   #bytes #repetitions  t[usec]   Mbytes/sec
  4194304   10  2427.24  1647.96
OpenMPI SVN 19772:
   #bytes #repetitions  t[usec]   Mbytes/sec
  4194304   10  3676.35  1088.03

Repeatable within bounds.

This is OFED-1.3.1 and I peered at
/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
and
/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
on one of the 2 machines and looked at what happened for Scali
and OpenMPI.

Scali packets:
HCA 0 port1 = 115116625 - 114903198 = 213427
HCA 1 port1 =  78099566 -  77886143 = 213423
                                      ------
                               total  426850
OpenMPI packets:
HCA 0 port1 = 115233624 - 115116625 = 116999
HCA 1 port1 =  78216425 -  78099566 = 116859
                                      ------
                               total  233858
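The counter sampling described above can be sketched as follows (a minimal illustration, not from the thread; the sysfs paths assume mlx4 HCAs and port 1, as in the paths quoted earlier):

```python
# Sketch: sample the OFED sysfs packet counters before and after a run,
# then take per-HCA deltas. Device and port names are system-specific.

def read_counter(dev, port=1, counter="port_rcv_packets"):
    """Read one InfiniBand port counter from sysfs."""
    path = f"/sys/class/infiniband/{dev}/ports/{port}/counters/{counter}"
    with open(path) as f:
        return int(f.read().strip())

def delta(before, after):
    """Per-device packet counts for the interval between two samples."""
    return {dev: after[dev] - before[dev] for dev in before}

# Using the numbers quoted above for the Open MPI run:
before = {"mlx4_0": 115116625, "mlx4_1": 78099566}
after  = {"mlx4_0": 115233624, "mlx4_1": 78216425}
d = delta(before, after)   # {'mlx4_0': 116999, 'mlx4_1': 116859}, total 233858
```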

Scali is set up so that data larger than 8192 bytes is striped
across the 2 HCAs using 8192 bytes per HCA in a round robin fashion.
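The striping policy described above can be sketched roughly like this (an illustration of the round-robin scheme, not Scali's actual implementation; names are invented):

```python
# Sketch: messages above a threshold are split into fixed-size chunks and
# dealt round-robin across the available rails (HCAs), as described above.

def stripe(message: bytes, num_rails: int = 2, chunk: int = 8192):
    """Return a list of (rail_index, chunk_bytes) send operations."""
    if len(message) <= chunk:
        return [(0, message)]          # small messages stay on a single rail
    ops = []
    for i, off in enumerate(range(0, len(message), chunk)):
        ops.append((i % num_rails, message[off:off + chunk]))
    return ops

# A 20000-byte message becomes three chunks (8192 + 8192 + 3616),
# alternating rails 0, 1, 0 -- so both HCAs carry half the traffic.
ops = stripe(b"x" * 20000)
```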

So, it seems that OpenMPI is using both ports but strangely ends
up with a Mbytes/sec rate that is worse than a single HCA alone.
If I use --mca btl_openib_if_exclude mlx4_1:1, we get
   #bytes #repetitions  t[usec]   Mbytes/sec
  4194304   10  3080.59  1298.45

So, what's taking so long? Is this a threading question?

DM

On Sun, 19 Oct 2008, Jeff Squyres wrote:


On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:


Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?



Yes.  OMPI will automatically (and aggressively) use as many active ports as 
you have.  So you shouldn't need to list devices+ports -- OMPI will simply 
use all ports that it finds in the active state.  If your ports are on 
physically separate IB networks, then each IB network will require a 
different subnet ID so that OMPI can compute reachability properly.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-19 Thread Eric Thibodeau

Jeff Squyres wrote:

On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:


Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?



Yes.  OMPI will automatically (and aggressively) use as many active 
ports as you have.  So you shouldn't need to list devices+ports -- 
OMPI will simply use all ports that it finds in the active state.  If 
your ports are on physically separate IB networks, then each IB 
network will require a different subnet ID so that OMPI can compute 
reachability properly.


Does this apply to all fabrics, or at which level is this implemented
in OMPI? (i.e., multiple GigE NICs... but I doubt it applies, given the
intricacies of the IP implementation)


Eric


Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-19 Thread Jeff Squyres

On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:

Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?



Yes.  OMPI will automatically (and aggressively) use as many active  
ports as you have.  So you shouldn't need to list devices+ports --  
OMPI will simply use all ports that it finds in the active state.  If  
your ports are on physically separate IB networks, then each IB  
network will require a different subnet ID so that OMPI can compute  
reachability properly.
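The reachability point above can be illustrated with a toy sketch (not Open MPI's actual code; subnet IDs here are made-up values): a port pair is usable only when both ports share an IB subnet ID, so two separate fabrics reusing the default subnet ID would wrongly appear mutually reachable.

```python
# Toy sketch of subnet-ID-based reachability: pair up ports across two
# hosts only when their subnet IDs match. Values are illustrative only.

def reachable_pairs(ports_a, ports_b):
    """ports_*: list of (device, subnet_id). Pair ports on matching subnets."""
    return [(a, b) for a in ports_a for b in ports_b if a[1] == b[1]]

# Two physically separate fabrics, each with its own subnet ID:
host_a = [("mlx4_0", 0xFE80_0000_0000_0001), ("mlx4_1", 0xFE80_0000_0000_0002)]
host_b = [("mlx4_0", 0xFE80_0000_0000_0001), ("mlx4_1", 0xFE80_0000_0000_0002)]
pairs = reachable_pairs(host_a, host_b)   # one pair per fabric, two in total
```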


--
Jeff Squyres
Cisco Systems



[OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-18 Thread Mostyn Lewis

Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?

Regards,
DM