[OMPI users] RoCE device performance with large message size

2017-10-10 Thread Brendan Myers
Hello All,

I have a RoCE interoperability event starting next week and I was wondering
if anyone had any ideas to help me with a new vendor I am trying to help get
ready. 

I am using:

* Open MPI 2.1

* Intel MPI Benchmarks 2018

* OFED 3.18 (requirement from vendor)

* SLES 11 SP3 (requirement from vendor)

 

The problem seems to be that the device does not handle larger message sizes
well. I am sure the vendor will be working on this, but I am hoping there may
be a way to complete an IMB run with some Open MPI parameter tweaking.
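
For example, the kind of tweaking I had in mind (untested on this device; the
timeout value and the message-size cap are just my guesses at the relevant
knobs, and the queue/rdmacm settings follow what I have used on other RoCE
setups) would be something like:

mpirun --mca btl openib,self,sm --mca btl_openib_ib_timeout 30 --mca
btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include
rdmacm -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1 -msglog 0:20

Here -msglog 0:20 keeps IMB at message sizes of 1 MB and below, where the
device still behaves, while btl_openib_ib_timeout lengthens the retransmission
timeout behind the retry-count error shown below.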

Sample of IMB output from a Sendrecv benchmark:

 

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
       262144          160       131.07       132.24       131.80     3964.56
       524288           80       277.42       284.57       281.57     3684.71
      1048576           40       461.16       474.83       470.02     4416.59
      2097152            3      1112.15   4294965.49   2147851.04        0.98
      4194304            2      2815.25   8589929.73   3222731.54        0.98

 

The last two rows (the 2 MB and 4 MB message sizes) are the problematic
results. This happens on many of the benchmarks at larger message sizes and
causes either a major slowdown or a job abort with this error:

 

The InfiniBand retry count between two MPI processes has been exceeded.

 

I would appreciate any thoughts on how I can complete the benchmarks without
the job aborting, and any ideas as to why a RoCE device might show this issue.
If more data is required, please let me know what is relevant.

 

 

Thank you,

Brendan T. W. Myers

 

 


Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

2017-02-07 Thread Brendan Myers
Hello Howard,

I am able to run my Open MPI job to completion over TCP as you suggested for a 
sanity/configuration double check.  I also am able to complete the job using 
the RoCE fabric if I swap the breakout cable with 2 regular RoCE cables.  I am 
willing to test some custom builds to help iron out this problem.  Thank you 
again for your time and effort.

 

Brendan 

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard 
Pritchard
Sent: Friday, February 03, 2017 12:53 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Brendan,

 

Sorry for the delay in responding.  I've been on travel the past two weeks.

 

I traced through the debug output you sent.  It provided enough information
to show that, for some reason, when using the breakout cable Open MPI is
unable to complete the initialization it needs to use the openib BTL.  It
correctly detects that the first port is not available, but for port 1 it
still fails to initialize.

To debug this further, I'd need to provide you with a custom Open MPI build
to try that would have more debug output in the suspect area.

If you'd like to go this route, let me know and I'll build a one-off library
to try to debug this problem.

 

One thing to do just as a sanity check is to try tcp:

mpirun --mca btl tcp,self,sm 

with the breakout cable.  If that doesn't work, then I think there may be
some network setup problem that needs to be resolved first before trying
custom Open MPI tarballs.
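
For completeness, the full sanity-check line I have in mind would be roughly
the following (btl_tcp_if_include should name whichever IP interface sits on
your RoCE ports; the interface name here is only a placeholder, and the
hostfile/benchmark path are copied from your original command):

mpirun --mca btl tcp,self,sm --mca btl_tcp_if_include eth3 -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1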

 

Thanks,

 

Howard

 

 

 

 

2017-02-01 15:08 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

I was wondering if you have been able to look at this issue at all, or if 
anyone has any ideas on what to try next.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Brendan Myers
Sent: Tuesday, January 24, 2017 11:11 AM
To: 'Open MPI Users' <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Howard,

Here is the error output after building with debug enabled.  These CX4 Mellanox 
cards view each port as a separate device and I am using port 1 on the card 
which is device mlx5_0. 

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard 
Pritchard
Sent: Tuesday, January 24, 2017 8:21 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Brendan,

 

This helps some, but it looks like we need more debug output.

Could you build a debug version of Open MPI by adding --enable-debug to the
configure options and rerun the test with the breakout cable setup, keeping
the --mca btl_base_verbose 100 command line option?
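
A minimal build recipe would be something like the following (the install
prefix is only an example; adjust it to your setup):

./configure --prefix=$HOME/ompi-debug --enable-debug
make -j 8 && make install

and then rerun your usual mpirun command with the new install's mpirun,
keeping --mca btl_base_verbose 100.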

 

Thanks

 

Howard

 

 

2017-01-23 8:23 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

Thank you for looking into this. Attached is the output you requested.  Also, I 
am using Open MPI 2.0.1.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hi Brendan

 

I doubt this kind of config has gotten any testing with OMPI.  Could you rerun 
with

 

--mca btl_base_verbose 100

 

added to the command line and post the output to the list?

 

Howard

 

 

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri, 20 Jan 2017 at 15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single 
breakout cable with this design:

(100GbE)QSFP <> 2x (50GbE)QSFP   

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

Switch is Mellanox SN 2700 100GbE RoCE switch

 

* I  am able to pass RDMA traffic between the nodes with perftest 
(ib_write_bw) when using the breakout cable as the IC from both nodes to the 
switch.

* When attempting to run a job using the breakout cable as the IC Open 
MPI aborts with failure to initialize open fabrics device errors.

* If I replace the breakout cable with 2 standard QSFP cables the Open 
MPI job will complete correctly.  

 

 

This is the command I use, it works unless I attempt a run with the breakout
cable used as IC:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm  -hostfile mpi-hosts-ce
/usr/local/bin/IMB-MPI1

Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

2017-02-01 Thread Brendan Myers
Hello Howard,

I was wondering if you have been able to look at this issue at all, or if 
anyone has any ideas on what to try next.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Brendan Myers
Sent: Tuesday, January 24, 2017 11:11 AM
To: 'Open MPI Users' <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Howard,

Here is the error output after building with debug enabled.  These CX4 Mellanox 
cards view each port as a separate device and I am using port 1 on the card 
which is device mlx5_0. 

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, January 24, 2017 8:21 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Brendan,

 

This helps some, but looks like we need more debug output.

 

Could you build a debug version of Open MPI by adding --enable-debug

to the config options and rerun the test with the breakout cable setup

and keeping the --mca btl_base_verbose 100 command line option?

 

Thanks

 

Howard

 

 

2017-01-23 8:23 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

Thank you for looking into this. Attached is the output you requested.  Also, I 
am using Open MPI 2.0.1.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hi Brendan

 

I doubt this kind of config has gotten any testing with OMPI.  Could you rerun 
with

 

--mca btl_base_verbose 100

 

added to the command line and post the output to the list?

 

Howard

 

 

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri, 20 Jan 2017 at 15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single 
breakout cable with this design:

(100GbE)QSFP <> 2x (50GbE)QSFP   

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

Switch is Mellanox SN 2700 100GbE RoCE switch

 

* I  am able to pass RDMA traffic between the nodes with perftest 
(ib_write_bw) when using the breakout cable as the IC from both nodes to the 
switch.

* When attempting to run a job using the breakout cable as the IC Open 
MPI aborts with failure to initialize open fabrics device errors.

* If I replace the breakout cable with 2 standard QSFP cables the Open 
MPI job will complete correctly.  

 

 

This is the command I use, it works unless I attempt a run with the breakout 
cable used as IC:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues 
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm  -hostfile mpi-hosts-ce 
/usr/local/bin/IMB-MPI1

 

If anyone has any idea as to why using a breakout cable is causing my jobs to 
fail please let me know.

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com

Software Forge Inc

 


Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

2017-01-24 Thread Brendan Myers
Hello Howard,

Here is the error output after building with debug enabled.  These CX4 Mellanox 
cards view each port as a separate device and I am using port 1 on the card 
which is device mlx5_0. 

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard 
Pritchard
Sent: Tuesday, January 24, 2017 8:21 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Brendan,

 

This helps some, but looks like we need more debug output.

 

Could you build a debug version of Open MPI by adding --enable-debug

to the config options and rerun the test with the breakout cable setup

and keeping the --mca btl_base_verbose 100 command line option?

 

Thanks

 

Howard

 

 

2017-01-23 8:23 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

Thank you for looking into this. Attached is the output you requested.  Also, I 
am using Open MPI 2.0.1.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hi Brendan

 

I doubt this kind of config has gotten any testing with OMPI.  Could you rerun 
with

 

--mca btl_base_verbose 100

 

added to the command line and post the output to the list?

 

Howard

 

 

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri, 20 Jan 2017 at 15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single 
breakout cable with this design:

(100GbE)QSFP <> 2x (50GbE)QSFP   

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

Switch is Mellanox SN 2700 100GbE RoCE switch

 

* I  am able to pass RDMA traffic between the nodes with perftest 
(ib_write_bw) when using the breakout cable as the IC from both nodes to the 
switch.

* When attempting to run a job using the breakout cable as the IC Open 
MPI aborts with failure to initialize open fabrics device errors.

* If I replace the breakout cable with 2 standard QSFP cables the Open 
MPI job will complete correctly.  

 

 

This is the command I use, it works unless I attempt a run with the breakout 
cable used as IC:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues 
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm  -hostfile mpi-hosts-ce 
/usr/local/bin/IMB-MPI1

 

If anyone has any idea as to why using a breakout cable is causing my jobs to 
fail please let me know.

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com

Software Forge Inc

 


 

Script started on Tue 24 Jan 2017 10:55:52 AM EST
[root@sm-node-8 ~]mpirun --allow-run-as-root --mca btl openib,self,sm --mca 
btl_openib_gid_index 0 --mca btl_base_verbose 100 --mca 
btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm 
-hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1
[sm-node-7:28120] mca: base: components_register: registering framework btl 
components
[sm-node-7:28120] mca: base: components_register: found loaded component self
[sm-node-7:28120] mca: base: components_register: component self register 
function successful
[sm-node-7:28120] mca: base: components_register: found loaded component openib
[sm-node-7:28121] mca: base: components_register: registering framework btl 
components
[sm-node-7:28121] mca: base: components_register: found loaded component self
[sm-node-7:28121] mca: base: components_register: component self register 
function successful
[sm-node-7:28121] mca: base: components_register: found loaded component openib
[sm-node-7:28120] mca: base: components_register: component openib register 
function successful
[sm-node-7:28120] mca: base: components_register: found loaded component sm
[sm-node-7:28120] mca: base: components_register: component sm register 
function successful
[sm-node-7:28120] mca: base: components_open: opening btl components
[sm-node-7:28120] mca: base: components_open: found loaded component self
[sm-node-7:28120] mca: base: components_open: component self open function successful

Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

2017-01-23 Thread Brendan Myers
Hello Howard,

Thank you for looking into this. Attached is the output you requested.  Also, I 
am using Open MPI 2.0.1.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard 
Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hi Brendan

 

I doubt this kind of config has gotten any testing with OMPI.  Could you rerun 
with

 

--mca btl_base_verbose 100

 

added to the command line and post the output to the list?

 

Howard

 

 

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri, 20 Jan 2017 at 15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single 
breakout cable with this design:

(100GbE)QSFP <> 2x (50GbE)QSFP   

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

Switch is Mellanox SN 2700 100GbE RoCE switch

 

* I  am able to pass RDMA traffic between the nodes with perftest 
(ib_write_bw) when using the breakout cable as the IC from both nodes to the 
switch.

* When attempting to run a job using the breakout cable as the IC Open 
MPI aborts with failure to initialize open fabrics device errors.

* If I replace the breakout cable with 2 standard QSFP cables the Open 
MPI job will complete correctly.  

 

 

This is the command I use, it works unless I attempt a run with the breakout 
cable used as IC:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues 
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm  -hostfile mpi-hosts-ce 
/usr/local/bin/IMB-MPI1

 

If anyone has any idea as to why using a breakout cable is causing my jobs to 
fail please let me know.

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com

Software Forge Inc

 



[sm-node-8:13428] mca: base: components_register: registering framework btl 
components
[sm-node-8:13428] mca: base: components_register: found loaded component self
[sm-node-8:13428] mca: base: components_register: component self register 
function successful
[sm-node-8:13428] mca: base: components_register: found loaded component openib
[sm-node-7:28343] mca: base: components_register: registering framework btl 
components
[sm-node-7:28343] mca: base: components_register: found loaded component self
[sm-node-7:28343] mca: base: components_register: component self register 
function successful
[sm-node-7:28343] mca: base: components_register: found loaded component openib
[sm-node-8:13428] mca: base: components_register: component openib register 
function successful
[sm-node-8:13428] mca: base: components_register: found loaded component sm
[sm-node-8:13428] mca: base: components_register: component sm register 
function successful
[sm-node-8:13428] mca: base: components_open: opening btl components
[sm-node-8:13428] mca: base: components_open: found loaded component self
[sm-node-8:13428] mca: base: components_open: component self open function 
successful
[sm-node-8:13428] mca: base: components_open: found loaded component openib
[sm-node-8:13428] mca: base: components_open: component openib open function 
successful
[sm-node-8:13428] mca: base: components_open: found loaded component sm
[sm-node-8:13428] mca: base: components_open: component sm open function 
successful
[sm-node-8:13428] select: initializing btl component self
[sm-node-7:28342] mca: base: components_register: registering framework btl 
components
[sm-node-7:28342] mca: base: components_register: found loaded component self
[sm-node-8:13429] mca: base: components_register: registering framework btl 
components
[sm-node-8:13429] mca: base: components_register: found loaded component self
[sm-node-8:13428] select: init of component self returned success
[sm-node-8:13428] select: initializing btl component openib
[sm-node-7:28343] mca: base: components_register: component openib register 
function successful
[sm-node-7:28343] mca: base: components_register: found loaded component sm
[sm-node-8:13429] mca: base: components_register: component self register 
function successful
[sm-node-7:28342] mca: base: components_register: component self register 
function successful
[sm-node-7:28342] mca: base: components_register: found loaded component openib
[sm-node-8:13429] mca: base: components_register: found loaded component openib
[sm-node-7:28343] mca: base: components_register: component sm register 
function successful
[sm-node-7:28343] mca: base: components_open: opening btl components

[OMPI users] Open MPI over RoCE using breakout cable and switch

2017-01-20 Thread Brendan Myers
Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a
single breakout cable with this design:

(100GbE)QSFP <> 2x (50GbE)QSFP   

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

Switch is Mellanox SN 2700 100GbE RoCE switch

 

* I  am able to pass RDMA traffic between the nodes with perftest
(ib_write_bw) when using the breakout cable as the IC from both nodes to the
switch.

* When attempting to run a job using the breakout cable as the IC
Open MPI aborts with failure to initialize open fabrics device errors.

* If I replace the breakout cable with 2 standard QSFP cables the
Open MPI job will complete correctly.  

 

 

This is the command I use, it works unless I attempt a run with the breakout
cable used as IC:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm  -hostfile
mpi-hosts-ce /usr/local/bin/IMB-MPI1

 

If anyone has any idea as to why using a breakout cable is causing my jobs
to fail please let me know.
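
One variation I still plan to try is pinning the openib BTL to the specific
device and port (the name below is just what ibv_devinfo reports on my nodes,
so treat it as an example):

mpirun --mca btl openib,self,sm --mca btl_openib_if_include mlx5_0:1 --mca
btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include
rdmacm -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1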

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com  

Software Forge Inc

 


Re: [OMPI users] rdmacm and udcm failure in 2.0.1 on RoCE

2016-12-16 Thread Brendan Myers
Hello,

I can confirm that using these flags:

--mca btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include 
rdmacm

I am able to run Open MPI version 2.0.1 over a RoCE fabric.  Hope this helps
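
For reference, the complete command line I run (the hostfile and benchmark
path are specific to my setup) is:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm -hostfile mpi-hosts-ce
/usr/local/bin/IMB-MPI1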

 

Thank you,

Brendan Myers

Software Forge

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Dave Turner
Sent: Thursday, December 15, 2016 4:41 PM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] rdmacm and udcm failure in 2.0.1 on RoCE

 

 

Nathan:  Thanks for providing the debug flags.  I've attached the output
(NetPIPE.debug1), which basically shows that for RoCE udcm_component_query()
will always fail.  Can someone verify that this is correct, i.e. that udcm is
not supported for RoCE?  When I change the test to force usage it does not
work (NetPIPE.debug2).

 

[hero35][[38845,1],0][connect/btl_openib_connect_udcm.c:452:udcm_component_query]
 UD CPC only supported on InfiniBand; skipped on mlx4_0:1

[hero35][[38845,1],0][connect/btl_openib_connect_udcm.c:501:udcm_component_query]
 unavailable for use on mlx4_0:1; skipped

 

from btl_openib_connect_udcm.c

 

 438 static int udcm_component_query(mca_btl_openib_module_t *btl,
 439                                 opal_btl_openib_connect_base_module_t **cpc)
 440 {
 441     udcm_module_t *m = NULL;
 442     int rc = OPAL_ERR_NOT_SUPPORTED;
 443
 444     do {
 445         /* If we do not have struct ibv_device.transport_device, then
 446            we're in an old version of OFED that is IB only (i.e., no
 447            iWarp), so we can safely assume that we can use this CPC. */
 448 #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) && HAVE_DECL_IBV_LINK_LAYER_ETHERNET
 449         if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
 450             BTL_VERBOSE(("UD CPC only supported on InfiniBand; skipped on %s:%d",
 451                          ibv_get_device_name(btl->device->ib_dev),
 452                          btl->port_num));
 453             break;
 454         }
 455 #endif

 

from base.h

 

#ifdef OPAL_HAVE_RDMAOE
#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)                    \
    (((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) || \
      (IBV_LINK_LAYER_ETHERNET == ((btl)->ib_port_attr.link_layer))) ? \
     true : false)
#else
#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)                   \
    ((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) ?   \
     true : false)
#endif

 

So clearly for RoCE the transport is InfiniBand and the link layer is
Ethernet, so this will show that NOT_IB() is true, meaning that udcm is
evidently not supported for RoCE.  udcm definitely fails under 1.10.4 for
RoCE in our tests.  That means we need rdmacm to work, which it evidently
does not at the moment for 2.0.1.  Could someone please verify that rdmacm is
not currently working in 2.0.1?  And therefore I'm assuming that 2.0.1 has
not been successfully tested on RoCE?

 

   Dave

 

 


--

Message: 1
Date: Wed, 14 Dec 2016 21:12:16 -0700
From: Nathan Hjelm <hje...@me.com>
To: drdavetur...@gmail.com, Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] rdmacm and udcm failure in 2.0.1 on RoCE
Message-ID: <32528c5d-14bc-42ce-b19a-684b81801...@me.com>
Content-Type: text/plain; charset=utf-8

Can you configure with --enable-debug and run with --mca btl_base_verbose 100
and provide the output? It may indicate why neither udcm nor rdmacm is
available.

-Nathan


> On Dec 14, 2016, at 2:47 PM, Dave Turner <drdavetur...@gmail.com> wrote:
>
> --
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:   elf22
>   Local device: mlx4_2
>   Local port:   1
>   CPCs attempted:   rdmacm, udcm
> --
>
> We have had no problems using 1.10.4 on RoCE but 2.0.1 fails to
> find either connection manager.  I've read that rdmacm may have
> issues under 2.0.1 so udcm may be the only one working.  Are there
> any known issues with that on RoCE?  Or does this just mean we
> don't have RoCE configured correctly?
>
>   Dave Turner
>
> --
> Work: davetur...@ksu.edu  (785) 532-7791

[OMPI users] How to verify RDMA traffic (RoCE) is being sent over a fabric when running OpenMPI

2016-11-08 Thread Brendan Myers
Hello,

I am trying to figure out how I can verify that the Open MPI traffic is
actually being transmitted over the RoCE fabric connecting my cluster.  My
MPI job runs quickly and error free, but I cannot seem to verify that
significant amounts of data are being transferred to the other endpoint in my
RoCE fabric.  When I remove the oob exclusion from my command, I am able to
see what I believe to be the oob data when I analyze my RoCE interface using
the tools listed below.

Software:

* CentOS 7.2

* Open MPI 2.0.1

Command:

* mpirun   --mca btl openib,self,sm --mca oob_tcp_if_exclude eth3
--mca btl_openib_receive_queues P,65536,120,64,32 --mca
btl_openib_cpc_include rdmacm -np 4 -hostfile mpi-hosts-ce
/usr/local/bin/IMB-MPI1

o   Eth3 is my RoCE interface

o   The RoCE interfaces of the 2 nodes involved are defined in my mpi-hosts-ce file

Ways I have looked to verify data transference:

* Through the port counters on my RoCE switch

o   Sees data being sent when using ib_write_bw but not when using Open MPI

* Through ibdump

o   Sees data being sent when using ib_write_bw but not when using Open MPI

* Through Wireshark

o   Sees data being sent when using ib_write_bw but not when using Open MPI
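
Another check I have been meaning to try (assuming these counters are exposed
for my NICs; the device and port names below are only examples, substitute
whatever ibv_devinfo shows) is reading the per-port hardware counters in
sysfs on each node before and after the MPI job:

cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_xmit_data
cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data

A large jump in these counters during the run would confirm that the MPI
traffic really is leaving via the RoCE device rather than some other path.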

 

I do not have much experience with Open MPI and apologize if I have left out
necessary information.  I will respond with any data requested.  I
appreciate the time spent to read and respond to this.

 

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com  

Software Forge Inc

 
