Hello Howard,

Here is the error output after building with debug enabled.  These Mellanox CX4 
cards present each port as a separate device, and I am using port 1 on the card, 
which is device mlx5_0. 
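
When reading verbose output like the log below, it can help to first tally 
which hosts report an openib initialization failure. The following is only a 
minimal shell sketch, not part of Open MPI: the function name 
count_openib_failures and the log file name are made up for illustration, and 
it assumes each matching log line is unwrapped and begins with [host:pid].

```shell
#!/bin/sh
# Hypothetical helper (not an Open MPI tool): reads a btl_base_verbose log on
# stdin and prints "<host> <count>" for every host on which at least one
# process logged that the openib BTL failed to initialize.
count_openib_failures() {
    # Keep only the failure lines emitted by the BTL selection step.
    grep 'select: init of component openib returned failure' |
        # Reduce "[host:pid] ..." to just "host".
        sed -e 's/^\[\([^]:]*\):.*/\1/' |
        # Count occurrences per host and print them as "host count".
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `count_openib_failures < btl-debug.log` (file name assumed) prints 
one line per host with the number of processes whose openib init failed.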

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard 
Pritchard
Sent: Tuesday, January 24, 2017 8:21 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hello Brendan,

 

This helps some, but it looks like we need more debug output.

 

Could you build a debug version of Open MPI by adding --enable-debug to the 
configure options, then rerun the test with the breakout cable setup, keeping 
the --mca btl_base_verbose 100 command-line option?
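
A rough sketch of what this request amounts to, assuming a build from a source 
tree. The source directory, install prefix, make job count, and log file name 
are placeholders and are not taken from this thread; the mpirun options are 
the ones from the original report plus the verbose flag.

```shell
#!/bin/sh
# Assumed locations (hypothetical; adjust to the actual system).
OMPI_SRC="${OMPI_SRC:-$HOME/openmpi-2.0.1}"
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/ompi-debug}"

# Configure, build, and install Open MPI with debugging support enabled.
# Runs in a subshell so the caller's working directory is left unchanged.
build_debug_ompi() {
    (
        cd "$OMPI_SRC" &&
            ./configure --prefix="$OMPI_PREFIX" --enable-debug &&
            make -j 4 &&
            make install
    )
}

# Rerun the benchmark with verbose BTL output, saving stdout and stderr
# to btl-debug.log (assumed file name) while still showing them on screen.
run_verbose() {
    "$OMPI_PREFIX/bin/mpirun" --mca btl openib,self,sm \
        --mca btl_base_verbose 100 \
        --mca btl_openib_receive_queues P,65536,120,64,32 \
        --mca btl_openib_cpc_include rdmacm \
        -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1 2>&1 | tee btl-debug.log
}
```

Calling build_debug_ompi once and then run_verbose on the breakout cable setup 
would produce a log suitable for posting to the list.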

 

Thanks

 

Howard

 

 

2017-01-23 8:23 GMT-07:00 Brendan Myers <brendan.my...@soft-forge.com>:

Hello Howard,

Thank you for looking into this. Attached is the output you requested.  Also, I 
am using Open MPI 2.0.1.

 

Thank you,

Brendan

 

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Howard 
Pritchard
Sent: Friday, January 20, 2017 6:35 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

 

Hi Brendan

 

I doubt this kind of config has gotten any testing with OMPI.  Could you rerun 
with

 

--mca btl_base_verbose 100

 

added to the command line and post the output to the list?

 

Howard

 

 

Brendan Myers <brendan.my...@soft-forge.com> wrote on Fri, Jan 20, 2017 at 
15:04:

Hello,

I am attempting to get Open MPI to run over 2 nodes using a switch and a single 
breakout cable with this design:

(100GbE)QSFP <----> 2x (50GbE)QSFP       

 

Hardware Layout:

Breakout cable module A connects to switch (100GbE)

Breakout cable module B1 connects to node 1 RoCE NIC (50GbE)

Breakout cable module B2 connects to node 2 RoCE NIC (50GbE)

Switch is Mellanox SN 2700 100GbE RoCE switch

 

*         I am able to pass RDMA traffic between the nodes with perftest 
(ib_write_bw) when using the breakout cable as the interconnect (IC) from both 
nodes to the switch.

*         When I attempt to run a job using the breakout cable as the IC, Open 
MPI aborts with errors about failing to initialize an OpenFabrics device.

*         If I replace the breakout cable with 2 standard QSFP cables, the Open 
MPI job completes correctly.  

 

 

This is the command I use; it works unless I attempt a run with the breakout 
cable used as the IC:

mpirun --mca btl openib,self,sm --mca btl_openib_receive_queues 
P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm  -hostfile mpi-hosts-ce 
/usr/local/bin/IMB-MPI1

 

If anyone has an idea why using a breakout cable causes my jobs to fail, 
please let me know.

 

Thank you,

 

Brendan T. W. Myers

brendan.my...@soft-forge.com

Software Forge Inc

 

_______________________________________________

users mailing list

users@lists.open-mpi.org

https://rfd.newmexicoconsortium.org/mailman/listinfo/users



 

Script started on Tue 24 Jan 2017 10:55:52 AM EST
[root@sm-node-8 ~]mpirun --allow-run-as-root --mca btl openib,self,sm --mca 
btl_openib_gid_index 0 --mca btl_base_verbose 100 --mca 
btl_openib_receive_queues P,65536,120,64,32 --mca btl_openib_cpc_include rdmacm 
-hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1
[sm-node-7:28120] mca: base: components_register: registering framework btl 
components
[sm-node-7:28120] mca: base: components_register: found loaded component self
[sm-node-7:28120] mca: base: components_register: component self register 
function successful
[sm-node-7:28120] mca: base: components_register: found loaded component openib
[sm-node-7:28121] mca: base: components_register: registering framework btl 
components
[sm-node-7:28121] mca: base: components_register: found loaded component self
[sm-node-7:28121] mca: base: components_register: component self register 
function successful
[sm-node-7:28121] mca: base: components_register: found loaded component openib
[sm-node-7:28120] mca: base: components_register: component openib register 
function successful
[sm-node-7:28120] mca: base: components_register: found loaded component sm
[sm-node-7:28120] mca: base: components_register: component sm register 
function successful
[sm-node-7:28120] mca: base: components_open: opening btl components
[sm-node-7:28120] mca: base: components_open: found loaded component self
[sm-node-7:28120] mca: base: components_open: component self open function 
successful
[sm-node-7:28120] mca: base: components_open: found loaded component openib
[sm-node-7:28120] mca: base: components_open: component openib open function 
successful
[sm-node-7:28120] mca: base: components_open: found loaded component sm
[sm-node-7:28120] mca: base: components_open: component sm open function 
successful
[sm-node-7:28120] select: initializing btl component self
[sm-node-7:28120] select: init of component self returned success
[sm-node-7:28120] select: initializing btl component openib
[sm-node-7:28121] mca: base: components_register: component openib register 
function successful
[sm-node-7:28121] mca: base: components_register: found loaded component sm
[sm-node-7:28121] mca: base: components_register: component sm register 
function successful
[sm-node-7:28121] mca: base: components_open: opening btl components
[sm-node-7:28121] mca: base: components_open: found loaded component self
[sm-node-7:28121] mca: base: components_open: component self open function 
successful
[sm-node-7:28121] mca: base: components_open: found loaded component openib
[sm-node-7:28121] mca: base: components_open: component openib open function 
successful
[sm-node-7:28121] mca: base: components_open: found loaded component sm
[sm-node-7:28121] mca: base: components_open: component sm open function 
successful
[sm-node-7:28121] select: initializing btl component self
[sm-node-7:28121] select: init of component self returned success
[sm-node-7:28121] select: initializing btl component openib
[sm-node-7:28120] Checking distance from this process to device=mlx5_1
[sm-node-7:28120] Process is bound: distance to device is 0.000000
[sm-node-7:28120] Checking distance from this process to device=mlx5_0
[sm-node-7:28120] Process is bound: distance to device is 0.000000
[sm-node-7:28122] mca: base: components_register: registering framework btl 
components
[sm-node-7:28122] mca: base: components_register: found loaded component self
[sm-node-7:28122] mca: base: components_register: component self register 
function successful
[sm-node-7:28121] Checking distance from this process to device=mlx5_1
[sm-node-7:28121] Process is bound: distance to device is 0.000000
[sm-node-7:28121] Checking distance from this process to device=mlx5_0
[sm-node-7:28121] Process is bound: distance to device is 0.000000
[sm-node-8:10915] mca: base: components_register: registering framework btl 
components
[sm-node-8:10915] mca: base: components_register: found loaded component self
[sm-node-8:10917] mca: base: components_register: registering framework btl 
components
[sm-node-8:10917] mca: base: components_register: found loaded component self
[sm-node-7:28122] mca: base: components_register: found loaded component openib
[sm-node-8:10917] mca: base: components_register: component self register 
function successful
[sm-node-8:10915] mca: base: components_register: component self register 
function successful
[sm-node-8:10915] mca: base: components_register: found loaded component openib
[sm-node-8:10916] mca: base: components_register: registering framework btl 
components
[sm-node-8:10916] mca: base: components_register: found loaded component self
[sm-node-8:10917] mca: base: components_register: found loaded component openib
[sm-node-8:10916] mca: base: components_register: component self register 
function successful
[sm-node-8:10916] mca: base: components_register: found loaded component openib
[sm-node-7:28123] mca: base: components_register: registering framework btl 
components
[sm-node-7:28123] mca: base: components_register: found loaded component self
[sm-node-7:28122] mca: base: components_register: component openib register 
function successful
[sm-node-7:28123] mca: base: components_register: component self register 
function successful
[sm-node-7:28123] mca: base: components_register: found loaded component openib
[sm-node-7:28122] mca: base: components_register: found loaded component sm
[sm-node-7:28122] mca: base: components_register: component sm register 
function successful
[sm-node-7:28122] mca: base: components_open: opening btl components
[sm-node-7:28122] mca: base: components_open: found loaded component self
[sm-node-7:28122] mca: base: components_open: component self open function 
successful
[sm-node-7:28122] mca: base: components_open: found loaded component openib
[sm-node-7:28122] mca: base: components_open: component openib open function 
successful
[sm-node-7:28122] mca: base: components_open: found loaded component sm
[sm-node-7:28122] mca: base: components_open: component sm open function 
successful
[sm-node-7:28122] select: initializing btl component self
[sm-node-7:28122] select: init of component self returned success
[sm-node-8:10915] mca: base: components_register: component openib register 
function successful
[sm-node-8:10915] mca: base: components_register: found loaded component sm
[sm-node-7:28122] select: initializing btl component openib
[sm-node-8:10915] mca: base: components_register: component sm register 
function successful
[sm-node-8:10918] mca: base: components_register: registering framework btl 
components
[sm-node-8:10918] mca: base: components_register: found loaded component self
[sm-node-8:10917] mca: base: components_register: component openib register 
function successful
[sm-node-8:10917] mca: base: components_register: found loaded component sm
[sm-node-8:10915] mca: base: components_open: opening btl components
[sm-node-8:10915] mca: base: components_open: found loaded component self
[sm-node-8:10915] mca: base: components_open: component self open function 
successful
[sm-node-8:10915] mca: base: components_open: found loaded component openib
[sm-node-8:10915] mca: base: components_open: component openib open function 
successful
[sm-node-8:10915] mca: base: components_open: found loaded component sm
[sm-node-8:10915] mca: base: components_open: component sm open function 
successful
[sm-node-8:10915] select: initializing btl component self
[sm-node-8:10915] select: init of component self returned success
[sm-node-8:10915] select: initializing btl component openib
[sm-node-8:10918] mca: base: components_register: component self register 
function successful
[sm-node-8:10918] mca: base: components_register: found loaded component openib
[sm-node-7:28123] mca: base: components_register: component openib register 
function successful
[sm-node-8:10917] mca: base: components_register: component sm register 
function successful
[sm-node-8:10917] mca: base: components_open: opening btl components
[sm-node-8:10917] mca: base: components_open: found loaded component self
[sm-node-8:10917] mca: base: components_open: component self open function 
successful
[sm-node-8:10917] mca: base: components_open: found loaded component openib
[sm-node-8:10917] mca: base: components_open: component openib open function 
successful
[sm-node-8:10917] mca: base: components_open: found loaded component sm
[sm-node-8:10917] mca: base: components_open: component sm open function 
successful
[sm-node-8:10917] select: initializing btl component self
[sm-node-8:10917] select: init of component self returned success
[sm-node-8:10917] select: initializing btl component openib
[sm-node-8:10916] mca: base: components_register: component openib register 
function successful
[sm-node-8:10916] mca: base: components_register: found loaded component sm
[sm-node-7:28123] mca: base: components_register: found loaded component sm
[sm-node-7:28122] Checking distance from this process to device=mlx5_1
[sm-node-8:10916] mca: base: components_register: component sm register 
function successful
[sm-node-8:10916] mca: base: components_open: opening btl components
[sm-node-7:28123] mca: base: components_register: component sm register 
function successful
[sm-node-7:28123] mca: base: components_open: opening btl components
[sm-node-8:10916] mca: base: components_open: found loaded component self
[sm-node-8:10916] mca: base: components_open: component self open function 
successful
[sm-node-8:10916] mca: base: components_open: found loaded component openib
[sm-node-8:10916] mca: base: components_open: component openib open function 
successful
[sm-node-8:10916] mca: base: components_open: found loaded component sm
[sm-node-8:10916] mca: base: components_open: component sm open function 
successful
[sm-node-8:10916] select: initializing btl component self
[sm-node-8:10916] select: init of component self returned success
[sm-node-8:10916] select: initializing btl component openib
[sm-node-7:28122] Process is bound: distance to device is 0.000000
[sm-node-7:28122] Checking distance from this process to device=mlx5_0
[sm-node-7:28122] Process is bound: distance to device is 0.000000
[sm-node-8:10915] Checking distance from this process to device=mlx5_1
[sm-node-8:10915] Process is bound: distance to device is 0.000000
[sm-node-8:10915] Checking distance from this process to device=mlx5_0
[sm-node-8:10915] Process is bound: distance to device is 0.000000
[sm-node-7:28123] mca: base: components_open: found loaded component self
[sm-node-7:28123] mca: base: components_open: component self open function 
successful
[sm-node-7:28123] mca: base: components_open: found loaded component openib
[sm-node-7:28123] mca: base: components_open: component openib open function 
successful
[sm-node-7:28123] mca: base: components_open: found loaded component sm
[sm-node-7:28123] mca: base: components_open: component sm open function 
successful
[sm-node-7:28123] select: initializing btl component self
[sm-node-7:28123] select: init of component self returned success
[sm-node-7:28123] select: initializing btl component openib
[sm-node-8:10917] Checking distance from this process to device=mlx5_1
[sm-node-8:10917] Process is bound: distance to device is 0.000000
[sm-node-8:10917] Checking distance from this process to device=mlx5_0
[sm-node-8:10917] Process is bound: distance to device is 0.000000
[sm-node-8:10918] mca: base: components_register: component openib register 
function successful
[sm-node-8:10918] mca: base: components_register: found loaded component sm
[sm-node-7:28123] Checking distance from this process to device=mlx5_1
[sm-node-8:10916] Checking distance from this process to device=mlx5_1
[sm-node-8:10916] Process is bound: distance to device is 0.000000
[sm-node-8:10916] Checking distance from this process to device=mlx5_0
[sm-node-8:10916] Process is bound: distance to device is 0.000000
[sm-node-7:28123] Process is bound: distance to device is 0.000000
[sm-node-7:28123] Checking distance from this process to device=mlx5_0
[sm-node-7:28123] Process is bound: distance to device is 0.000000
[sm-node-8:10918] mca: base: components_register: component sm register 
function successful
[sm-node-8:10918] mca: base: components_open: opening btl components
[sm-node-8:10918] mca: base: components_open: found loaded component self
[sm-node-8:10918] mca: base: components_open: component self open function 
successful
[sm-node-8:10918] mca: base: components_open: found loaded component openib
[sm-node-8:10918] mca: base: components_open: component openib open function 
successful
[sm-node-8:10918] mca: base: components_open: found loaded component sm
[sm-node-8:10918] mca: base: components_open: component sm open function 
successful
[sm-node-8:10918] select: initializing btl component self
[sm-node-8:10918] select: init of component self returned success
[sm-node-8:10918] select: initializing btl component openib
[sm-node-7:28124] mca: base: components_register: registering framework btl 
components
[sm-node-7:28124] mca: base: components_register: found loaded component self
[sm-node-8:10920] mca: base: components_register: registering framework btl 
components
[sm-node-8:10920] mca: base: components_register: found loaded component self
[sm-node-7:28124] mca: base: components_register: component self register 
function successful
[sm-node-8:10919] mca: base: components_register: registering framework btl 
components
[sm-node-8:10919] mca: base: components_register: found loaded component self
[sm-node-7:28124] mca: base: components_register: found loaded component openib
[sm-node-8:10920] mca: base: components_register: component self register 
function successful
[sm-node-8:10920] mca: base: components_register: found loaded component openib
[sm-node-8:10919] mca: base: components_register: component self register 
function successful
[sm-node-8:10919] mca: base: components_register: found loaded component openib
[sm-node-8:10918] Checking distance from this process to device=mlx5_1
[sm-node-8:10918] Process is bound: distance to device is 0.000000
[sm-node-8:10918] Checking distance from this process to device=mlx5_0
[sm-node-8:10918] Process is bound: distance to device is 0.000000
[sm-node-8:10920] mca: base: components_register: component openib register 
function successful
[sm-node-8:10920] mca: base: components_register: found loaded component sm
[sm-node-7:28124] mca: base: components_register: component openib register 
function successful
[sm-node-7:28124] mca: base: components_register: found loaded component sm
[sm-node-7:28124] mca: base: components_register: component sm register 
function successful
[sm-node-8:10920] mca: base: components_register: component sm register 
function successful
[sm-node-8:10920] mca: base: components_open: opening btl components
[sm-node-8:10920] mca: base: components_open: found loaded component self
[sm-node-8:10920] mca: base: components_open: component self open function 
successful
[sm-node-8:10920] mca: base: components_open: found loaded component openib
[sm-node-7:28124] mca: base: components_open: opening btl components
[sm-node-7:28124] mca: base: components_open: found loaded component self
[sm-node-7:28124] mca: base: components_open: component self open function 
successful
[sm-node-7:28124] mca: base: components_open: found loaded component openib
[sm-node-7:28124] mca: base: components_open: component openib open function 
successful
[sm-node-7:28124] mca: base: components_open: found loaded component sm
[sm-node-7:28124] mca: base: components_open: component sm open function 
successful
[sm-node-8:10920] mca: base: components_open: component openib open function 
successful
[sm-node-8:10920] mca: base: components_open: found loaded component sm
[sm-node-8:10920] mca: base: components_open: component sm open function 
successful
[sm-node-8:10920] select: initializing btl component self
[sm-node-8:10920] select: init of component self returned success
[sm-node-8:10920] select: initializing btl component openib
[sm-node-7:28124] select: initializing btl component self
[sm-node-7:28124] select: init of component self returned success
[sm-node-7:28124] select: initializing btl component openib
[sm-node-8:10919] mca: base: components_register: component openib register 
function successful
[sm-node-8:10919] mca: base: components_register: found loaded component sm
[sm-node-7:28125] mca: base: components_register: registering framework btl 
components
[sm-node-7:28125] mca: base: components_register: found loaded component self
[sm-node-8:10919] mca: base: components_register: component sm register 
function successful
[sm-node-8:10919] mca: base: components_open: opening btl components
[sm-node-8:10919] mca: base: components_open: found loaded component self
[sm-node-8:10919] mca: base: components_open: component self open function 
successful
[sm-node-8:10919] mca: base: components_open: found loaded component openib
[sm-node-8:10919] mca: base: components_open: component openib open function 
successful
[sm-node-8:10919] mca: base: components_open: found loaded component sm
[sm-node-8:10919] mca: base: components_open: component sm open function 
successful
[sm-node-8:10919] select: initializing btl component self
[sm-node-8:10919] select: init of component self returned success
[sm-node-8:10919] select: initializing btl component openib
[sm-node-7:28125] mca: base: components_register: component self register 
function successful
[sm-node-7:28125] mca: base: components_register: found loaded component openib
[sm-node-7:28124] Checking distance from this process to device=mlx5_1
[sm-node-8:10920] Checking distance from this process to device=mlx5_1
[sm-node-8:10920] Process is bound: distance to device is 0.000000
[sm-node-8:10920] Checking distance from this process to device=mlx5_0
[sm-node-8:10920] Process is bound: distance to device is 0.000000
[sm-node-7:28124] Process is bound: distance to device is 0.000000
[sm-node-7:28124] Checking distance from this process to device=mlx5_0
[sm-node-7:28124] Process is bound: distance to device is 0.000000
[sm-node-7:28125] mca: base: components_register: component openib register 
function successful
[sm-node-8:10919] Checking distance from this process to device=mlx5_1
[sm-node-8:10919] Process is bound: distance to device is 0.000000
[sm-node-8:10919] Checking distance from this process to device=mlx5_0
[sm-node-8:10919] Process is bound: distance to device is 0.000000
[sm-node-7:28125] mca: base: components_register: found loaded component sm
[sm-node-7:28125] mca: base: components_register: component sm register 
function successful
[sm-node-7:28125] mca: base: components_open: opening btl components
[sm-node-7:28125] mca: base: components_open: found loaded component self
[sm-node-7:28125] mca: base: components_open: component self open function 
successful
[sm-node-7:28125] mca: base: components_open: found loaded component openib
[sm-node-7:28125] mca: base: components_open: component openib open function 
successful
[sm-node-7:28125] mca: base: components_open: found loaded component sm
[sm-node-7:28125] mca: base: components_open: component sm open function 
successful
[sm-node-7:28125] select: initializing btl component self
[sm-node-7:28125] select: init of component self returned success
[sm-node-7:28125] select: initializing btl component openib
[sm-node-7:28125] Checking distance from this process to device=mlx5_1
[sm-node-7:28125] Process is bound: distance to device is 0.000000
[sm-node-7:28125] Checking distance from this process to device=mlx5_0
[sm-node-7:28125] Process is bound: distance to device is 0.000000
[sm-node-7][[30805,1],1][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],6][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],8][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],0][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],2][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],3][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],9][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],10][btl_openib_component.c:999:device_destruct] device 
was successfully released
[sm-node-8][[30805,1],11][btl_openib_component.c:999:device_destruct] device 
was successfully released
[sm-node-7][[30805,1],4][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],7][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],5][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],6][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-8][[30805,1],8][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-8][[30805,1],6][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-8][[30805,1],8][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   sm-node-8
  Local device: mlx5_0
--------------------------------------------------------------------------
[sm-node-8][[30805,1],6][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8:10915] select: init of component openib returned failure
[sm-node-8][[30805,1],6][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-8][[30805,1],8][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8:10915] mca: base: close: component openib closed
[sm-node-8:10915] mca: base: close: unloading component openib
[sm-node-8:10917] select: init of component openib returned failure
[sm-node-8][[30805,1],8][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-8:10917] mca: base: close: component openib closed
[sm-node-8:10917] mca: base: close: unloading component openib
[sm-node-8:10915] select: initializing btl component sm
[sm-node-8:10917] select: initializing btl component sm
[sm-node-8:10917] select: init of component sm returned success
[sm-node-8:10915] select: init of component sm returned success
[sm-node-7][[30805,1],1][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7][[30805,1],2][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7][[30805,1],2][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7][[30805,1],3][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7][[30805,1],3][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7][[30805,1],1][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7][[30805,1],1][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],3][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],3][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-7:28123] select: init of component openib returned failure
[sm-node-7:28123] mca: base: close: component openib closed
[sm-node-7:28123] mca: base: close: unloading component openib
[sm-node-7:28121] select: init of component openib returned failure
[sm-node-7:28121] mca: base: close: component openib closed
[sm-node-7:28121] mca: base: close: unloading component openib
[sm-node-7][[30805,1],1][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-7:28121] select: initializing btl component sm
[sm-node-7:28121] select: init of component sm returned success
[sm-node-7][[30805,1],2][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],2][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-7:28123] select: initializing btl component sm
[sm-node-7:28123] select: init of component sm returned success
[sm-node-7:28122] select: init of component openib returned failure
[sm-node-7:28122] mca: base: close: component openib closed
[sm-node-7:28122] mca: base: close: unloading component openib
[sm-node-7:28122] select: initializing btl component sm
[sm-node-7:28122] select: init of component sm returned success
[sm-node-8][[30805,1],9][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-8][[30805,1],10][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-8][[30805,1],11][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-8][[30805,1],10][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-8][[30805,1],9][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-8][[30805,1],11][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7][[30805,1],0][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7][[30805,1],0][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-8][[30805,1],9][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8][[30805,1],9][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-8:10918] select: init of component openib returned failure
[sm-node-8:10918] mca: base: close: component openib closed
[sm-node-8:10918] mca: base: close: unloading component openib
[sm-node-8][[30805,1],10][btl_openib_component.c:999:device_destruct] device 
was successfully released
[sm-node-8][[30805,1],10][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-8:10919] select: init of component openib returned failure
[sm-node-8:10919] mca: base: close: component openib closed
[sm-node-8:10919] mca: base: close: unloading component openib
[sm-node-8][[30805,1],11][btl_openib_component.c:999:device_destruct] device 
was successfully released
[sm-node-8][[30805,1],11][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-8:10920] select: init of component openib returned failure
[sm-node-8:10920] mca: base: close: component openib closed
[sm-node-8:10919] select: initializing btl component sm
[sm-node-8:10919] select: init of component sm returned success
[sm-node-8:10918] select: initializing btl component sm
[sm-node-8:10918] select: init of component sm returned success
[sm-node-8:10920] mca: base: close: unloading component openib
[sm-node-8:10920] select: initializing btl component sm
[sm-node-8:10920] select: init of component sm returned success
[sm-node-7][[30805,1],0][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7:28120] select: init of component openib returned failure
[sm-node-7:28120] mca: base: close: component openib closed
[sm-node-7][[30805,1],0][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-7:28120] mca: base: close: unloading component openib
[sm-node-7:28120] select: initializing btl component sm
[sm-node-7][[30805,1],4][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7][[30805,1],5][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7][[30805,1],5][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7][[30805,1],4][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7][[30805,1],4][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7:28124] select: init of component openib returned failure
[sm-node-7:28124] mca: base: close: component openib closed
[sm-node-7:28124] mca: base: close: unloading component openib
[sm-node-7][[30805,1],4][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-7][[30805,1],5][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-7][[30805,1],5][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-7:28125] select: init of component openib returned failure
[sm-node-7:28125] mca: base: close: component openib closed
[sm-node-7:28125] mca: base: close: unloading component openib
[sm-node-7:28124] select: initializing btl component sm
[sm-node-7:28124] select: init of component sm returned success
[sm-node-8][[30805,1],7][btl_openib_component.c:645:init_one_port] looking for 
mlx5_0:1 GID index 0
[sm-node-7:28125] select: initializing btl component sm
[sm-node-7:28125] select: init of component sm returned success
[sm-node-8][[30805,1],7][btl_openib_component.c:676:init_one_port] my IB 
subnet_id for HCA mlx5_0 port 1 is 0000000000000000
[sm-node-7:28120] select: init of component sm returned success
[sm-node-8][[30805,1],7][btl_openib_component.c:999:device_destruct] device was 
successfully released
[sm-node-8:10916] select: init of component openib returned failure
[sm-node-8][[30805,1],7][connect/btl_openib_connect_rdmacm.c:2191:rdmacm_component_finalize]
 rdmacm_component_finalize
[sm-node-8:10916] mca: base: close: component openib closed
[sm-node-8:10916] mca: base: close: unloading component openib
[sm-node-8:10916] select: initializing btl component sm
[sm-node-8:10916] select: init of component sm returned success
[sm-node-8:10918] mca: bml: Using self btl for send to [[30805,1],9] on node 
sm-node-8
[sm-node-8:10917] mca: bml: Using self btl for send to [[30805,1],8] on node 
sm-node-8
[sm-node-8:10916] mca: bml: Using self btl for send to [[30805,1],7] on node 
sm-node-8
[sm-node-8:10920] mca: bml: Using self btl for send to [[30805,1],11] on node 
sm-node-8
[sm-node-8:10915] mca: bml: Using self btl for send to [[30805,1],6] on node 
sm-node-8
[sm-node-8:10919] mca: bml: Using self btl for send to [[30805,1],10] on node 
sm-node-8
[sm-node-7:28124] mca: bml: Using self btl for send to [[30805,1],4] on node 
sm-node-7
[sm-node-7:28122] mca: bml: Using self btl for send to [[30805,1],2] on node 
sm-node-7
[sm-node-7:28125] mca: bml: Using self btl for send to [[30805,1],5] on node 
sm-node-7
[sm-node-7:28120] mca: bml: Using self btl for send to [[30805,1],0] on node 
sm-node-7
[sm-node-7:28121] mca: bml: Using self btl for send to [[30805,1],1] on node 
sm-node-7
[sm-node-7:28123] mca: bml: Using self btl for send to [[30805,1],3] on node 
sm-node-7
[sm-node-7:28123] mca: bml: Using sm btl for send to [[30805,1],0] on node 
sm-node-7-ce
[sm-node-7:28123] mca: bml: Using sm btl for send to [[30805,1],1] on node 
sm-node-7-ce
[sm-node-7:28123] mca: bml: Using sm btl for send to [[30805,1],2] on node 
sm-node-7-ce
[sm-node-7:28123] mca: bml: Using sm btl for send to [[30805,1],4] on node 
sm-node-7-ce
[sm-node-7:28123] mca: bml: Using sm btl for send to [[30805,1],5] on node 
sm-node-7-ce
[sm-node-8:10917] mca: bml: Using sm btl for send to [[30805,1],6] on node 
sm-node-8
[sm-node-8:10917] mca: bml: Using sm btl for send to [[30805,1],7] on node 
sm-node-8
[sm-node-8:10917] mca: bml: Using sm btl for send to [[30805,1],9] on node 
sm-node-8
[sm-node-8:10917] mca: bml: Using sm btl for send to [[30805,1],10] on node 
sm-node-8
[sm-node-8:10917] mca: bml: Using sm btl for send to [[30805,1],11] on node 
sm-node-8
[sm-node-8:10918] mca: bml: Using sm btl for send to [[30805,1],6] on node 
sm-node-8
[sm-node-8:10918] mca: bml: Using sm btl for send to [[30805,1],7] on node 
sm-node-8
[sm-node-8:10918] mca: bml: Using sm btl for send to [[30805,1],8] on node 
sm-node-8
[sm-node-8:10918] mca: bml: Using sm btl for send to [[30805,1],10] on node 
sm-node-8
[sm-node-8:10918] mca: bml: Using sm btl for send to [[30805,1],11] on node 
sm-node-8
[sm-node-8:10916] mca: bml: Using sm btl for send to [[30805,1],6] on node 
sm-node-8
[sm-node-8:10916] mca: bml: Using sm btl for send to [[30805,1],8] on node 
sm-node-8
[sm-node-8:10916] mca: bml: Using sm btl for send to [[30805,1],9] on node 
sm-node-8
[sm-node-8:10916] mca: bml: Using sm btl for send to [[30805,1],10] on node 
sm-node-8
[sm-node-8:10916] mca: bml: Using sm btl for send to [[30805,1],11] on node 
sm-node-8
[sm-node-7:28121] mca: bml: Using sm btl for send to [[30805,1],0] on node 
sm-node-7-ce
[sm-node-7:28121] mca: bml: Using sm btl for send to [[30805,1],2] on node 
sm-node-7-ce
[sm-node-7:28121] mca: bml: Using sm btl for send to [[30805,1],3] on node 
sm-node-7-ce
[sm-node-8:10920] mca: bml: Using sm btl for send to [[30805,1],6] on node 
sm-node-8
[sm-node-8:10920] mca: bml: Using sm btl for send to [[30805,1],7] on node 
sm-node-8
[sm-node-8:10920] mca: bml: Using sm btl for send to [[30805,1],8] on node 
sm-node-8
[sm-node-8:10920] mca: bml: Using sm btl for send to [[30805,1],9] on node 
sm-node-8
[sm-node-8:10920] mca: bml: Using sm btl for send to [[30805,1],10] on node 
sm-node-8
[sm-node-7:28122] mca: bml: Using sm btl for send to [[30805,1],0] on node 
sm-node-7-ce
[sm-node-7:28122] mca: bml: Using sm btl for send to [[30805,1],1] on node 
sm-node-7-ce
[sm-node-7:28122] mca: bml: Using sm btl for send to [[30805,1],3] on node 
sm-node-7-ce
[sm-node-7:28122] mca: bml: Using sm btl for send to [[30805,1],4] on node 
sm-node-7-ce
[sm-node-7:28122] mca: bml: Using sm btl for send to [[30805,1],5] on node 
sm-node-7-ce
[sm-node-7:28121] mca: bml: Using sm btl for send to [[30805,1],4] on node 
sm-node-7-ce
[sm-node-7:28121] mca: bml: Using sm btl for send to [[30805,1],5] on node 
sm-node-7-ce
[sm-node-7:28124] mca: bml: Using sm btl for send to [[30805,1],0] on node 
sm-node-7-ce
[sm-node-7:28124] mca: bml: Using sm btl for send to [[30805,1],1] on node 
sm-node-7-ce
[sm-node-7:28124] mca: bml: Using sm btl for send to [[30805,1],2] on node 
sm-node-7-ce
[sm-node-7:28124] mca: bml: Using sm btl for send to [[30805,1],3] on node 
sm-node-7-ce
[sm-node-7:28124] mca: bml: Using sm btl for send to [[30805,1],5] on node 
sm-node-7-ce
[sm-node-8:10919] mca: bml: Using sm btl for send to [[30805,1],6] on node 
sm-node-8
[sm-node-8:10919] mca: bml: Using sm btl for send to [[30805,1],7] on node 
sm-node-8
[sm-node-8:10919] mca: bml: Using sm btl for send to [[30805,1],8] on node 
sm-node-8
[sm-node-8:10919] mca: bml: Using sm btl for send to [[30805,1],9] on node 
sm-node-8
[sm-node-8:10919] mca: bml: Using sm btl for send to [[30805,1],11] on node 
sm-node-8
[sm-node-7:28125] mca: bml: Using sm btl for send to [[30805,1],0] on node 
sm-node-7-ce
[sm-node-7:28125] mca: bml: Using sm btl for send to [[30805,1],1] on node 
sm-node-7-ce
[sm-node-7:28125] mca: bml: Using sm btl for send to [[30805,1],2] on node 
sm-node-7-ce
[sm-node-7:28125] mca: bml: Using sm btl for send to [[30805,1],3] on node 
sm-node-7-ce
[sm-node-7:28125] mca: bml: Using sm btl for send to [[30805,1],4] on node 
sm-node-7-ce
[sm-node-7:28120] mca: bml: Using sm btl for send to [[30805,1],1] on node 
sm-node-7-ce
[sm-node-7:28120] mca: bml: Using sm btl for send to [[30805,1],2] on node 
sm-node-7-ce
[sm-node-7:28120] mca: bml: Using sm btl for send to [[30805,1],3] on node 
sm-node-7-ce
[sm-node-7:28120] mca: bml: Using sm btl for send to [[30805,1],4] on node 
sm-node-7-ce
[sm-node-7:28120] mca: bml: Using sm btl for send to [[30805,1],5] on node 
sm-node-7-ce
[sm-node-8:10915] mca: bml: Using sm btl for send to [[30805,1],7] on node 
sm-node-8
[sm-node-8:10915] mca: bml: Using sm btl for send to [[30805,1],8] on node 
sm-node-8
[sm-node-8:10915] mca: bml: Using sm btl for send to [[30805,1],9] on node 
sm-node-8
[sm-node-8:10915] mca: bml: Using sm btl for send to [[30805,1],10] on node 
sm-node-8
[sm-node-8:10915] mca: bml: Using sm btl for send to [[30805,1],11] on node 
sm-node-8
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[30805,1],0]) is on host: sm-node-7
  Process 2 ([[30805,1],8]) is on host: sm-node-8
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[sm-node-7:28120] *** An error occurred in MPI_Bcast
[sm-node-7:28120] *** reported by process [139670060335105,21474836480]
[sm-node-7:28120] *** on communicator MPI_COMM_WORLD
[sm-node-7:28120] *** MPI_ERR_INTERN: internal error
[sm-node-7:28120] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[sm-node-7:28120] ***    and potentially your MPI job)
[sm-node-8:10908] 11 more processes have sent help message 
help-mpi-btl-openib.txt / error in device init
[sm-node-8:10908] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
help / error messages
[sm-node-8:10908] 3 more processes have sent help message help-mca-bml-r2.txt / 
unreachable proc
[sm-node-8:10908] 3 more processes have sent help message help-mpi-errors.txt / 
mpi_errors_are_fatal
[root@sm-node-8 ~]# exit
exit

Script done on Tue 24 Jan 2017 10:56:00 AM EST
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
