I ran into a problem when trying to run OpenMPI 1.5.4 with RoCE over a 10GbE
fabric.

I got this runtime error:

An invalid CPC name was specified via the btl_openib_cpc_include MCA
parameter.

  Local host:                   atl3-14
  btl_openib_cpc_include value: rdmacm
  Invalid name:                 rdmacm
  All possible valid names:     oob,xoob
--------------------------------------------------------------------------
[atl3-14:07184] mca: base: components_open: component btl / openib open function failed
[atl3-12:09178] mca: base: components_open: component btl / openib open function failed

I passed these options to mpirun:
  "--mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm -mca btl_openib_if_include mlx4_0:2"

We have a Mellanox LOM with two ports: the first is an IB port, the second is
a 10GbE port.
Running over the IB port works fine, as does TCP over the 10GbE port.

I built OpenMPI with the option "--enable-openib-rdmacm".
Our system has OFED 1.5.2 with librdmacm-1.0.13-1.

I noticed this output from the configure script:
checking rdma/rdma_cma.h usability... no
checking rdma/rdma_cma.h presence... no
checking for rdma/rdma_cma.h... no
checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
checking if RDMAoE support is enabled... yes
checking for infiniband/driver.h... yes
checking if ConnectX XRC support is enabled... yes
checking if dynamic SL is enabled... no
checking if OpenFabrics RDMACM support is enabled... no
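
To make the failing checks easier to spot, the configure output can be filtered with a short script. This is only a sketch in Python: the function name `failed_checks` and the embedded sample are illustrative and are not part of the OpenMPI build system. It lists every "checking ..." line whose result is "no", which here shows that the rdma/rdma_cma.h header was not found and that RDMACM support was reported as disabled.

```python
import re

# Sample copied from the configure output quoted in this message.
CONFIGURE_OUTPUT = """\
checking rdma/rdma_cma.h usability... no
checking rdma/rdma_cma.h presence... no
checking for rdma/rdma_cma.h... no
checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
checking if RDMAoE support is enabled... yes
checking for infiniband/driver.h... yes
checking if ConnectX XRC support is enabled... yes
checking if dynamic SL is enabled... no
checking if OpenFabrics RDMACM support is enabled... no
"""

# Matches lines of the form "checking <description>... <result>".
CHECK_LINE = re.compile(r"^checking (?P<what>.+?)\.\.\. (?P<result>\S+)\s*$")


def failed_checks(text):
    """Return the descriptions of all checks whose result was 'no'."""
    failed = []
    for line in text.splitlines():
        match = CHECK_LINE.match(line)
        if match and match.group("result") == "no":
            failed.append(match.group("what"))
    return failed


if __name__ == "__main__":
    for what in failed_checks(CONFIGURE_OUTPUT):
        print("no:", what)
```

The same function could be pointed at a saved copy of the full configure output instead of the embedded sample.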

Are we missing a build option or a piece of software?
The config.log and the output from "ompi_info --all" are attached.

% ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.9.1000
        node_guid:                      78e7:d103:0021:4464
        sys_image_guid:                 78e7:d103:0021:4467
        vendor_id:                      0x02c9
        vendor_part_id:                 26438
        hw_ver:                         0xB0
        board_id:                       HP_0200000003
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 34
                        port_lid:               11
                        port_lmc:               0x00
                        link_layer:             IB

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
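
For reference, the "mlx4_0:2" value passed to btl_openib_if_include corresponds to the port whose link_layer is Ethernet in the listing above. A minimal Python sketch of that mapping follows; it assumes the ibv_devinfo text layout shown here, and the function names `ports_by_link_layer` and `ethernet_ports` are hypothetical helpers, not part of OFED or OpenMPI.

```python
import re

# Trimmed sample of the ibv_devinfo output quoted in this message.
IBV_DEVINFO = """\
hca_id: mlx4_0
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        link_layer:             IB

                port:   2
                        state:                  PORT_ACTIVE (4)
                        link_layer:             Ethernet
"""


def ports_by_link_layer(text):
    """Map 'device:port' strings to the link_layer reported for that port."""
    result = {}
    device = None
    port = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("hca_id:"):
            # A new device section starts; forget the previous port.
            device = stripped.split(":", 1)[1].strip()
            port = None
        elif re.match(r"port:\s+\d+$", stripped):
            port = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("link_layer:") and device and port:
            result[f"{device}:{port}"] = stripped.split(":", 1)[1].strip()
    return result


def ethernet_ports(text):
    """Return the 'device:port' names whose link layer is Ethernet."""
    layers = ports_by_link_layer(text)
    return [name for name, layer in layers.items() if layer == "Ethernet"]


if __name__ == "__main__":
    print(ethernet_ports(IBV_DEVINFO))
```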

% /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 78:E7:D1:21:44:60
          inet addr:16.113.180.147  Bcast:16.113.183.255  Mask:255.255.252.0
          inet6 addr: fe80::7ae7:d1ff:fe21:4460/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1861763 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1776402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:712448939 (679.4 MiB)  TX bytes:994111004 (948.0 MiB)
          Memory:fb9e0000-fba00000

eth2      Link encap:Ethernet  HWaddr 78:E7:D1:21:44:65
          inet addr:10.10.0.147  Bcast:10.10.0.255  Mask:255.255.255.0
          inet6 addr: fe80::78e7:d100:121:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8519814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8555715 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12370127778 (11.5 GiB)  TX bytes:12372246315 (11.5 GiB)

ib0       Link encap:InfiniBand  HWaddr 80:00:00:4D:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.0.147  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::7ae7:d103:21:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:16384  Metric:1
          RX packets:1989 errors:0 dropped:0 overruns:0 frame:0
          TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:275196 (268.7 KiB)  TX bytes:19202 (18.7 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:42224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42224 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3115668 (2.9 MiB)  TX bytes:3115668 (2.9 MiB)

Thanks,

-Jeff


/**********************************************************/
/* Jeff Konz                          jeffrey.k...@hp.com */
/* Solutions Architect                   HPC Benchmarking */
/* Americas Shared Solutions Architecture (SSA)           */
/* Hewlett-Packard Company                                */
/* Office: 248-491-7480              Mobile: 248-345-6857 */
/**********************************************************/



Attachment: config.log.gz
Description: config.log.gz

Attachment: ompi_info.txt.gz
Description: ompi_info.txt.gz
