Encountered a problem when trying to run OpenMPI 1.5.4 with RoCE over a 10GbE fabric.
Got this runtime error:

    An invalid CPC name was specified via the btl_openib_cpc_include MCA
    parameter.

      Local host:                   atl3-14
      btl_openib_cpc_include value: rdmacm
      Invalid name:                 rdmacm
      All possible valid names:     oob,xoob
    --------------------------------------------------------------------------
    [atl3-14:07184] mca: base: components_open: component btl / openib open function failed
    [atl3-12:09178] mca: base: components_open: component btl / openib open function failed

Used these options to mpirun:

    "--mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm -mca btl_openib_if_include mlx4_0:2"

We have a Mellanox LOM with two ports: the first is an IB port, the second is a 10GbE port. Running over the IB port and running TCP over the 10GbE port both work fine.

Built OpenMPI with this option: "--enable-openib-rdmacm". Our system has OFED 1.5.2 with librdmacm-1.0.13-1.

I noticed this output from the configure script:

    checking rdma/rdma_cma.h usability... no
    checking rdma/rdma_cma.h presence... no
    checking for rdma/rdma_cma.h... no
    checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
    checking if RDMAoE support is enabled... yes
    checking for infiniband/driver.h... yes
    checking if ConnectX XRC support is enabled... yes
    checking if dynamic SL is enabled... no
    checking if OpenFabrics RDMACM support is enabled... no

Are we missing a build option or a piece of software?

Config.log and output from "ompi_info --all" are attached.
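The configure output above reports rdma/rdma_cma.h as missing, which is probably why RDMACM support ends up disabled. Below is a minimal sketch for checking whether that header is visible in the usual include directories. The function name find_rdma_cma_header and the two search paths are assumptions for illustration, not part of Open MPI or OFED.

```shell
#!/bin/sh
# Print the path of the first rdma/rdma_cma.h found under the given
# include directories. Returns non-zero if none of them contains it.
# (Hypothetical helper, for illustration only.)
find_rdma_cma_header() {
    for dir in "$@"; do
        if [ -f "$dir/rdma/rdma_cma.h" ]; then
            printf '%s\n' "$dir/rdma/rdma_cma.h"
            return 0
        fi
    done
    return 1
}

# Assumed search path; adjust for the local OFED install prefix.
if ! find_rdma_cma_header /usr/include /usr/local/include; then
    echo "rdma/rdma_cma.h not found in the searched directories" >&2
fi
```

If nothing is found, the librdmacm development headers may simply not be installed (the runtime library alone does not necessarily include them), and re-running configure after installing them would be the next thing to try.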
% ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.9.1000
        node_guid:                      78e7:d103:0021:4464
        sys_image_guid:                 78e7:d103:0021:4467
        vendor_id:                      0x02c9
        vendor_part_id:                 26438
        hw_ver:                         0xB0
        board_id:                       HP_0200000003
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 34
                        port_lid:               11
                        port_lmc:               0x00
                        link_layer:             IB

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

% /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 78:E7:D1:21:44:60
          inet addr:16.113.180.147  Bcast:16.113.183.255  Mask:255.255.252.0
          inet6 addr: fe80::7ae7:d1ff:fe21:4460/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1861763 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1776402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:712448939 (679.4 MiB)  TX bytes:994111004 (948.0 MiB)
          Memory:fb9e0000-fba00000

eth2      Link encap:Ethernet  HWaddr 78:E7:D1:21:44:65
          inet addr:10.10.0.147  Bcast:10.10.0.255  Mask:255.255.255.0
          inet6 addr: fe80::78e7:d100:121:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8519814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8555715 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12370127778 (11.5 GiB)  TX bytes:12372246315 (11.5 GiB)

ib0       Link encap:InfiniBand  HWaddr 80:00:00:4D:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.0.147  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::7ae7:d103:21:4465/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:16384  Metric:1
          RX packets:1989 errors:0 dropped:0 overruns:0 frame:0
          TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:275196 (268.7 KiB)  TX bytes:19202 (18.7 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:42224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42224 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3115668 (2.9 MiB)  TX bytes:3115668 (2.9 MiB)

Thanks,

-Jeff

/**********************************************************/
/* Jeff Konz                          jeffrey.k...@hp.com */
/* Solutions Architect                   HPC Benchmarking */
/* Americas Shared Solutions Architecture (SSA)           */
/* Hewlett-Packard Company                                */
/* Office: 248-491-7480              Mobile: 248-345-6857 */
/**********************************************************/
config.log.gz
ompi_info.txt.gz