Attached is the output of running with btl_base_verbose 100:

mpirun --mca btl_openib_cpc_include rdmacm --mca btl_base_verbose 100 NPmpi

[nyx0665.engin.umich.edu:06399] mca: base: components_open: Looking for btl 
components
[nyx0666.engin.umich.edu:07210] mca: base: components_open: Looking for btl 
components
[nyx0665.engin.umich.edu:06399] mca: base: components_open: opening btl 
components
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded 
component ofud
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component ofud has 
no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component ofud open 
function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded 
component openib
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component openib 
has no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component openib 
open function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded 
component self
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component self has 
no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component self open 
function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded 
component sm
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component sm has no 
register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component sm open 
function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded 
component tcp
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component tcp has 
no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component tcp open 
function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: opening btl 
components
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded 
component ofud
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component ofud has 
no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component ofud open 
function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded 
component openib
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component openib 
has no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component openib 
open function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded 
component self
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component self has 
no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component self open 
function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded 
component sm
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component sm has no 
register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component sm open 
function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded 
component tcp
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component tcp has 
no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component tcp open 
function successful
[nyx0665.engin.umich.edu:06399] select: initializing btl component ofud
[nyx0665.engin.umich.edu:06399] select: init of component ofud returned failure
[nyx0665.engin.umich.edu:06399] select: module ofud unloaded
[nyx0665.engin.umich.edu:06399] select: initializing btl component openib
[nyx0666.engin.umich.edu:07210] select: initializing btl component ofud
[nyx0666.engin.umich.edu:07210] select: init of component ofud returned failure
[nyx0666.engin.umich.edu:07210] select: module ofud unloaded
[nyx0666.engin.umich.edu:07210] select: initializing btl component openib
[nyx0665.engin.umich.edu:06399] openib BTL: rdmacm IP address not found on port
[nyx0665.engin.umich.edu:06399] openib BTL: rdmacm CPC unavailable for use on 
mthca0:1; skipped
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           nyx0665.engin.umich.edu
  Local device:         mthca0
  Local port:           1
  CPCs attempted:       rdmacm
--------------------------------------------------------------------------
[nyx0665.engin.umich.edu:06399] select: init of component openib returned 
failure
[nyx0665.engin.umich.edu:06399] select: module openib unloaded
[nyx0665.engin.umich.edu:06399] select: initializing btl component self
[nyx0665.engin.umich.edu:06399] select: init of component self returned success
[nyx0665.engin.umich.edu:06399] select: initializing btl component sm
[nyx0665.engin.umich.edu:06399] select: init of component sm returned success
[nyx0665.engin.umich.edu:06399] select: initializing btl component tcp
[nyx0665.engin.umich.edu:06399] select: init of component tcp returned success
[nyx0666.engin.umich.edu:07210] openib BTL: rdmacm IP address not found on port
[nyx0666.engin.umich.edu:07210] openib BTL: rdmacm CPC unavailable for use on 
mthca0:1; skipped
[nyx0666.engin.umich.edu:07210] select: init of component openib returned 
failure
[nyx0666.engin.umich.edu:07210] select: module openib unloaded
[nyx0666.engin.umich.edu:07210] select: initializing btl component self
[nyx0666.engin.umich.edu:07210] select: init of component self returned success
[nyx0666.engin.umich.edu:07210] select: initializing btl component sm
[nyx0666.engin.umich.edu:07210] select: init of component sm returned success
[nyx0666.engin.umich.edu:07210] select: initializing btl component tcp
[nyx0666.engin.umich.edu:07210] select: init of component tcp returned success
0: nyx0665
1: nyx0666
[nyx0666.engin.umich.edu:07210] btl: tcp: attempting to connect() to address 
10.164.2.153 on port 516
[nyx0665.engin.umich.edu:06399] btl: tcp: attempting to connect() to address 
10.164.2.154 on port 4
Now starting the main loop
  0:       1 bytes   1948 times -->      0.14 Mbps in      53.29 usec
  1:       2 bytes   1876 times -->      0.29 Mbps in      52.74 usec
  2:       3 bytes   1896 times -->      0.43 Mbps in      53.04 usec
  3:       4 bytes   1256 times -->      0.57 Mbps in      53.55 usec
  4:       6 bytes   1400 times -->      0.85 Mbps in      54.03 usec
  5:       8 bytes    925 times --> mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 6399 on node 
nyx0665.engin.umich.edu exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished

[nyx0665.engin.umich.edu:06398] 1 more process has sent help message 
help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[nyx0665.engin.umich.edu:06398] Set MCA parameter "orte_base_help_aggregate" to 
0 to see all help / error messages
2 total processes killed (some possibly by mpirun during cleanup)
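
The relevant failure looks to be "rdmacm IP address not found on port": rdmacm 
can only be used when the port has an IP address, i.e. IPoIB is up on that HCA 
port.  As a quick sanity check on the affected nodes (ib0 is just an assumed 
interface name here; substitute whatever your IPoIB interface is called):

  ip addr show ib0     # or: ifconfig ib0

If there is no IP address on the IPoIB interface for that port, the rdmacm CPC 
reports itself unavailable and the openib BTL gets disabled for that port, 
which matches the help message above.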

We were being bitten by a number of codes hanging in collectives, which was 
resolved by using rdmacm.  As a workaround, we tried setting this as the 
default until the two bugs in Bugzilla are resolved.  Then we hit this problem 
on our DDR/SDR gear.
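
For reference, this is roughly how we set it as the default (a sketch, assuming 
the stock Open MPI MCA parameter file locations; the per-user file works the 
same way):

  # In <prefix>/etc/openmpi-mca-params.conf (site-wide default)
  # or in ~/.openmpi/mca-params.conf (per user)
  btl_openib_cpc_include = rdmacm

With that in place every mpirun picks up rdmacm without needing --mca on the 
command line, which is how it ended up applied to the DDR/SDR nodes as well.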

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 28, 2011, at 8:07 AM, Jeff Squyres wrote:

> On Apr 27, 2011, at 10:02 AM, Brock Palen wrote:
> 
>> Argh, our messed-up environment with three generations of InfiniBand bit us.
>> Setting openib_cpc_include to rdmacm causes IB to not be used on our old DDR
>> IB on some of our hosts.  Note that jobs will never run across our old DDR
>> IB and our new QDR stuff, where rdmacm does work.
> 
> Hmm -- odd.  I use RDMACM on some old DDR (and SDR!) IB hardware and it seems 
> to work fine.
> 
> Do you have any indication as to why OMPI is refusing to use rdmacm on your 
> older hardware, other than "No OF connection schemes reported..."?  Try 
> running with --mca btl_base_verbose 100 (beware: it will be a truckload of 
> output).  Make sure that you have rdmacm support available on those machines, 
> both in OMPI and in OFED/the OS.
> 
>> I am doing some testing with:
>> export OMPI_MCA_btl_openib_cpc_include=rdmacm,oob,xoob
>> 
>> What I want to know is: is there a way to tell mpirun to 'dump all resolved
>> MCA settings', or something similar?
> 
> I'm not quite sure what you're asking here -- do you want to override MCA 
> params on specific hosts?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
