Here is the output of running with btl_base_verbose set to 100:

mpirun --mca btl_openib_cpc_include rdmacm --mca btl_base_verbose 100 NPmpi

[nyx0665.engin.umich.edu:06399] mca: base: components_open: Looking for btl components
[nyx0666.engin.umich.edu:07210] mca: base: components_open: Looking for btl components
[nyx0665.engin.umich.edu:06399] mca: base: components_open: opening btl components
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded component ofud
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component ofud has no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component ofud open function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded component openib
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component openib has no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component openib open function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded component self
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component self has no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component self open function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded component sm
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component sm has no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component sm open function successful
[nyx0665.engin.umich.edu:06399] mca: base: components_open: found loaded component tcp
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component tcp has no register function
[nyx0665.engin.umich.edu:06399] mca: base: components_open: component tcp open function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: opening btl components
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded component ofud
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component ofud has no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component ofud open function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded component openib
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component openib has no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component openib open function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded component self
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component self has no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component self open function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded component sm
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component sm has no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component sm open function successful
[nyx0666.engin.umich.edu:07210] mca: base: components_open: found loaded component tcp
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component tcp has no register function
[nyx0666.engin.umich.edu:07210] mca: base: components_open: component tcp open function successful
[nyx0665.engin.umich.edu:06399] select: initializing btl component ofud
[nyx0665.engin.umich.edu:06399] select: init of component ofud returned failure
[nyx0665.engin.umich.edu:06399] select: module ofud unloaded
[nyx0665.engin.umich.edu:06399] select: initializing btl component openib
[nyx0666.engin.umich.edu:07210] select: initializing btl component ofud
[nyx0666.engin.umich.edu:07210] select: init of component ofud returned failure
[nyx0666.engin.umich.edu:07210] select: module ofud unloaded
[nyx0666.engin.umich.edu:07210] select: initializing btl component openib
[nyx0665.engin.umich.edu:06399] openib BTL: rdmacm IP address not found on port
[nyx0665.engin.umich.edu:06399] openib BTL: rdmacm CPC unavailable for use on mthca0:1; skipped
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be used on a specific port.
As such, the openib BTL (OpenFabrics support) will be disabled for this port.

  Local host:      nyx0665.engin.umich.edu
  Local device:    mthca0
  Local port:      1
  CPCs attempted:  rdmacm
--------------------------------------------------------------------------
[nyx0665.engin.umich.edu:06399] select: init of component openib returned failure
[nyx0665.engin.umich.edu:06399] select: module openib unloaded
[nyx0665.engin.umich.edu:06399] select: initializing btl component self
[nyx0665.engin.umich.edu:06399] select: init of component self returned success
[nyx0665.engin.umich.edu:06399] select: initializing btl component sm
[nyx0665.engin.umich.edu:06399] select: init of component sm returned success
[nyx0665.engin.umich.edu:06399] select: initializing btl component tcp
[nyx0665.engin.umich.edu:06399] select: init of component tcp returned success
[nyx0666.engin.umich.edu:07210] openib BTL: rdmacm IP address not found on port
[nyx0666.engin.umich.edu:07210] openib BTL: rdmacm CPC unavailable for use on mthca0:1; skipped
[nyx0666.engin.umich.edu:07210] select: init of component openib returned failure
[nyx0666.engin.umich.edu:07210] select: module openib unloaded
[nyx0666.engin.umich.edu:07210] select: initializing btl component self
[nyx0666.engin.umich.edu:07210] select: init of component self returned success
[nyx0666.engin.umich.edu:07210] select: initializing btl component sm
[nyx0666.engin.umich.edu:07210] select: init of component sm returned success
[nyx0666.engin.umich.edu:07210] select: initializing btl component tcp
[nyx0666.engin.umich.edu:07210] select: init of component tcp returned success
0: nyx0665
1: nyx0666
[nyx0666.engin.umich.edu:07210] btl: tcp: attempting to connect() to address 10.164.2.153 on port 516
[nyx0665.engin.umich.edu:06399] btl: tcp: attempting to connect() to address 10.164.2.154 on port 4
Now starting the main loop
  0: 1 bytes 1948 times --> 0.14 Mbps in 53.29 usec
  1: 2 bytes 1876 times --> 0.29 Mbps in 52.74 usec
  2: 3 bytes 1896 times --> 0.43 Mbps in 53.04 usec
  3: 4 bytes 1256 times --> 0.57 Mbps in 53.55 usec
  4: 6 bytes 1400 times --> 0.85 Mbps in 54.03 usec
  5: 8 bytes 925 times --> mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 6399 on node nyx0665.engin.umich.edu exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
[nyx0665.engin.umich.edu:06398] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[nyx0665.engin.umich.edu:06398] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
2 total processes killed (some possibly by mpirun during cleanup)

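Since the verbose output above is long, here is a minimal sketch (not part of Open MPI) that boils it down to which BTL components initialized on each host. It assumes only the "select: init of component ... returned ..." line format visible in the log above; the function name summarize_btl_selection and the regular expression are my own and purely illustrative.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   [nyx0665.engin.umich.edu:06399] select: init of component openib returned failure
# The pattern is an assumption based on the log format shown in this message.
SELECT_RE = re.compile(
    r"\[(?P<host>[^\]:]+):(?P<pid>\d+)\] select: init of component "
    r"(?P<component>\w+) returned (?P<result>success|failure)"
)


def summarize_btl_selection(log_text):
    """Return {host: {component: 'success' or 'failure'}} from verbose BTL output."""
    summary = defaultdict(dict)
    # finditer works whether the log has one entry per line or has been
    # flattened into a single long line.
    for match in SELECT_RE.finditer(log_text):
        summary[match["host"]][match["component"]] = match["result"]
    return {host: dict(components) for host, components in summary.items()}


if __name__ == "__main__":
    import sys

    # Usage: mpirun ... --mca btl_base_verbose 100 NPmpi 2>&1 | python this_script.py
    for host, components in sorted(summarize_btl_selection(sys.stdin.read()).items()):
        print(host, components)
```

On the run above, this kind of summary would make it easy to see that openib failed to initialize on both nodes while self, sm, and tcp succeeded, which matches the job falling back to TCP.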
We were being bitten by a number of codes hanging in collectives, which was resolved by using rdmacm. As a workaround, we tried setting rdmacm as the default until the two bugs in Bugzilla are resolved. Then we hit this problem on our DDR/SDR gear.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Apr 28, 2011, at 8:07 AM, Jeff Squyres wrote:

> On Apr 27, 2011, at 10:02 AM, Brock Palen wrote:
>
>> Argh, our messed up environment with three generations on infiniband bit us,
>> Setting openib_cpc_include to rdmacm causes ib to not be used on our old DDR
>> ib on some of our hosts. Note that jobs will never run across our old DDR
>> ib and our new QDR stuff where rdmacm does work.
>
> Hmm -- odd. I use RDMACM on some old DDR (and SDR!) IB hardware and it seems
> to work fine.
>
> Do you have any indication as to why OMPI is refusing to use rdmacm on your
> older hardware, other than "No OF connection schemes reported..."? Try
> running with --mca btl_base_verbose 100 (beware: it will be a truckload of
> output). Make sure that you have rdmacm support available on those machines,
> both in OMPI and in OFED/the OS.
>
>> I am doing some testing with:
>> export OMPI_MCA_btl_openib_cpc_include=rdmacm,oob,xoob
>>
>> What I want to know is there a way to tell mpirun to 'dump all resolved mca
>> settings' Or something similar.
>
> I'm not quite sure what you're asking here -- do you want to override MCA
> params on specific hosts?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/