On Jun 2, 2010, at 9:54 AM, Jeff Squyres wrote:

>> this is the output I get on a node with ethernet and infiniband hardware.
>> note the Error regarding mx.
>>
>> $ ~/openmpi-1.4.2-bin/bin/mpirun ~/bwlat/mpi_helloworld
>> [bordeplage-9.bordeaux.grid5000.fr:32365] Error in mx_init (error No MX
>> device entry in /dev.)

This is ompi_common_mx_initialize(). It fails since there is no MX and prints the above with:

    opal_output(0, "Error in mx_init (error %s)\n", mx_strerror(mx_return));
    return OMPI_ERR_NOT_AVAILABLE;

>> [bordeplage-9.bordeaux.grid5000.fr:32365] mca_btl_mx_component_init:
>> mx_get_info(MX_NIC_COUNT) failed with status 4(MX not initialized.)
>
> I'm guessing the MX BTL is designed to be noisy when it fails, on the
> assumption that if MX is down, you probably want to know it.
>
> George/Myricom -- can you confirm?

This is odd. The ompi_common_mx_initialize() call above does not return OPAL_SUCCESS to mca_btl_mx_component_init(), so mca_btl_mx_component_init() should return NULL and never call mx_get_info(). This second message also uses an opal_output(0, ...). I will let George comment on the verbosity.

It looks like ompi_common_mx_initialize() is doing things that affect memory before calling mx_init(), such as setting ompi_mpi_leave_pinned to 1 and setting mpool_resources.regcache_clean = mx__regcache_clean. There is a chicken-and-egg scenario: the BTL needs to set a registration cache environment variable before calling mx_init(), but the altering of mpool resources should probably wait until after mx_init() has succeeded, in case MX is not available.

Does the same error happen if he tries on an MX host that does not have IB?

Scott
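
To illustrate the ordering suggested above, here is a minimal sketch, not the actual Open MPI code. Every sketch_* name and the SKETCH_MX_RCACHE variable are hypothetical stand-ins for the real symbols (mx_init(), ompi_mpi_leave_pinned, mpool_resources.regcache_clean and the registration cache environment variable). The unsetenv() on failure is an extra assumption of mine and may not be wanted in practice.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv()/unsetenv() */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for process-wide memory settings. */
int  sketch_leave_pinned = 0;
bool sketch_regcache_hook_set = false;

/* Hypothetical stand-in for mx_init(): 0 on success, nonzero on failure. */
int sketch_mx_init(bool device_present)
{
    return device_present ? 0 : 1;
}

/* Returns 0 on success, -1 if the (simulated) MX library is unavailable. */
int sketch_common_mx_initialize(bool device_present)
{
    /* Step 1: the environment variable has to be in place before the
     * library initializes, because the library reads it at init time. */
    if (setenv("SKETCH_MX_RCACHE", "2", 1) != 0) {
        return -1;
    }

    /* Step 2: try to bring the library up. */
    if (sketch_mx_init(device_present) != 0) {
        /* Assumption: undo the environment change so nothing leaks. */
        unsetenv("SKETCH_MX_RCACHE");
        fprintf(stderr, "sketch: MX unavailable, memory settings untouched\n");
        return -1;
    }

    /* Step 3: only now commit the settings that affect memory handling. */
    sketch_leave_pinned = 1;
    sketch_regcache_hook_set = true;
    return 0;
}

/* Mirrors the expected caller behavior: give up as soon as common
 * initialization fails, and never go on to query the library. */
bool sketch_component_init(bool device_present)
{
    if (sketch_common_mx_initialize(device_present) != 0) {
        return false;
    }
    /* A NIC-count query would only be reached on this path. */
    return true;
}
```

With this ordering, a call such as sketch_component_init(false) fails without touching sketch_leave_pinned or the regcache hook, while sketch_component_init(true) sets both.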