I tried launching the application on nodes with QDR InfiniBand. The first attempt, with 2 processes, worked, but the following was printed to the output:

[1345633953.436676] [b01:2523 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
[1345633953.436676] [b01:2522 :0] mpool.c:99 MXM ERROR Invalid mempool parameter(s)
--------------------------------------------------------------------------
MXM was unable to create an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.
Error: Invalid parameter
--------------------------------------------------------------------------

The results of this run did not differ from those of the run without MXM. I then tried launching with 256 processes, but got the same message from each process, and then the application crashed. Since then I have been observing the same behavior as with FDR: the application hangs at startup.

Best regards, Pavel Mezentsev.

2012/8/22 Pavel Mezentsev <pavel.mezent...@gmail.com>

> Hello!
>
> I've built openmpi 1.6.1rc3 with support of MXM. But when I try to launch
> an application using this mtl it hangs and I can't figure out why.
>
> If I launch it with np below 128 then everything works fine since mxm
> isn't used. I've tried setting the threshold to 0 and launching 2 processes
> with the same result: hangs on startup.
> What could be causing this problem?
>
> Here is the command I execute:
> /opt/openmpi/1.6.1/mxm-test/bin/mpirun \
> -np $NP \
> -hostfile hosts_fdr2 \
> --mca mtl mxm \
> --mca btl ^tcp \
> --mca mtl_mxm_np 0 \
> -x OMP_NUM_THREADS=$NT \
> -x LD_LIBRARY_PATH \
> --bind-to-core \
> -npernode 16 \
> --mca coll_fca_np 0 -mca coll_fca_enable 0 \
> ./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast
> Allgather Allgatherv
>
> I'm performing the tests on nodes with Intel SB processors and FDR.
> Openmpi was configured with the following parameters:
> CC=icc CXX=icpc F77=ifort FC=ifort ./configure
> --prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm
> --with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem
> I'm using the latest ofed from mellanox: 1.5.3-3.1.0 on centos 6.1 with
> default kernel: 2.6.32-131.0.15.
> The compilation with default mxm (1.0.601) failed so I installed the
> latest version from mellanox: 1.1.1227
>
> Best regards, Pavel Mezentsev.
>