The "--mca btl_openib_ib_path_record_service_level 1" flag controls the openib BTL, so you need to remove "--mca mtl mxm" from the command line.
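For reference, a rough sketch of what the adjusted command line could look like. Every option is taken from the command quoted below, with only "--mca mtl mxm" dropped; the MPIRUN_ARGS variable name is just an illustration, and the snippet only prints the command so it can be reviewed before launching.

```shell
# Same options as the original command, minus "--mca mtl mxm",
# so that the openib BTL (and its service-level setting) is actually used.
# MPIRUN_ARGS is a hypothetical helper variable, not an Open MPI setting.
MPIRUN_ARGS="-display-allocation -display-map -np 8 -machinefile maquinas.aux \
--mca btl openib,self,sm \
--mca btl_openib_ib_path_record_service_level 1 \
--mca btl_openib_cpc_include oob"

# Print the command for review; remove the leading "echo" to launch the job.
echo mpirun $MPIRUN_ARGS hpcc
```

Whether this alone clears the connect errors is not certain, since the XRC question below may also play a role.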
Have you compiled OpenMPI with the RHEL 6.4 inbox OFED driver? AFAIK, MOFED 2.x does not have XRC, and you passed the "--enable-openib-connectx-xrc" flag to configure.

On Tue, Jun 11, 2013 at 3:02 PM, Jesús Escudero Sahuquillo <jescud...@dsi.uclm.es> wrote:

> I have a 16-node Mellanox cluster built with Mellanox ConnectX3 cards.
> Recently I have updated the MLNX_OFED to the 2.0.5 version. The reason for
> this e-mail to the OpenMPI users list is that I am not able to run MPI
> applications using the service levels (SLs) feature of the OpenMPI driver.
>
> Currently, the nodes run Red Hat 6.4 with the kernel
> 2.6.32-358.el6.x86_64. I have compiled OpenMPI 1.6.4 with:
>
> ./configure --with-sge --with-openib=/usr --enable-openib-connectx-xrc
> --enable-mpi-thread-multiple --with-threads --with-hwloc
> --enable-heterogeneous --with-fca=/opt/mellanox/fca
> --with-mxm-libdir=/opt/mellanox/mxm/lib --with-mxm=/opt/mellanox/mxm
> --prefix=/home/jescudero/opt/openmpi
>
> I have modified the OpenSM code (which is based on 3.3.15) in order to
> include a special routing algorithm based on "ftree". Apparently all is
> correct with the OpenSM, since it returns the SLs when I execute the command
> "saquery --src-to-dst slid:dlid". Anyway, I have also tried to run the
> OpenSM with the DFSSSP algorithm.
>
> However, when I try to run MPI applications (e.g. HPCC, OSU or even
> alltoall.c, included in the OpenMPI sources) I experience some errors if
> "btl_openib_path_record_info" is set to "1"; otherwise (i.e. if
> btl_openib_path_record_info is not enabled) the application execution ends
> correctly.
> I run the MPI application with the following command:
>
> mpirun -display-allocation -display-map -np 8 -machinefile maquinas.aux
> --mca btl openib,self,sm --mca mtl mxm --mca
> btl_openib_ib_path_record_service_level 1 --mca btl_openib_cpc_include oob hpcc
>
> I obtain the following trace:
>
> [nodo20.XXXXX][[31227,1],6][connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
> error posting receive on QP [0x16db] errno says: Success [0]
> [nodo15.XXXXX][[31227,1],4][connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
> error posting receive on QP [0x1749] errno says: Success [0]
> [nodo17.XXXXX][[31227,1],5][connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
> error posting receive on QP [0x1783] errno says: Success [0]
> [nodo21.XXXXX][[31227,1],7][connect/btl_openib_connect_sl.c:239:get_pathrecord_info]
> error posting receive on QP [0x1838] errno says: Success [0]
> [nodo21.XXXXX][[31227,1],7][connect/btl_openib_connect_oob.c:885:rml_recv_cb]
> endpoint connect error: -1
> [nodo17.XXXXX][[31227,1],5][connect/btl_openib_connect_oob.c:885:rml_recv_cb]
> endpoint connect error: -1
> [nodo15.XXXXX][[31227,1],4][connect/btl_openib_connect_oob.c:885:rml_recv_cb]
> endpoint connect error: -1
> [nodo20.XXXXX][[31227,1],6][connect/btl_openib_connect_oob.c:885:rml_recv_cb]
> endpoint connect error: -1
>
> Does anyone know what I am doing wrong?
>
> All the best,