Hi,
When we ran Open MPI v4.0.0 on a cluster with InfiniBand, we got the following warning and error messages. Older versions (< 3.x) work fine on the same cluster.

####################################
$ mpirun -n 4 ./a.out
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:              t02n34
  Local adapter:           mlx5_0
  Local port:              1

--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   t02n34
  Local device: mlx5_0
--------------------------------------------------------------------------
libibcm: couldn't read ABI version
[1546869563.579350] [t02n34:28160:0] cm_iface.c:309 UCX ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
libibcm: couldn't read ABI version
[1546869563.580315] [t02n34:28159:0] cm_iface.c:309 UCX ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
libibcm: couldn't read ABI version
[1546869563.580620] [t02n34:28161:0] cm_iface.c:309 UCX ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
libibcm: couldn't read ABI version
[1546869563.581113] [t02n34:28158:0] cm_iface.c:309 UCX ERROR ib_cm_open_device() failed: No such file or directory. Check if ib_ucm.ko module is loaded.
[t02n34:28159] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
[t02n34:28160] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
[t02n34:28158] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
[t02n34:28161] ../../../../../openmpi-4.0.0/ompi/mca/pml/ucx/pml_ucx.c:212 Error: Failed to create UCP worker
Hello world from processor t02n34, rank 3 out of 4 processors
Hello world from processor t02n34, rank 0 out of 4 processors
Hello world from processor t02n34, rank 2 out of 4 processors
Hello world from processor t02n34, rank 1 out of 4 processors
[t02n34:28151] 3 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[t02n34:28151] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[t02n34:28151] 3 more processes have sent help message help-mpi-btl-openib.txt / error in device init

If we set the MCA parameter btl_openib_allow_ib to 1, we get different errors:

t02n34$ mpirun -n 4 --mca btl_openib_allow_ib 1 ./a.out
[t02n34:28232:0:28232] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fef6749e7e0)
[t02n34:28234:0:28234] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fc2e8f4d7e0)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[t02n34:28233:0:28233] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f981ee0e7e0)
[t02n34:28235:0:28235] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7fdc778c07e0)
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node t02n34 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
############################

The configuration flags used to build this version are:

$ ../openmpi-4.0.0/configure --prefix=/vol/openmpi/4.0.0/ --with-ucx=/vol/openmpi/4.0.0/ucx/1.4.0

(We also tried building with --without-verbs, but got the same errors.)

Thanks a lot.

Regards,
Jing
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users