The command is /usr/bin/ofed_info. So the OFED on your system is not Mellanox OFED 2.4.x but something else.
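For reference, a quick way to cross-check which OFED stack is actually installed and whether the kernel and userspace pieces match (a minimal sketch, assuming an RPM-based MLNX_OFED install with an mlx4/mlx5 HCA):

# ofed_info -s                                      <- prints the MLNX_OFED version string, e.g. MLNX_OFED_LINUX-2.4-1.0.0
# rpm -qa | grep -Ei 'libibverbs|libmlx|kernel-ib'  <- userspace verbs libraries and kernel-ib packages
# modinfo mlx4_core | grep -i version               <- kernel driver version (use mlx5_core for newer HCAs; field may be absent on inbox drivers)
# ibv_devinfo | head                                <- confirms the verbs stack can actually query the device

If ibv_devinfo fails or the versions disagree, that would be consistent with the MXM ibv_query_device()/ibv_exp_use_priv_env() errors shown below.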
try #rpm -qi libibverbs On Thu, Apr 23, 2015 at 7:47 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote: > Hi, > > where is the command ofed_info located? I searched from / but didn't find > it. > > Subhra. > > On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il> > wrote: > >> cool, progress! >> >> >>1429676565.124664] sys.c:719 MXM WARN Conflicting CPU >> frequencies detected, using: 2601.00 >> >> means that cpu governor on your machine is not on "performance" mode >> >> >> MXM ERROR ibv_query_device() returned 38: Function not implemented >> >> indicates that ofed installed on your nodes is not indeed 2.4.-1.0.0 or >> there is a mismatch between ofed kernel drivers version and ofed userspace >> libraries version. >> or you have multiple ofed libraries installed on your node and use >> incorrect one. >> could you please check that ofed_info -s indeed prints mofed 2.4-1.0.0? >> >> >> >> >> >> On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar < >> subhramazumd...@gmail.com> wrote: >> >>> Hi, >>> >>> I compiled the openmpi that comes inside the mellanox hpcx package with >>> mxm support instead of separately downloaded openmpi. I also used the >>> environment as in the README so that no LD_PRELOAD (except our own library >>> which is unrelated) is needed. Now it runs fine (no segfault) but we get >>> same errors as before (saying initialization of MXM library failed). Is it >>> using MXM successfully? >>> >>> [root@JARVICE >>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun >>> --allow-run-as-root --mca mtl mxm -n 1 /root/backend localhost : -x >>> LD_PRELOAD=/root/libci.so -n 1 /root/app2 >>> >>> -------------------------------------------------------------------------- >>> WARNING: a request was made to bind a process. While the system >>> supports binding the process itself, at least one node does NOT >>> support binding memory to the process location. >>> >>> Node: JARVICE >>> >>> This usually is due to not having the required NUMA support installed >>> on the node. In some Linux distributions, the required support is >>> contained in the libnumactl and libnumactl-devel packages. >>> This is a warning only; your job will continue, though performance may >>> be degraded. 
>>> >>> -------------------------------------------------------------------------- >>> i am backend >>> [1429676565.121218] sys.c:719 MXM WARN Conflicting CPU >>> frequencies detected, using: 2601.00 >>> [1429676565.122937] [JARVICE:14767:0] ib_dev.c:445 MXM WARN >>> failed call to ibv_exp_use_priv_env(): Function not implemented >>> [1429676565.122950] [JARVICE:14767:0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> [1429676565.123535] [JARVICE:14767:0] ib_dev.c:445 MXM WARN >>> failed call to ibv_exp_use_priv_env(): Function not implemented >>> [1429676565.123543] [JARVICE:14767:0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> [1429676565.124664] sys.c:719 MXM WARN Conflicting CPU >>> frequencies detected, using: 2601.00 >>> [1429676565.126264] [JARVICE:14768:0] ib_dev.c:445 MXM WARN >>> failed call to ibv_exp_use_priv_env(): Function not implemented >>> [1429676565.126276] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> [1429676565.126812] [JARVICE:14768:0] ib_dev.c:445 MXM WARN >>> failed call to ibv_exp_use_priv_env(): Function not implemented >>> [1429676565.126821] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> >>> -------------------------------------------------------------------------- >>> Initialization of MXM library failed. >>> >>> Error: Input/output error >>> >>> >>> -------------------------------------------------------------------------- >>> >>> <application runs fine> >>> >>> >>> Thanks, >>> Subhra. >>> >>> >>> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <mi...@dev.mellanox.co.il> >>> wrote: >>> >>>> could you please check that ofed_info -s indeed prints mofed 2.4-1.0.0? >>>> why LD_PRELOAD needed in your command line? Can you try >>>> >>>> module load hpcx >>>> mpirun -np $np test.exe >>>> ? >>>> >>>> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar < >>>> subhramazumd...@gmail.com> wrote: >>>> >>>>> I followed the instructions as in the README, now getting a different >>>>> error: >>>>> >>>>> [root@JARVICE >>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# >>>>> ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl >>>>> mxm >>>>> -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>>>> ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x >>>>> LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>>>> ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2 >>>>> >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> WARNING: a request was made to bind a process. While the system >>>>> >>>>> supports binding the process itself, at least one node does NOT >>>>> >>>>> support binding memory to the process location. >>>>> >>>>> Node: JARVICE >>>>> >>>>> This usually is due to not having the required NUMA support installed >>>>> >>>>> on the node. In some Linux distributions, the required support is >>>>> >>>>> contained in the libnumactl and libnumactl-devel packages. >>>>> >>>>> This is a warning only; your job will continue, though performance may >>>>> be degraded. 
>>>>> >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> i am backend >>>>> >>>>> [1429334876.139452] [JARVICE:449 :0] ib_dev.c:445 MXM WARN >>>>> failed call to ibv_exp_use_priv_env(): Function not implemented >>>>> >>>>> [1429334876.139464] [JARVICE:449 :0] ib_dev.c:456 MXM ERROR >>>>> ibv_query_device() returned 38: Function not implemented >>>>> >>>>> [1429334876.139982] [JARVICE:449 :0] ib_dev.c:445 MXM WARN >>>>> failed call to ibv_exp_use_priv_env(): Function not implemented >>>>> >>>>> [1429334876.139990] [JARVICE:449 :0] ib_dev.c:456 MXM ERROR >>>>> ibv_query_device() returned 38: Function not implemented >>>>> >>>>> [1429334876.142649] [JARVICE:450 :0] ib_dev.c:445 MXM WARN >>>>> failed call to ibv_exp_use_priv_env(): Function not implemented >>>>> >>>>> [1429334876.142666] [JARVICE:450 :0] ib_dev.c:456 MXM ERROR >>>>> ibv_query_device() returned 38: Function not implemented >>>>> >>>>> [1429334876.143235] [JARVICE:450 :0] ib_dev.c:445 MXM WARN >>>>> failed call to ibv_exp_use_priv_env(): Function not implemented >>>>> >>>>> [1429334876.143243] [JARVICE:450 :0] ib_dev.c:456 MXM ERROR >>>>> ibv_query_device() returned 38: Function not implemented >>>>> >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> Initialization of MXM library failed. >>>>> >>>>> Error: Input/output error >>>>> >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> [JARVICE:449 :0] Caught signal 11 (Segmentation fault) >>>>> >>>>> [JARVICE:450 :0] Caught signal 11 (Segmentation fault) >>>>> >>>>> ==== backtrace ==== >>>>> >>>>> 2 0x000000000005640c mxm_handle_error() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641 >>>>> >>>>> 3 0x000000000005657c mxm_error_signal_handler() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616 >>>>> >>>>> 4 0x00000000000329a0 killpg() ??:0 >>>>> >>>>> 5 0x000000000004812c _IO_vfprintf() ??:0 >>>>> >>>>> 6 0x000000000006f6da vasprintf() ??:0 >>>>> >>>>> 7 0x0000000000059b3b opal_show_help_vstring() ??:0 >>>>> >>>>> 8 0x0000000000026630 orte_show_help() ??:0 >>>>> >>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409 >>>>> >>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332 >>>>> >>>>> 11 0x00000000000442f3 ompi_mpi_init() ??:0 >>>>> >>>>> 12 0x0000000000067cb0 PMPI_Init_thread() ??:0 >>>>> >>>>> 13 0x000000000000d0ca l_getLocalFromConfig() >>>>> /root/rain_ib/interposer/libciutils.c:83 >>>>> >>>>> 14 0x000000000000c7b4 __cudaRegisterFatBinary() >>>>> /root/rain_ib/interposer/libci.c:4055 >>>>> >>>>> 15 0x0000000000402b59 >>>>> _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv() >>>>> tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0 >>>>> >>>>> 16 0x0000000000402dd6 __do_global_ctors_aux() crtstuff.c:0 
>>>>> >>>>> =================== >>>>> >>>>> ==== backtrace ==== >>>>> >>>>> 2 0x000000000005640c mxm_handle_error() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641 >>>>> >>>>> 3 0x000000000005657c mxm_error_signal_handler() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616 >>>>> >>>>> 4 0x00000000000329a0 killpg() ??:0 >>>>> >>>>> 5 0x000000000004812c _IO_vfprintf() ??:0 >>>>> >>>>> 6 0x000000000006f6da vasprintf() ??:0 >>>>> >>>>> 7 0x0000000000059b3b opal_show_help_vstring() ??:0 >>>>> >>>>> 8 0x0000000000026630 orte_show_help() ??:0 >>>>> >>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409 >>>>> >>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs() >>>>> >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332 >>>>> >>>>> 11 0x00000000000442f3 ompi_mpi_init() ??:0 >>>>> >>>>> 12 0x0000000000067cb0 PMPI_Init_thread() ??:0 >>>>> >>>>> 13 0x0000000000404fdf main() /root/rain_ib/backend/backend.c:1237 >>>>> >>>>> 14 0x000000000001ed1d __libc_start_main() ??:0 >>>>> >>>>> 15 0x0000000000402db9 _start() ??:0 >>>>> >>>>> =================== >>>>> >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> mpirun noticed that process rank 1 with PID 450 on node JARVICE exited >>>>> on signal 11 (Segmentation fault). >>>>> >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> [JARVICE:00447] 1 more process has sent help message help-mtl-mxm.txt >>>>> / mxm init >>>>> >>>>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to >>>>> see all help / error messages >>>>> >>>>> [root@JARVICE >>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# >>>>> >>>>> >>>>> Subhra. >>>>> >>>>> >>>>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman < >>>>> mi...@dev.mellanox.co.il> wrote: >>>>> >>>>>> Have you followed installation steps from README (Also here for >>>>>> reference http://bgate.mellanox.com/products/hpcx/README.txt) >>>>>> >>>>>> ... >>>>>> >>>>>> * Load OpenMPI/OpenSHMEM v1.8 based package: >>>>>> >>>>>> % source $HPCX_HOME/hpcx-init.sh >>>>>> % hpcx_load >>>>>> % env | grep HPCX >>>>>> % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi >>>>>> % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem >>>>>> % hpcx_unload >>>>>> >>>>>> 3. Load HPCX environment from modules >>>>>> >>>>>> * Load OpenMPI/OpenSHMEM based package: >>>>>> >>>>>> % module use $HPCX_HOME/modulefiles >>>>>> % module load hpcx >>>>>> % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c >>>>>> % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem >>>>>> % module unload hpcx >>>>>> >>>>>> ... >>>>>> >>>>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar < >>>>>> subhramazumd...@gmail.com> wrote: >>>>>> >>>>>>> I am using 2.4-1.0.0 mellanox ofed. 
>>>>>>> >>>>>>> I downloaded mofed tarball >>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and >>>>>>> extracted >>>>>>> it. It has mxm directory. >>>>>>> >>>>>>> hpcx-v1.2.0-325-[root@JARVICE ~]# ls >>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5 >>>>>>> archive fca hpcx-init-ompi-mellanox-v1.8.sh ibprof >>>>>>> modulefiles ompi-mellanox-v1.8 sources VERSION >>>>>>> bupc-master hcoll hpcx-init.sh knem >>>>>>> mxm README.txt utils >>>>>>> >>>>>>> I tried using LD_PRELOAD for libmxm, but getting a different error >>>>>>> stack now as following >>>>>>> >>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun >>>>>>> --allow-run-as-root --mca mtl mxm -x >>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2" >>>>>>> -n 1 ./backend localhost : -x >>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2 >>>>>>> ./libci.so" -n 1 ./app2 >>>>>>> i am backend >>>>>>> [JARVICE:00564] mca: base: components_open: component pml / cm open >>>>>>> function failed >>>>>>> [JARVICE:564 :0] Caught signal 11 (Segmentation fault) >>>>>>> [JARVICE:00565] mca: base: components_open: component pml / cm open >>>>>>> function failed >>>>>>> [JARVICE:565 :0] Caught signal 11 (Segmentation fault) >>>>>>> ==== backtrace ==== >>>>>>> 2 0x000000000005640c mxm_handle_error() >>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641 >>>>>>> 3 0x000000000005657c mxm_error_signal_handler() >>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616 >>>>>>> 4 0x00000000000329a0 killpg() ??:0 >>>>>>> 5 0x0000000000045491 mca_base_components_close() ??:0 >>>>>>> 6 0x000000000004e99a mca_base_framework_close() ??:0 >>>>>>> 7 0x0000000000045431 mca_base_component_close() ??:0 >>>>>>> 8 0x000000000004515c mca_base_framework_components_open() ??:0 >>>>>>> 9 0x00000000000a0de9 mca_pml_base_open() pml_base_frame.c:0 >>>>>>> 10 0x000000000004eb1c mca_base_framework_open() ??:0 >>>>>>> 11 0x0000000000043eb3 ompi_mpi_init() ??:0 >>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread() ??:0 >>>>>>> 13 0x0000000000404fdf main() /root/rain_ib/backend/backend.c:1237 >>>>>>> 14 0x000000000001ed1d __libc_start_main() ??:0 >>>>>>> 15 0x0000000000402db9 _start() ??:0 >>>>>>> =================== >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> A requested component was not found, or was unable to be opened. >>>>>>> This >>>>>>> means that this component is either not installed or is unable to be >>>>>>> used on your system (e.g., sometimes this means that shared libraries >>>>>>> that the component requires are unable to be found/loaded). Note >>>>>>> that >>>>>>> Open MPI stopped checking at the first component that it did not >>>>>>> find. 
>>>>>>> >>>>>>> Host: JARVICE >>>>>>> Framework: mtl >>>>>>> Component: mxm >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE >>>>>>> exited on signal 11 (Segmentation fault). >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> [JARVICE:00562] 1 more process has sent help message >>>>>>> help-mca-base.txt / find-available:not-valid >>>>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0 to >>>>>>> see all help / error messages >>>>>>> >>>>>>> >>>>>>> Subhra >>>>>>> >>>>>>> >>>>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman < >>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>> >>>>>>>> seems like mxm was not found in your ld_library_path. >>>>>>>> >>>>>>>> what mofed version do you use? >>>>>>>> does it have /opt/mellanox/mxm in it? >>>>>>>> You could just run mpirun from HPCX package which looks for mxm >>>>>>>> internally and recompile ompi as mentioned in README. >>>>>>>> >>>>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar < >>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I used mxm mtl as follows but getting segfault. It says mxm >>>>>>>>> component not found but I have compiled openmpi with mxm. Any idea >>>>>>>>> what I >>>>>>>>> might be missing? >>>>>>>>> >>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun >>>>>>>>> --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x >>>>>>>>> LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend >>>>>>>>> localhosst : -n 1 -x LD_PRELOAD="./libci.so >>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2 >>>>>>>>> i am backend >>>>>>>>> [JARVICE:08398] *** Process received signal *** >>>>>>>>> [JARVICE:08398] Signal: Segmentation fault (11) >>>>>>>>> [JARVICE:08398] Signal code: Address not mapped (1) >>>>>>>>> [JARVICE:08398] Failing at address: 0x10 >>>>>>>>> [JARVICE:08398] [ 0] >>>>>>>>> /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710] >>>>>>>>> [JARVICE:08398] [ 1] >>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491] >>>>>>>>> [JARVICE:08398] [ 2] >>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a] >>>>>>>>> [JARVICE:08398] [ 3] >>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431] >>>>>>>>> [JARVICE:08398] [ 4] >>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c] >>>>>>>>> [JARVICE:08398] [ 5] >>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9] >>>>>>>>> [JARVICE:08398] [ 6] >>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c] >>>>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open: >>>>>>>>> component pml / cm open function failed >>>>>>>>> >>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3] >>>>>>>>> [JARVICE:08398] [ 8] >>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0] >>>>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf] >>>>>>>>> [JARVICE:08398] [10] >>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d] >>>>>>>>> [JARVICE:08398] [11] 
./backend[0x402db9] >>>>>>>>> [JARVICE:08398] *** End of error message *** >>>>>>>>> >>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>> A requested component was not found, or was unable to be opened. >>>>>>>>> This >>>>>>>>> means that this component is either not installed or is unable to >>>>>>>>> be >>>>>>>>> used on your system (e.g., sometimes this means that shared >>>>>>>>> libraries >>>>>>>>> that the component requires are unable to be found/loaded). Note >>>>>>>>> that >>>>>>>>> Open MPI stopped checking at the first component that it did not >>>>>>>>> find. >>>>>>>>> >>>>>>>>> Host: JARVICE >>>>>>>>> Framework: mtl >>>>>>>>> Component: mxm >>>>>>>>> >>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>> >>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE >>>>>>>>> exited on signal 11 (Segmentation fault). >>>>>>>>> >>>>>>>>> -------------------------------------------------------------------------- >>>>>>>>> >>>>>>>>> >>>>>>>>> Subhra. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman < >>>>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>>>> >>>>>>>>>> no need IPoIB, mxm uses native IB. >>>>>>>>>> >>>>>>>>>> Please see HPCX (pre-compiled ompi, integrated with MXM and FCA) >>>>>>>>>> README file for details how to compile/select. >>>>>>>>>> >>>>>>>>>> The default transport is UD for internode communication and >>>>>>>>>> shared-memory for intra-node. >>>>>>>>>> >>>>>>>>>> http://bgate,mellanox.com/products/hpcx/ >>>>>>>>>> >>>>>>>>>> Also, mxm included in the Mellanox OFED. >>>>>>>>>> >>>>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar < >>>>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Does ipoib need to be configured on the ib cards for mxm (I have >>>>>>>>>>> a separate ethernet connection too)? Also are there special flags >>>>>>>>>>> in mpirun >>>>>>>>>>> to select from UD/RC/DC? What is the default? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Subhra. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman < >>>>>>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> mxm uses IB rdma/roce technologies. Once can select UD/RC/DC >>>>>>>>>>>> transports to be used in mxm. >>>>>>>>>>>> >>>>>>>>>>>> By selecting mxm, all MPI p2p routines will be mapped to >>>>>>>>>>>> appropriate mxm functions. >>>>>>>>>>>> >>>>>>>>>>>> M >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar < >>>>>>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi MIke, >>>>>>>>>>>>> >>>>>>>>>>>>> Does the mxm mtl use infiniband rdma? Also from programming >>>>>>>>>>>>> perspective, do I need to use anything else other than >>>>>>>>>>>>> MPI_Send/MPI_Recv? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Subhra. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman < >>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> openib btl does not support this thread model. >>>>>>>>>>>>>> You can use OMPI w/ mxm (-mca mtl mxm) and multiple thread >>>>>>>>>>>>>> mode lin 1.8 x series or (-mca pml yalla) in the master branch. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> M >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar < >>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and openib btl work together in >>>>>>>>>>>>>>> open mpi 1.8.4? If so are there any command line options needed >>>>>>>>>>>>>>> during run >>>>>>>>>>>>>>> time? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Subhra. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>>> Subscription: >>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>>> Link to this post: >>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26574.php >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Kind Regards, >>>>>>>>>>>>>> >>>>>>>>>>>>>> M. >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> users mailing list >>>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>>> Subscription: >>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>>> Link to this post: >>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26575.php >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> users mailing list >>>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>>> Subscription: >>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>>> Link to this post: >>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26580.php >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> >>>>>>>>>>>> Kind Regards, >>>>>>>>>>>> >>>>>>>>>>>> M. >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> users mailing list >>>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>>> Subscription: >>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>>> Link to this post: >>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26584.php >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> users mailing list >>>>>>>>>>> us...@open-mpi.org >>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>>> Link to this post: >>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26663.php >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> Kind Regards, >>>>>>>>>> >>>>>>>>>> M. >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> users mailing list >>>>>>>>>> us...@open-mpi.org >>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>>> Link to this post: >>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26665.php >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> users mailing list >>>>>>>>> us...@open-mpi.org >>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>>> Link to this post: >>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26686.php >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Kind Regards, >>>>>>>> >>>>>>>> M. 
>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> users mailing list >>>>>>>> us...@open-mpi.org >>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>> Link to this post: >>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26688.php >>>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> users mailing list >>>>>>> us...@open-mpi.org >>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>> Link to this post: >>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26711.php >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Kind Regards, >>>>>> >>>>>> M. >>>>>> >>>>>> _______________________________________________ >>>>>> users mailing list >>>>>> us...@open-mpi.org >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26712.php >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> users mailing list >>>>> us...@open-mpi.org >>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>> Link to this post: >>>>> http://www.open-mpi.org/community/lists/users/2015/04/26752.php >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Kind Regards, >>>> >>>> M. >>>> >>>> _______________________________________________ >>>> users mailing list >>>> us...@open-mpi.org >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>> Link to this post: >>>> http://www.open-mpi.org/community/lists/users/2015/04/26754.php >>>> >>> >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>> Link to this post: >>> http://www.open-mpi.org/community/lists/users/2015/04/26761.php >>> >> >> >> >> -- >> >> Kind Regards, >> >> M. >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >> Link to this post: >> http://www.open-mpi.org/community/lists/users/2015/04/26762.php >> > > > _______________________________________________ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/04/26766.php > -- Kind Regards, M.