Hi, where is the command ofed_info located? I searched from / but didn't find it.
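(For reference, a minimal sketch of ways to look for it, assuming an RPM-based node like the RHEL 6.5 image used in this thread; the package names in the comments are assumptions, not verified against this exact install:)

    which ofed_info                            # typically /usr/bin/ofed_info when MLNX_OFED is installed
    find / -xdev -type f -name 'ofed_info*' 2>/dev/null
    rpm -qa | grep -Ei 'ofed|mlnx'             # is any Mellanox OFED package set present at all?
    rpm -qf "$(which ofed_info)"               # which package owns it (often the ofed-scripts package)
    ofed_info -s                               # if present, should report the MLNX_OFED 2.4-1.0.0 stack

If nothing turns up, the node may only have the inbox/stock OFED rather than MLNX_OFED 2.4-1.0.0, which would be consistent with the ibv_query_device() "Function not implemented" errors quoted below.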
Subhra.

On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
> cool, progress!
>
> >> [1429676565.124664] sys.c:719 MXM WARN Conflicting CPU frequencies detected, using: 2601.00
>
> means that the cpu governor on your machine is not in "performance" mode
>
> >> MXM ERROR ibv_query_device() returned 38: Function not implemented
>
> indicates that the ofed installed on your nodes is indeed not 2.4-1.0.0, or there is a mismatch between the ofed kernel drivers version and the ofed userspace libraries version,
> or you have multiple ofed libraries installed on your node and are using the incorrect one.
> could you please check that ofed_info -s indeed prints mofed 2.4-1.0.0?
>
> On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>> Hi,
>>
>> I compiled the openmpi that comes inside the mellanox hpcx package with mxm support instead of the separately downloaded openmpi. I also used the environment as in the README so that no LD_PRELOAD (except our own library, which is unrelated) is needed. Now it runs fine (no segfault) but we get the same errors as before (saying initialization of the MXM library failed). Is it using MXM successfully?
>>
>> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun --allow-run-as-root --mca mtl mxm -n 1 /root/backend localhost : -x LD_PRELOAD=/root/libci.so -n 1 /root/app2
>> --------------------------------------------------------------------------
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node does NOT
>> support binding memory to the process location.
>>
>>   Node:  JARVICE
>>
>> This usually is due to not having the required NUMA support installed
>> on the node. In some Linux distributions, the required support is
>> contained in the libnumactl and libnumactl-devel packages.
>> This is a warning only; your job will continue, though performance may be degraded.
>> --------------------------------------------------------------------------
>> i am backend
>> [1429676565.121218] sys.c:719 MXM WARN Conflicting CPU frequencies detected, using: 2601.00
>> [1429676565.122937] [JARVICE:14767:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.122950] [JARVICE:14767:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>> [1429676565.123535] [JARVICE:14767:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.123543] [JARVICE:14767:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>> [1429676565.124664] sys.c:719 MXM WARN Conflicting CPU frequencies detected, using: 2601.00
>> [1429676565.126264] [JARVICE:14768:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.126276] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>> [1429676565.126812] [JARVICE:14768:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.126821] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>> --------------------------------------------------------------------------
>> Initialization of MXM library failed.
>> Error: Input/output error
>> --------------------------------------------------------------------------
>>
>> <application runs fine>
>>
>> Thanks,
>> Subhra.
>>
>> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>> could you please check that ofed_info -s indeed prints mofed 2.4-1.0.0?
>>> why is LD_PRELOAD needed in your command line? Can you try
>>>
>>> module load hpcx
>>> mpirun -np $np test.exe
>>> ?
>>>
>>> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>>>> I followed the instructions as in the README, now getting a different error:
>>>>
>>>> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl mxm -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2
>>>>
>>>> --------------------------------------------------------------------------
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>>   Node:  JARVICE
>>>>
>>>> This usually is due to not having the required NUMA support installed
>>>> on the node. In some Linux distributions, the required support is
>>>> contained in the libnumactl and libnumactl-devel packages.
>>>> This is a warning only; your job will continue, though performance may be degraded.
>>>> --------------------------------------------------------------------------
>>>>
>>>> i am backend
>>>> [1429334876.139452] [JARVICE:449 :0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.139464] [JARVICE:449 :0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>>>> [1429334876.139982] [JARVICE:449 :0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.139990] [JARVICE:449 :0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>>>> [1429334876.142649] [JARVICE:450 :0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.142666] [JARVICE:450 :0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>>>> [1429334876.143235] [JARVICE:450 :0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.143243] [JARVICE:450 :0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
>>>>
>>>> --------------------------------------------------------------------------
>>>> Initialization of MXM library failed.
>>>> Error: Input/output error
>>>> --------------------------------------------------------------------------
>>>>
>>>> [JARVICE:449 :0] Caught signal 11 (Segmentation fault)
>>>> [JARVICE:450 :0] Caught signal 11 (Segmentation fault)
>>>> ==== backtrace ====
>>>>  2 0x000000000005640c mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>  3 0x000000000005657c mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>  5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>  6 0x000000000006f6da vasprintf()  ??:0
>>>>  7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>  8 0x0000000000026630 orte_show_help()  ??:0
>>>>  9 0x0000000000001a3f mca_bml_r2_add_procs()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>> 13 0x000000000000d0ca l_getLocalFromConfig()  /root/rain_ib/interposer/libciutils.c:83
>>>> 14 0x000000000000c7b4 __cudaRegisterFatBinary()  /root/rain_ib/interposer/libci.c:4055
>>>> 15 0x0000000000402b59 _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv()  tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0
>>>> 16 0x0000000000402dd6 __do_global_ctors_aux()  crtstuff.c:0
>>>> ===================
>>>> ==== backtrace ====
>>>>  2 0x000000000005640c mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>  3 0x000000000005657c mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>  5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>  6 0x000000000006f6da vasprintf()  ??:0
>>>>  7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>  8 0x0000000000026630 orte_show_help()  ??:0
>>>>  9 0x0000000000001a3f mca_bml_r2_add_procs()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>> 15 0x0000000000402db9 _start()  ??:0
>>>> ===================
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 1 with PID 450 on node JARVICE exited on signal 11 (Segmentation fault).
>>>> --------------------------------------------------------------------------
>>>> [JARVICE:00447] 1 more process has sent help message help-mtl-mxm.txt / mxm init
>>>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>
>>>> Subhra.
>>>>
>>>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>> Have you followed the installation steps from the README? (Also here for reference: http://bgate.mellanox.com/products/hpcx/README.txt)
>>>>>
>>>>> ...
>>>>> * Load OpenMPI/OpenSHMEM v1.8 based package:
>>>>>
>>>>>   % source $HPCX_HOME/hpcx-init.sh
>>>>>   % hpcx_load
>>>>>   % env | grep HPCX
>>>>>   % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi
>>>>>   % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>   % hpcx_unload
>>>>>
>>>>> 3. Load HPCX environment from modules
>>>>>
>>>>> * Load OpenMPI/OpenSHMEM based package:
>>>>>
>>>>>   % module use $HPCX_HOME/modulefiles
>>>>>   % module load hpcx
>>>>>   % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c
>>>>>   % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>   % module unload hpcx
>>>>> ...
>>>>>
>>>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>>>>>> I am using 2.4-1.0.0 mellanox ofed.
>>>>>>
>>>>>> I downloaded the mofed tarball hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and extracted it. It has a mxm directory.
>>>>>> [root@JARVICE ~]# ls hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5
>>>>>> archive      fca    hpcx-init-ompi-mellanox-v1.8.sh  ibprof  modulefiles  ompi-mellanox-v1.8  sources  VERSION
>>>>>> bupc-master  hcoll  hpcx-init.sh                     knem    mxm          README.txt          utils
>>>>>>
>>>>>> I tried using LD_PRELOAD for libmxm, but I am getting a different error stack now, as follows:
>>>>>>
>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl mxm -x LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2" -n 1 ./backend localhost : -x LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2 ./libci.so" -n 1 ./app2
>>>>>> i am backend
>>>>>> [JARVICE:00564] mca: base: components_open: component pml / cm open function failed
>>>>>> [JARVICE:564 :0] Caught signal 11 (Segmentation fault)
>>>>>> [JARVICE:00565] mca: base: components_open: component pml / cm open function failed
>>>>>> [JARVICE:565 :0] Caught signal 11 (Segmentation fault)
>>>>>> ==== backtrace ====
>>>>>>  2 0x000000000005640c mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>  3 0x000000000005657c mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>>>  5 0x0000000000045491 mca_base_components_close()  ??:0
>>>>>>  6 0x000000000004e99a mca_base_framework_close()  ??:0
>>>>>>  7 0x0000000000045431 mca_base_component_close()  ??:0
>>>>>>  8 0x000000000004515c mca_base_framework_components_open()  ??:0
>>>>>>  9 0x00000000000a0de9 mca_pml_base_open()  pml_base_frame.c:0
>>>>>> 10 0x000000000004eb1c mca_base_framework_open()  ??:0
>>>>>> 11 0x0000000000043eb3 ompi_mpi_init()  ??:0
>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>> ===================
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> A requested component was not found, or was unable to be opened. This
>>>>>> means that this component is either not installed or is unable to be
>>>>>> used on your system (e.g., sometimes this means that shared libraries
>>>>>> that the component requires are unable to be found/loaded). Note that
>>>>>> Open MPI stopped checking at the first component that it did not find.
>>>>>>
>>>>>>   Host:      JARVICE
>>>>>>   Framework: mtl
>>>>>>   Component: mxm
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE exited on signal 11 (Segmentation fault).
>>>>>> --------------------------------------------------------------------------
>>>>>> [JARVICE:00562] 1 more process has sent help message help-mca-base.txt / find-available:not-valid
>>>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>>
>>>>>> Subhra
>>>>>>
>>>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>>>> seems like mxm was not found in your ld_library_path.
>>>>>>>
>>>>>>> what mofed version do you use?
>>>>>>> does it have /opt/mellanox/mxm in it?
>>>>>>> You could just run mpirun from HPCX package which looks for mxm internally and recompile ompi as mentioned in README.
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I used mxm mtl as follows but getting segfault. It says mxm component not found but I have compiled openmpi with mxm. Any idea what I might be missing?
>>>>>>>>
>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend localhosst : -n 1 -x LD_PRELOAD="./libci.so ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2
>>>>>>>> i am backend
>>>>>>>> [JARVICE:08398] *** Process received signal ***
>>>>>>>> [JARVICE:08398] Signal: Segmentation fault (11)
>>>>>>>> [JARVICE:08398] Signal code: Address not mapped (1)
>>>>>>>> [JARVICE:08398] Failing at address: 0x10
>>>>>>>> [JARVICE:08398] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710]
>>>>>>>> [JARVICE:08398] [ 1] /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491]
>>>>>>>> [JARVICE:08398] [ 2] /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a]
>>>>>>>> [JARVICE:08398] [ 3] /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431]
>>>>>>>> [JARVICE:08398] [ 4] /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c]
>>>>>>>> [JARVICE:08398] [ 5] ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9]
>>>>>>>> [JARVICE:08398] [ 6] /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c]
>>>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open: component pml / cm open function failed
>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3]
>>>>>>>> [JARVICE:08398] [ 8] ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0]
>>>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf]
>>>>>>>> [JARVICE:08398] [10] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d]
>>>>>>>> [JARVICE:08398] [11] ./backend[0x402db9]
>>>>>>>> [JARVICE:08398] *** End of error message ***
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A requested component was not found, or was unable to be opened.
>>>>>>>> This means that this component is either not installed or is unable to be
>>>>>>>> used on your system (e.g., sometimes this means that shared libraries
>>>>>>>> that the component requires are unable to be found/loaded). Note that
>>>>>>>> Open MPI stopped checking at the first component that it did not find.
>>>>>>>>
>>>>>>>>   Host:      JARVICE
>>>>>>>>   Framework: mtl
>>>>>>>>   Component: mxm
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE exited on signal 11 (Segmentation fault).
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Subhra.
>>>>>>>>
>>>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>>>>>> no need for IPoIB, mxm uses native IB.
>>>>>>>>>
>>>>>>>>> Please see the HPCX (pre-compiled ompi, integrated with MXM and FCA) README file for details on how to compile/select.
>>>>>>>>>
>>>>>>>>> The default transport is UD for internode communication and shared-memory for intra-node.
>>>>>>>>>
>>>>>>>>> http://bgate.mellanox.com/products/hpcx/
>>>>>>>>>
>>>>>>>>> Also, mxm is included in the Mellanox OFED.
>>>>>>>>>
>>>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Does ipoib need to be configured on the ib cards for mxm (I have a separate ethernet connection too)? Also, are there special flags in mpirun to select from UD/RC/DC? What is the default?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Subhra.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> mxm uses IB rdma/roce technologies. One can select UD/RC/DC transports to be used in mxm.
>>>>>>>>>>>
>>>>>>>>>>> By selecting mxm, all MPI p2p routines will be mapped to appropriate mxm functions.
>>>>>>>>>>>
>>>>>>>>>>> M
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>> Hi Mike,
>>>>>>>>>>>>
>>>>>>>>>>>> Does the mxm mtl use infiniband rdma? Also, from a programming perspective, do I need to use anything else other than MPI_Send/MPI_Recv?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> openib btl does not support this thread model.
>>>>>>>>>>>>> You can use OMPI w/ mxm (-mca mtl mxm) and the multiple thread model in the 1.8.x series, or (-mca pml yalla) in the master branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> M
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and openib btl work together in open mpi 1.8.4? If so, are there any command line options needed during run time?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Subhra.
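(For completeness, a minimal sketch pulling together the suggestions quoted above, assuming the hpcx-v1.2.0-325 layout and README conventions from this thread; MXM_TLS and its value list are an assumption to verify against the MXM/HPC-X documentation for this release, and the governor commands assume a standard Linux cpufreq sysfs layout:)

    module use $HPCX_HOME/modulefiles
    module load hpcx                                   # HPC-X's own ompi/mxm on PATH and LD_LIBRARY_PATH; no LD_PRELOAD of libmpi/libmxm needed
    mpirun -np 2 --mca pml cm --mca mtl mxm $HPCX_MPI_TESTS_DIR/examples/hello_c

    # Transport choice (UD is the inter-node default, shared memory intra-node);
    # MXM_TLS is the usual knob, passed through mpirun with -x:
    mpirun -np 2 --mca mtl mxm -x MXM_TLS=self,shm,ud $HPCX_MPI_TESTS_DIR/examples/hello_c
    mpirun -np 2 --mca mtl mxm -x MXM_TLS=self,shm,rc $HPCX_MPI_TESTS_DIR/examples/hello_c

    # "Conflicting CPU frequencies" warning: check/set the cpufreq governor (as root):
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done

On the programming side, as noted in the quoted replies, ordinary MPI_Send/MPI_Recv is all that is needed once mxm is selected; for the multi-threaded case the application should request MPI_THREAD_MULTIPLE via MPI_Init_thread and check the provided level.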