It should be at /usr/bin/ofed_info.

So, the OFED on your system is not Mellanox OFED 2.4.x but something else.

Try: # rpm -qi libibverbs
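
For example (package names can differ a bit per distro, so treat this as a
sketch), the following should show which verbs/OFED stack is actually installed
and visible to the loader:

    # which ofed_info && ofed_info -s                 # only present with an OFED install
    # rpm -qa | grep -i -E 'libibverbs|libmlx|ofed'   # installed verbs/OFED packages
    # ldconfig -p | grep libibverbs                   # every libibverbs the loader can see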


On Thu, Apr 23, 2015 at 7:47 AM, Subhra Mazumdar <subhramazumd...@gmail.com>
wrote:

> Hi,
>
> where is the command ofed_info located? I searched from / but didn't find
> it.
>
> Subhra.
>
> On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il>
> wrote:
>
>> cool, progress!
>>
>> >>1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>> frequencies detected, using: 2601.00
>>
>> means that the CPU frequency governor on your machine is not set to "performance" mode
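>>
>> As a rough check (the sysfs paths below assume the usual cpufreq layout), the
>> governor can be inspected and switched with:
>>
>>     # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>     # echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor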
>>
>> >> MXM  ERROR ibv_query_device() returned 38: Function not implemented
>>
>> indicates that the OFED installed on your nodes is not actually 2.4-1.0.0,
>> that there is a mismatch between the OFED kernel driver version and the OFED
>> userspace library version, or that you have multiple OFED libraries installed
>> on your node and are using the wrong one.
>> Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
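>>
>> A quick way to spot such a mismatch, or a stray second copy of the verbs
>> libraries, might be (mlx4_core below assumes a ConnectX-3-class HCA; adjust
>> for your hardware):
>>
>>     # ofed_info -s
>>     # modinfo mlx4_core | grep -i ^version      # kernel-side driver version
>>     # rpm -qa | grep -i libibverbs              # userspace verbs packages
>>     # ldconfig -p | grep libibverbs             # all copies visible to the loader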
>>
>>
>>
>>
>>
>> On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar <
>> subhramazumd...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I compiled the openmpi that comes inside the Mellanox HPC-X package with
>>> mxm support, instead of the separately downloaded openmpi. I also used the
>>> environment as in the README, so no LD_PRELOAD is needed (except our own
>>> library, which is unrelated). Now it runs fine (no segfault), but we get the
>>> same errors as before (saying initialization of the MXM library failed). Is
>>> it using MXM successfully?
>>>
>>> [root@JARVICE
>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun
>>> --allow-run-as-root  --mca mtl mxm -n 1 /root/backend  localhost : -x
>>> LD_PRELOAD=/root/libci.so -n 1 /root/app2
>>>
>>> --------------------------------------------------------------------------
>>> WARNING: a request was made to bind a process. While the system
>>> supports binding the process itself, at least one node does NOT
>>> support binding memory to the process location.
>>>
>>>   Node:  JARVICE
>>>
>>> This usually is due to not having the required NUMA support installed
>>> on the node. In some Linux distributions, the required support is
>>> contained in the libnumactl and libnumactl-devel packages.
>>> This is a warning only; your job will continue, though performance may
>>> be degraded.
>>>
>>> --------------------------------------------------------------------------
>>>  i am backend
>>> [1429676565.121218]         sys.c:719  MXM  WARN  Conflicting CPU
>>> frequencies detected, using: 2601.00
>>> [1429676565.122937] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>> [1429676565.122950] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>>> ibv_query_device() returned 38: Function not implemented
>>> [1429676565.123535] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>> [1429676565.123543] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>>> ibv_query_device() returned 38: Function not implemented
>>> [1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>>> frequencies detected, using: 2601.00
>>> [1429676565.126264] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>> [1429676565.126276] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>>> ibv_query_device() returned 38: Function not implemented
>>> [1429676565.126812] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>> [1429676565.126821] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>>> ibv_query_device() returned 38: Function not implemented
>>>
>>> --------------------------------------------------------------------------
>>> Initialization of MXM library failed.
>>>
>>>   Error: Input/output error
>>>
>>>
>>> --------------------------------------------------------------------------
>>>
>>> <application runs fine>
>>>
>>>
>>> Thanks,
>>> Subhra.
>>>
>>>
>>> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <mi...@dev.mellanox.co.il>
>>> wrote:
>>>
>>>> Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
>>>> Why is LD_PRELOAD needed in your command line? Can you try:
>>>>
>>>> module load hpcx
>>>> mpirun -np $np test.exe
>>>> ?
>>>>
>>>> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar <
>>>> subhramazumd...@gmail.com> wrote:
>>>>
>>>>> I followed the instructions in the README; now I am getting a different
>>>>> error:
>>>>>
>>>>> [root@JARVICE
>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>> ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl 
>>>>> mxm
>>>>> -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>> ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x
>>>>> LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>> ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> WARNING: a request was made to bind a process. While the system
>>>>>
>>>>> supports binding the process itself, at least one node does NOT
>>>>>
>>>>> support binding memory to the process location.
>>>>>
>>>>>  Node:  JARVICE
>>>>>
>>>>> This usually is due to not having the required NUMA support installed
>>>>>
>>>>> on the node. In some Linux distributions, the required support is
>>>>>
>>>>> contained in the libnumactl and libnumactl-devel packages.
>>>>>
>>>>> This is a warning only; your job will continue, though performance may
>>>>> be degraded.
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> i am backend
>>>>>
>>>>> [1429334876.139452] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>
>>>>> [1429334876.139464] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>
>>>>> [1429334876.139982] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>
>>>>> [1429334876.139990] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>
>>>>> [1429334876.142649] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>
>>>>> [1429334876.142666] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>
>>>>> [1429334876.143235] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>
>>>>> [1429334876.143243] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> Initialization of MXM library failed.
>>>>>
>>>>>  Error: Input/output error
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> [JARVICE:449  :0] Caught signal 11 (Segmentation fault)
>>>>>
>>>>> [JARVICE:450  :0] Caught signal 11 (Segmentation fault)
>>>>>
>>>>> ==== backtrace ====
>>>>>
>>>>> 2 0x000000000005640c mxm_handle_error()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>
>>>>> 3 0x000000000005657c mxm_error_signal_handler()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>
>>>>> 4 0x00000000000329a0 killpg()  ??:0
>>>>>
>>>>> 5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>>
>>>>> 6 0x000000000006f6da vasprintf()  ??:0
>>>>>
>>>>> 7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>>
>>>>> 8 0x0000000000026630 orte_show_help()  ??:0
>>>>>
>>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>>>
>>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>>>
>>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>>>
>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>
>>>>> 13 0x000000000000d0ca l_getLocalFromConfig()
>>>>>  /root/rain_ib/interposer/libciutils.c:83
>>>>>
>>>>> 14 0x000000000000c7b4 __cudaRegisterFatBinary()
>>>>>  /root/rain_ib/interposer/libci.c:4055
>>>>>
>>>>> 15 0x0000000000402b59
>>>>> _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv()
>>>>>  tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0
>>>>>
>>>>> 16 0x0000000000402dd6 __do_global_ctors_aux()  crtstuff.c:0
>>>>>
>>>>> ===================
>>>>>
>>>>> ==== backtrace ====
>>>>>
>>>>> 2 0x000000000005640c mxm_handle_error()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>
>>>>> 3 0x000000000005657c mxm_error_signal_handler()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>
>>>>> 4 0x00000000000329a0 killpg()  ??:0
>>>>>
>>>>> 5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>>
>>>>> 6 0x000000000006f6da vasprintf()  ??:0
>>>>>
>>>>> 7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>>
>>>>> 8 0x0000000000026630 orte_show_help()  ??:0
>>>>>
>>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>>>
>>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>>>  
>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>>>
>>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>>>
>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>
>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>
>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>
>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>
>>>>> ===================
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> mpirun noticed that process rank 1 with PID 450 on node JARVICE exited
>>>>> on signal 11 (Segmentation fault).
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> [JARVICE:00447] 1 more process has sent help message help-mtl-mxm.txt
>>>>> / mxm init
>>>>>
>>>>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>>> see all help / error messages
>>>>>
>>>>> [root@JARVICE
>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>>
>>>>>
>>>>> Subhra.
>>>>>
>>>>>
>>>>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman <
>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>
>>>>>> Have you followed the installation steps from the README? (Also here for
>>>>>> reference: http://bgate.mellanox.com/products/hpcx/README.txt)
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> * Load OpenMPI/OpenSHMEM v1.8 based package:
>>>>>>
>>>>>>     % source $HPCX_HOME/hpcx-init.sh
>>>>>>     % hpcx_load
>>>>>>     % env | grep HPCX
>>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi
>>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>>     % hpcx_unload
>>>>>>
>>>>>> 3. Load HPCX environment from modules
>>>>>>
>>>>>> * Load OpenMPI/OpenSHMEM based package:
>>>>>>
>>>>>>     % module use $HPCX_HOME/modulefiles
>>>>>>     % module load hpcx
>>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c
>>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>>     % module unload hpcx
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar <
>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>
>>>>>>> I am using Mellanox OFED 2.4-1.0.0.
>>>>>>>
>>>>>>> I downloaded the HPC-X tarball
>>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and extracted
>>>>>>> it. It has an mxm directory.
>>>>>>>
>>>>>>> [root@JARVICE ~]# ls hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5
>>>>>>> archive      fca    hpcx-init-ompi-mellanox-v1.8.sh  ibprof
>>>>>>> modulefiles  ompi-mellanox-v1.8  sources  VERSION
>>>>>>> bupc-master  hcoll  hpcx-init.sh                     knem
>>>>>>> mxm          README.txt          utils
>>>>>>>
>>>>>>> I tried using LD_PRELOAD for libmxm, but am now getting a different
>>>>>>> error stack, as follows:
>>>>>>>
>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>> --allow-run-as-root --mca mtl mxm -x
>>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2"
>>>>>>> -n 1 ./backend  localhost : -x
>>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2
>>>>>>> ./libci.so" -n 1 ./app2
>>>>>>>  i am backend
>>>>>>> [JARVICE:00564] mca: base: components_open: component pml / cm open
>>>>>>> function failed
>>>>>>> [JARVICE:564  :0] Caught signal 11 (Segmentation fault)
>>>>>>> [JARVICE:00565] mca: base: components_open: component pml / cm open
>>>>>>> function failed
>>>>>>> [JARVICE:565  :0] Caught signal 11 (Segmentation fault)
>>>>>>> ==== backtrace ====
>>>>>>>  2 0x000000000005640c mxm_handle_error()
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>>  3 0x000000000005657c mxm_error_signal_handler()
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>>>>  5 0x0000000000045491 mca_base_components_close()  ??:0
>>>>>>>  6 0x000000000004e99a mca_base_framework_close()  ??:0
>>>>>>>  7 0x0000000000045431 mca_base_component_close()  ??:0
>>>>>>>  8 0x000000000004515c mca_base_framework_components_open()  ??:0
>>>>>>>  9 0x00000000000a0de9 mca_pml_base_open()  pml_base_frame.c:0
>>>>>>> 10 0x000000000004eb1c mca_base_framework_open()  ??:0
>>>>>>> 11 0x0000000000043eb3 ompi_mpi_init()  ??:0
>>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>>> ===================
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> A requested component was not found, or was unable to be opened.
>>>>>>> This
>>>>>>> means that this component is either not installed or is unable to be
>>>>>>> used on your system (e.g., sometimes this means that shared libraries
>>>>>>> that the component requires are unable to be found/loaded).  Note
>>>>>>> that
>>>>>>> Open MPI stopped checking at the first component that it did not
>>>>>>> find.
>>>>>>>
>>>>>>> Host:      JARVICE
>>>>>>> Framework: mtl
>>>>>>> Component: mxm
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE
>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [JARVICE:00562] 1 more process has sent help message
>>>>>>> help-mca-base.txt / find-available:not-valid
>>>>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>>>>> see all help / error messages
>>>>>>>
>>>>>>>
>>>>>>> Subhra
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman <
>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>
>>>>>>>> Seems like mxm was not found in your LD_LIBRARY_PATH.
>>>>>>>>
>>>>>>>> What MOFED version do you use?
>>>>>>>> Does it have /opt/mellanox/mxm in it?
>>>>>>>> You could just run mpirun from the HPCX package, which looks for mxm
>>>>>>>> internally, and recompile ompi as mentioned in the README.
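>>>>>>>>
>>>>>>>> If you do rebuild your own ompi, a configure line roughly like this
>>>>>>>> (prefix and mxm paths are just example locations) points the build at
>>>>>>>> the mxm shipped with MOFED:
>>>>>>>>
>>>>>>>>     % ./configure --prefix=$HOME/openmpi-1.8.4-mxm --with-mxm=/opt/mellanox/mxm
>>>>>>>>     % make -j8 install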
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar <
>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I used the mxm mtl as follows but am getting a segfault. It says the
>>>>>>>>> mxm component was not found, but I have compiled openmpi with mxm. Any
>>>>>>>>> idea what I might be missing?
>>>>>>>>>
>>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>>>> --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x
>>>>>>>>> LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend
>>>>>>>>> localhosst : -n 1 -x LD_PRELOAD="./libci.so
>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2
>>>>>>>>>  i am backend
>>>>>>>>> [JARVICE:08398] *** Process received signal ***
>>>>>>>>> [JARVICE:08398] Signal: Segmentation fault (11)
>>>>>>>>> [JARVICE:08398] Signal code: Address not mapped (1)
>>>>>>>>> [JARVICE:08398] Failing at address: 0x10
>>>>>>>>> [JARVICE:08398] [ 0]
>>>>>>>>> /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710]
>>>>>>>>> [JARVICE:08398] [ 1]
>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491]
>>>>>>>>> [JARVICE:08398] [ 2]
>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a]
>>>>>>>>> [JARVICE:08398] [ 3]
>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431]
>>>>>>>>> [JARVICE:08398] [ 4]
>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c]
>>>>>>>>> [JARVICE:08398] [ 5]
>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9]
>>>>>>>>> [JARVICE:08398] [ 6]
>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c]
>>>>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open:
>>>>>>>>> component pml / cm open function failed
>>>>>>>>>
>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3]
>>>>>>>>> [JARVICE:08398] [ 8]
>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0]
>>>>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf]
>>>>>>>>> [JARVICE:08398] [10]
>>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d]
>>>>>>>>> [JARVICE:08398] [11] ./backend[0x402db9]
>>>>>>>>> [JARVICE:08398] *** End of error message ***
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> A requested component was not found, or was unable to be opened.
>>>>>>>>> This
>>>>>>>>> means that this component is either not installed or is unable to
>>>>>>>>> be
>>>>>>>>> used on your system (e.g., sometimes this means that shared
>>>>>>>>> libraries
>>>>>>>>> that the component requires are unable to be found/loaded).  Note
>>>>>>>>> that
>>>>>>>>> Open MPI stopped checking at the first component that it did not
>>>>>>>>> find.
>>>>>>>>>
>>>>>>>>> Host:      JARVICE
>>>>>>>>> Framework: mtl
>>>>>>>>> Component: mxm
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE
>>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Subhra.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman <
>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>
>>>>>>>>>> No need for IPoIB; mxm uses native IB.
>>>>>>>>>>
>>>>>>>>>> Please see the HPCX (pre-compiled ompi, integrated with MXM and FCA)
>>>>>>>>>> README file for details on how to compile/select.
>>>>>>>>>>
>>>>>>>>>> The default transport is UD for internode communication and
>>>>>>>>>> shared-memory for intra-node.
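>>>>>>>>>>
>>>>>>>>>> The transport list can be overridden through the MXM_TLS environment
>>>>>>>>>> variable; the values below are only an illustration:
>>>>>>>>>>
>>>>>>>>>>     % mpirun -np 2 --mca pml cm --mca mtl mxm -x MXM_TLS=self,shm,ud ./a.out
>>>>>>>>>>     % mpirun -np 2 --mca pml cm --mca mtl mxm -x MXM_TLS=self,shm,rc ./a.out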
>>>>>>>>>>
>>>>>>>>>> http://bgate.mellanox.com/products/hpcx/
>>>>>>>>>>
>>>>>>>>>> Also, mxm is included in Mellanox OFED.
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar <
>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Does IPoIB need to be configured on the IB cards for mxm (I have
>>>>>>>>>>> a separate Ethernet connection too)? Also, are there special flags in
>>>>>>>>>>> mpirun to select from UD/RC/DC? What is the default?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Subhra.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman <
>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> mxm uses IB RDMA/RoCE technologies. One can select the UD/RC/DC
>>>>>>>>>>>> transports to be used in mxm.
>>>>>>>>>>>>
>>>>>>>>>>>> By selecting mxm, all MPI p2p routines will be mapped to
>>>>>>>>>>>> appropriate mxm functions.
>>>>>>>>>>>>
>>>>>>>>>>>> M
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar <
>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi MIke,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does the mxm mtl use InfiniBand RDMA? Also, from a programming
>>>>>>>>>>>>> perspective, do I need to use anything other than
>>>>>>>>>>>>> MPI_Send/MPI_Recv?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman <
>>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> The openib btl does not support this thread model.
>>>>>>>>>>>>>> You can use OMPI w/ mxm (-mca mtl mxm) and multiple-thread mode in
>>>>>>>>>>>>>> the 1.8.x series, or (-mca pml yalla) in the master branch.
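>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Roughly like this (the binary name is just a placeholder; the
>>>>>>>>>>>>>> application itself must request MPI_THREAD_MULTIPLE via
>>>>>>>>>>>>>> MPI_Init_thread):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     % mpirun -np 2 --mca pml cm --mca mtl mxm ./your_threaded_app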
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> M
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar <
>>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and the openib btl work together in
>>>>>>>>>>>>>>> Open MPI 1.8.4? If so, are there any command-line options needed
>>>>>>>>>>>>>>> at run time?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> M.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> M.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>>
>>>>>>>>>> M.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>>
>>>>>>>> M.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> M.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Kind Regards,
>>>>
>>>> M.
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> Kind Regards,
>>
>> M.
>>
>>
>
>
>



-- 

Kind Regards,

M.
