The RDMA errors suggest that something is wrong with your machine's InfiniBand
installation.
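A quick way to check this is to see whether the device nodes behind the quoted errors exist and are readable and writable by your user. The sketch below is a hypothetical helper, not part of Open MPI. /dev/knem comes from the knem message in your log. /dev/infiniband/rdma_cm and /dev/infiniband/uverbs0 are the usual Linux locations for the RDMA CM and verbs devices, but they are assumptions for your system. If a node cannot be opened, the helper prints OMPI_MCA_* environment settings that skip knem and the openib BTL. Open MPI reads MCA parameters from environment variables of that form. This may let a single-node job run over shared memory while the InfiniBand setup is sorted out. The knem parameter name is taken from the 1.8.1 message.

```python
import os

# Device nodes behind the errors in the quoted log. /dev/knem is taken from
# the knem message (errno 2); the /dev/infiniband paths are the usual Linux
# locations for the RDMA CM and verbs devices and are an ASSUMPTION here.
DEVICE_NODES = [
    "/dev/knem",
    "/dev/infiniband/rdma_cm",
    "/dev/infiniband/uverbs0",
]


def _read_write_ok(path):
    # os.access returns False both for missing nodes and for permission problems.
    return os.access(path, os.R_OK | os.W_OK)


def unusable_devices(paths=DEVICE_NODES, accessible=_read_write_ok):
    """Return the entries of `paths` this user cannot open for read/write."""
    return [p for p in paths if not accessible(p)]


def fallback_mca_env(unusable):
    """Hypothetical helper: OMPI_MCA_* settings that skip the failing transports."""
    env = {}
    if "/dev/knem" in unusable:
        # Same effect as the btl_sm_use_knem=0 advice in the 1.8.1 knem message.
        env["OMPI_MCA_btl_sm_use_knem"] = "0"
    if any(p.startswith("/dev/infiniband/") for p in unusable):
        # "^openib" means "use every BTL except openib".
        env["OMPI_MCA_btl"] = "^openib"
    return env


if __name__ == "__main__":
    bad = unusable_devices()
    for path in bad:
        print("cannot open:", path)
    for key, value in fallback_mca_env(bad).items():
        print(f"export {key}={value}")
```

The same settings can also be passed per run with mpirun's --mca option instead of exporting them.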

The cross-version problem sounds like you installed both OMPI versions into the
same location. Did you? If so, that might be the root cause of both problems.
You need to install them in completely separate locations, and then _prefix_
your PATH and LD_LIBRARY_PATH with the location of the version you want to use.
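The following minimal sketch shows what prefixing means in practice. It assumes two hypothetical install prefixes: /opt/openmpi-1.8.1, which is the path in your backtrace, and /opt/openmpi-2.0.0, which is a guess. The function `prefixed_env` is illustrative only. It returns a copy of the environment with the chosen version's bin and lib directories first on PATH and LD_LIBRARY_PATH. It also drops entries that belong to the other install, so the two versions cannot shadow each other.

```python
import os


def prefixed_env(prefix, base=None, other_prefixes=()):
    """Hypothetical helper: copy `base` (default os.environ) with `prefix`/bin
    and `prefix`/lib placed first on PATH and LD_LIBRARY_PATH.

    Entries under any of `other_prefixes` (e.g. a second Open MPI install)
    are removed so they cannot be picked up ahead of the chosen version.
    """
    env = dict(os.environ if base is None else base)

    def under_other(path):
        return any(path == o or path.startswith(o.rstrip("/") + "/")
                   for o in other_prefixes)

    def rebuild(var, first):
        old = [p for p in env.get(var, "").split(os.pathsep) if p]
        keep = [p for p in old if p != first and not under_other(p)]
        env[var] = os.pathsep.join([first] + keep)

    rebuild("PATH", os.path.join(prefix, "bin"))
    rebuild("LD_LIBRARY_PATH", os.path.join(prefix, "lib"))
    return env
```

For example, you could launch with `subprocess.run(["mpirun", "-np", "4", "./mb"], env=prefixed_env("/opt/openmpi-2.0.0", other_prefixes=["/opt/openmpi-1.8.1"]))`. The process count is arbitrary. In a shell, the equivalent is exporting both variables with the new directories listed first. Your 2.0.0 backtrace shows mb still loading /opt/openmpi-1.8.1/lib/libmpi.so.1, so the program probably also needs to be rebuilt with the mpicc from the version you pick.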

HTH
Ralph

> On Aug 19, 2016, at 12:53 AM, Juan A. Cordero Varelaq 
> <bioinformatica-i...@us.es> wrote:
> 
> Dear users,
> 
> I am totally stuck using openmpi. I have two versions on my machine, 1.8.1
> and 2.0.0, and neither of them works. When I use the mpirun from 1.8.1, I get
> the following error:
> 
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> --------------------------------------------------------------------------
> Open MPI failed to open the /dev/knem device due to a local error.
> Please check with your system administrator to get the problem fixed,
> or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem
> support.
> 
>   Local host: MYMACHINE
>   Errno:      2 (No such file or directory)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Open MPI failed to open an OpenFabrics device.  This is an unusual
> error; the system reported the OpenFabrics device as being present,
> but then later failed to access it successfully.  This usually
> indicates either a misconfiguration or a failed OpenFabrics hardware
> device.
> 
> All OpenFabrics support has been disabled in this MPI process; your
> job may or may not continue.
> 
>   Hostname:    MYMACHINE
>   Device name: mlx4_0
>   Errror (22): Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [[60527,1],4]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: usNIC
>   Host: MYMACHINE
> 
> When I use the 2.0.0 version, I get something strange: it seems that
> openmpi-2.0.0 is looking for the openmpi-1.8.1 libraries:
> 
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:      MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126820] *** Process received signal ***
> [MYMACHINE:126820] Signal: Segmentation fault (11)
> [MYMACHINE:126820] Signal code: Address not mapped (1)
> [MYMACHINE:126820] Failing at address: 0x1c0
> [MYMACHINE:126820] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f39b2ec4cb0]
> [MYMACHINE:126820] [ 1] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f39b23e7430]
> [MYMACHINE:126820] [ 2] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f39b2676a57]
> [MYMACHINE:126820] [ 3] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f39b2676fb7]
> [MYMACHINE:126820] [ 4] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f39b267718f]
> [MYMACHINE:126820] [ 5] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f39b23c5f2a]
> [MYMACHINE:126820] [ 6] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f39b23c70c3]
> [MYMACHINE:126820] [ 7] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f39b23c8278]
> [MYMACHINE:126820] [ 8] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f39b23d1e6c]
> [MYMACHINE:126820] [ 9] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f39b2666e21]
> [MYMACHINE:126820] [10] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f39b3115c92]
> [MYMACHINE:126820] [11] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f39b31387bb]
> [MYMACHINE:126820] [12] mb[0x402024]
> [MYMACHINE:126820] [13] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f39b2b187ed]
> [MYMACHINE:126820] [14] mb[0x402111]
> [MYMACHINE:126820] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:      MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126821] *** Process received signal ***
> [MYMACHINE:126821] Signal: Segmentation fault (11)
> [MYMACHINE:126821] Signal code: Address not mapped (1)
> [MYMACHINE:126821] Failing at address: 0x1c0
> [MYMACHINE:126821] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fed834bbcb0]
> [MYMACHINE:126821] [ 1] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7fed829de430]
> [MYMACHINE:126821] [ 2] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7fed82c6da57]
> [MYMACHINE:126821] [ 3] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7fed82c6dfb7]
> [MYMACHINE:126821] [ 4] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7fed82c6e18f]
> [MYMACHINE:126821] [ 5] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7fed829bcf2a]
> [MYMACHINE:126821] [ 6] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7fed829be0c3]
> [MYMACHINE:126821] [ 7] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7fed829bf278]
> [MYMACHINE:126821] [ 8] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7fed829c8e6c]
> [MYMACHINE:126821] [ 9] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7fed82c5de21]
> [MYMACHINE:126821] [10] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7fed8370cc92]
> [MYMACHINE:126821] [11] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7fed8372f7bb]
> [MYMACHINE:126821] [12] mb[0x402024]
> [MYMACHINE:126821] [13] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fed8310f7ed]
> [MYMACHINE:126821] [14] mb[0x402111]
> [MYMACHINE:126821] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:      MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126822] *** Process received signal ***
> [MYMACHINE:126822] Signal: Segmentation fault (11)
> [MYMACHINE:126822] Signal code: Address not mapped (1)
> [MYMACHINE:126822] Failing at address: 0x1c0
> [MYMACHINE:126822] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f0174bc0cb0]
> [MYMACHINE:126822] [ 1] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f01740e3430]
> [MYMACHINE:126822] [ 2] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f0174372a57]
> [MYMACHINE:126822] [ 3] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f0174372fb7]
> [MYMACHINE:126822] [ 4] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f017437318f]
> [MYMACHINE:126822] [ 5] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f01740c1f2a]
> [MYMACHINE:126822] [ 6] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f01740c30c3]
> [MYMACHINE:126822] [ 7] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f01740c4278]
> [MYMACHINE:126822] [ 8] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f01740cde6c]
> [MYMACHINE:126822] [ 9] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f0174362e21]
> [MYMACHINE:126822] [10] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f0174e11c92]
> [MYMACHINE:126822] [11] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f0174e347bb]
> [MYMACHINE:126822] [12] mb[0x402024]
> [MYMACHINE:126822] [13] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f01748147ed]
> [MYMACHINE:126822] [14] mb[0x402111]
> [MYMACHINE:126822] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:      MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126823] *** Process received signal ***
> [MYMACHINE:126823] Signal: Segmentation fault (11)
> [MYMACHINE:126823] Signal code: Address not mapped (1)
> [MYMACHINE:126823] Failing at address: 0x1c0
> [MYMACHINE:126823] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fcd9cb58cb0]
> [MYMACHINE:126823] [ 1] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7fcd9c07b430]
> [MYMACHINE:126823] [ 2] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7fcd9c30aa57]
> [MYMACHINE:126823] [ 3] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7fcd9c30afb7]
> [MYMACHINE:126823] [ 4] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7fcd9c30b18f]
> [MYMACHINE:126823] [ 5] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7fcd9c059f2a]
> [MYMACHINE:126823] [ 6] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7fcd9c05b0c3]
> [MYMACHINE:126823] [ 7] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7fcd9c05c278]
> [MYMACHINE:126823] [ 8] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7fcd9c065e6c]
> [MYMACHINE:126823] [ 9] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7fcd9c2fae21]
> [MYMACHINE:126823] [10] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7fcd9cda9c92]
> [MYMACHINE:126823] [11] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7fcd9cdcc7bb]
> [MYMACHINE:126823] [12] mb[0x402024]
> [MYMACHINE:126823] [13] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fcd9c7ac7ed]
> [MYMACHINE:126823] [14] mb[0x402111]
> [MYMACHINE:126823] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:      MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126824] *** Process received signal ***
> [MYMACHINE:126824] Signal: Segmentation fault (11)
> [MYMACHINE:126824] Signal code: Address not mapped (1)
> [MYMACHINE:126824] Failing at address: 0x1c0
> [MYMACHINE:126824] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f2f0c611cb0]
> [MYMACHINE:126824] [ 1] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f2f0bb34430]
> [MYMACHINE:126824] [ 2] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f2f0bdc3a57]
> [MYMACHINE:126824] [ 3] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f2f0bdc3fb7]
> [MYMACHINE:126824] [ 4] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f2f0bdc418f]
> [MYMACHINE:126824] [ 5] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f2f0bb12f2a]
> [MYMACHINE:126824] [ 6] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f2f0bb140c3]
> [MYMACHINE:126824] [ 7] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f2f0bb15278]
> [MYMACHINE:126824] [ 8] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f2f0bb1ee6c]
> [MYMACHINE:126824] [ 9] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f2f0bdb3e21]
> [MYMACHINE:126824] [10] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f2f0c862c92]
> [MYMACHINE:126824] [11] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f2f0c8857bb]
> [MYMACHINE:126824] [12] mb[0x402024]
> [MYMACHINE:126824] [13] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f2f0c2657ed]
> [MYMACHINE:126824] [14] mb[0x402111]
> [MYMACHINE:126824] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 0 on node MYMACHINE exited on 
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> I am running my script with mpirun on a single node of an SGE cluster.
> 
> I would be very grateful if somebody could give me some hints to solve this 
> issue.
> 
> Thanks a lot in advance
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
