The RDMA error sounds like something isn’t right with your machine’s InfiniBand installation.
The cross-version problem sounds like you installed both OMPI versions into the same location. Did you do that? If so, that might be the root cause of both problems. You need to install them in totally different locations. Then you need to _prefix_ your PATH and LD_LIBRARY_PATH with the location of the version you want to use.

HTH
Ralph

> On Aug 19, 2016, at 12:53 AM, Juan A. Cordero Varelaq <bioinformatica-i...@us.es> wrote:
>
> Dear users,
>
> I am totally stuck using openmpi. I have two versions on my machine: 1.8.1
> and 2.0.0, and none of them work. When use the mpirun 1.8.1 version, I get
> the following error:
>
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> --------------------------------------------------------------------------
> Open MPI failed to open the /dev/knem device due to a local error.
> Please check with your system administrator to get the problem fixed,
> or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem
> support.
>
> Local host: MYMACHINE
> Errno: 2 (No such file or directory)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Open MPI failed to open an OpenFabrics device. This is an unusual
> error; the system reported the OpenFabrics device as being present,
> but then later failed to access it successfully. This usually
> indicates either a misconfiguration or a failed OpenFabrics hardware
> device.
>
> All OpenFabrics support has been disabled in this MPI process; your
> job may or may not continue.
>
> Hostname: MYMACHINE
> Device name: mlx4_0
> Errror (22): Invalid argument
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [[60527,1],4]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: usNIC
> Host: MYMACHINE
>
> When I use the 2.0.0 version, I get something strange, it seems openmpi-2.0.0
> looks for openmpi-1.8.1 libraries?:
>
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126820] *** Process received signal ***
> [MYMACHINE:126820] Signal: Segmentation fault (11)
> [MYMACHINE:126820] Signal code: Address not mapped (1)
> [MYMACHINE:126820] Failing at address: 0x1c0
> [MYMACHINE:126820] [ 0]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f39b2ec4cb0]
> [MYMACHINE:126820] [ 1]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f39b23e7430]
> [MYMACHINE:126820] [ 2]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f39b2676a57]
> [MYMACHINE:126820] [ 3]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f39b2676fb7]
> [MYMACHINE:126820] [ 4]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f39b267718f]
> [MYMACHINE:126820] [ 5]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f39b23c5f2a]
> [MYMACHINE:126820] [ 6]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f39b23c70c3]
> [MYMACHINE:126820] [ 7]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f39b23c8278]
> [MYMACHINE:126820] [ 8]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f39b23d1e6c]
> [MYMACHINE:126820] [ 9]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f39b2666e21]
> [MYMACHINE:126820] [10]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f39b3115c92]
> [MYMACHINE:126820] [11]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f39b31387bb]
> [MYMACHINE:126820] [12] mb[0x402024]
> [MYMACHINE:126820] [13]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f39b2b187ed]
> [MYMACHINE:126820] [14] mb[0x402111]
> [MYMACHINE:126820] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126821] *** Process received signal ***
> [MYMACHINE:126821] Signal: Segmentation fault (11)
> [MYMACHINE:126821] Signal code: Address not mapped (1)
> [MYMACHINE:126821] Failing at address: 0x1c0
> [MYMACHINE:126821] [ 0]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fed834bbcb0]
> [MYMACHINE:126821] [ 1]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7fed829de430]
> [MYMACHINE:126821] [ 2]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7fed82c6da57]
> [MYMACHINE:126821] [ 3]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7fed82c6dfb7]
> [MYMACHINE:126821] [ 4]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7fed82c6e18f]
> [MYMACHINE:126821] [ 5]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7fed829bcf2a]
> [MYMACHINE:126821] [ 6]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7fed829be0c3]
> [MYMACHINE:126821] [ 7]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7fed829bf278]
> [MYMACHINE:126821] [ 8]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7fed829c8e6c]
> [MYMACHINE:126821] [ 9]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7fed82c5de21]
> [MYMACHINE:126821] [10]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7fed8370cc92]
> [MYMACHINE:126821] [11]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7fed8372f7bb]
> [MYMACHINE:126821] [12] mb[0x402024]
> [MYMACHINE:126821] [13]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fed8310f7ed]
> [MYMACHINE:126821] [14] mb[0x402111]
> [MYMACHINE:126821] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126822] *** Process received signal ***
> [MYMACHINE:126822] Signal: Segmentation fault (11)
> [MYMACHINE:126822] Signal code: Address not mapped (1)
> [MYMACHINE:126822] Failing at address: 0x1c0
> [MYMACHINE:126822] [ 0]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f0174bc0cb0]
> [MYMACHINE:126822] [ 1]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f01740e3430]
> [MYMACHINE:126822] [ 2]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f0174372a57]
> [MYMACHINE:126822] [ 3]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f0174372fb7]
> [MYMACHINE:126822] [ 4]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f017437318f]
> [MYMACHINE:126822] [ 5]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f01740c1f2a]
> [MYMACHINE:126822] [ 6]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f01740c30c3]
> [MYMACHINE:126822] [ 7]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f01740c4278]
> [MYMACHINE:126822] [ 8]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f01740cde6c]
> [MYMACHINE:126822] [ 9]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f0174362e21]
> [MYMACHINE:126822] [10]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f0174e11c92]
> [MYMACHINE:126822] [11]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f0174e347bb]
> [MYMACHINE:126822] [12] mb[0x402024]
> [MYMACHINE:126822] [13]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f01748147ed]
> [MYMACHINE:126822] [14] mb[0x402111]
> [MYMACHINE:126822] *** End of error message ***
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126823] *** Process received signal ***
> [MYMACHINE:126823] Signal: Segmentation fault (11)
> [MYMACHINE:126823] Signal code: Address not mapped (1)
> [MYMACHINE:126823] Failing at address: 0x1c0
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: MYMACHINE
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> [MYMACHINE:126823] [ 0]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fcd9cb58cb0]
> [MYMACHINE:126823] [ 1] [MYMACHINE:126824] *** Process received signal ***
> [MYMACHINE:126824] Signal: Segmentation fault (11)
> [MYMACHINE:126824] Signal code: Address not mapped (1)
> [MYMACHINE:126824] Failing at address: 0x1c0
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7fcd9c07b430]
> [MYMACHINE:126823] [ 2]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7fcd9c30aa57]
> [MYMACHINE:126823] [ 3]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7fcd9c30afb7]
> [MYMACHINE:126823] [ 4]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7fcd9c30b18f]
> [MYMACHINE:126823] [ 5]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7fcd9c059f2a]
> [MYMACHINE:126823] [MYMACHINE:126824] [ 0] [ 6]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f2f0c611cb0]
> [MYMACHINE:126824] [ 1]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7fcd9c05b0c3]
> [MYMACHINE:126823] [ 7]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f2f0bb34430]
> [MYMACHINE:126824] [ 2]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f2f0bdc3a57]
> [MYMACHINE:126824] [ 3]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f2f0bdc3fb7]
> [MYMACHINE:126824] [ 4]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f2f0bdc418f]
> [MYMACHINE:126824] [ 5]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7fcd9c05c278]
> [MYMACHINE:126823] [ 8]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7fcd9c065e6c]
> [MYMACHINE:126823] [ 9]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7fcd9c2fae21]
> [MYMACHINE:126823] [10]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7fcd9cda9c92]
> [MYMACHINE:126823] [11]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f2f0bb12f2a]
> [MYMACHINE:126824] [ 6]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f2f0bb140c3]
> [MYMACHINE:126824] [ 7]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7fcd9cdcc7bb]
> [MYMACHINE:126823] [12] mb[0x402024]
> [MYMACHINE:126823] [13]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f2f0bb15278]
> [MYMACHINE:126824] [ 8]
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f2f0bb1ee6c]
> [MYMACHINE:126824] [ 9]
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f2f0bdb3e21]
> [MYMACHINE:126824] [10]
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f2f0c862c92]
> [MYMACHINE:126824] [11]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fcd9c7ac7ed]
> [MYMACHINE:126823] [14] mb[0x402111]
> [MYMACHINE:126823] *** End of error message ***
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f2f0c8857bb]
> [MYMACHINE:126824] [12] mb[0x402024]
> [MYMACHINE:126824] [13]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f2f0c2657ed]
> [MYMACHINE:126824] [14] mb[0x402111]
> [MYMACHINE:126824] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 0 on node MYMACHINE exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> I am running my script with mpirun in a single node of a SGE cluster.
>
> I would be very grateful if somebody could give me some hints to solve this
> issue.
>
> Thanks a lot in advance
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
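For reference, the PATH / LD_LIBRARY_PATH prefixing suggested at the top of this reply might look roughly like the sketch below. The helper name `use_ompi` is hypothetical, and `/opt/openmpi-2.0.0` is only an assumed install prefix (the backtrace shows 1.8.1 under `/opt/openmpi-1.8.1`, but the 2.0.0 location was never stated); substitute whatever you passed to `--prefix` when you ran configure.

```shell
# Hypothetical helper: put ONE Open MPI install first on both search paths.
# $1 is the install prefix of the version you want to use.
use_ompi() {
    prefix="$1"
    # Prepend (not append) so this install wins over any other copy.
    PATH="$prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Assumed prefix for the 2.0.0 install; adjust to your machine.
use_ompi /opt/openmpi-2.0.0

# Show which mpirun the shell would now pick up, if any.
command -v mpirun || echo "no mpirun found on PATH"
```

After that, `mpirun --version` and `ldd ./mb | grep libmpi` (with `mb` being the binary named in the backtrace) should indicate which install is really being used. If `ldd` still points at the 1.8.1 libraries, the binary may also need to be rebuilt with the `mpicc` from the version you selected.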