Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-10 Thread Gus Correa via users
Hi Guido

Your PATH and LD_LIBRARY_PATH seem to be inconsistent with each other:

PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
Hence, you may be mixing different versions of Open MPI.
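
A quick way to confirm that is to check, inside the batch environment,
which Open MPI the job actually picks up at run time. A minimal sketch
(assuming the job still runs ./flash4 from $PBS_O_WORKDIR):

which mpirun
mpirun --version
ldd ./flash4 | grep -E 'mpi|open-'

If mpirun resolves to one installation and libmpi.so.40 to another,
the two variables are indeed pointing at different Open MPI trees.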

It looks like you installed Open MPI 4.0.2 here:
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/

Have you tried this instead?
LD_LIBRARY_PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib:$LD_LIBRARY_PATH
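
For example, the environment setup in the PBS script might look like this
(just a sketch; I added "export" so the settings also reach mpirun and the
MPI processes, and kept the gcc-7.3.0 entry so libgfortran is still found):

# point PATH and LD_LIBRARY_PATH at the same Open MPI 4.0.2 install
export PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
export LD_LIBRARY_PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib:$LD_LIBRARY_PATH
# keep the GCC 7.3.0 runtime libraries (libgfortran, libquadmath) visible
export LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH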

I hope this helps,
Gus Correa

On Tue, Dec 10, 2019 at 4:40 PM Guido granda muñoz via users <
users@lists.open-mpi.org> wrote:

> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun -np 32 ./flash4

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-10 Thread Guido granda muñoz via users
Hello,
I compiled the application now using openmpi-4.0.2. This is the ldd output for the binary:

linux-vdso.so.1 =>  (0x7fffb23ff000)
libhdf5.so.103 => /home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103 (0x2b3cd188c000)
libz.so.1 => /lib64/libz.so.1 (0x2b3cd1e74000)
libmpi_usempif08.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40 (0x2b3cd208a000)
libmpi_usempi_ignore_tkr.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40 (0x2b3cd22c)
libmpi_mpifh.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40 (0x2b3cd24c7000)
libmpi.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40 (0x2b3cd2723000)
libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4 (0x2b3cd2a55000)
libm.so.6 => /lib64/libm.so.6 (0x2b3cd2dc3000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b3cd3047000)
libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0 (0x2b3cd325e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b3cd349c000)
libc.so.6 => /lib64/libc.so.6 (0x2b3cd36b9000)
librt.so.1 => /lib64/librt.so.1 (0x2b3cd3a4e000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b3cd3c56000)
libopen-rte.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40 (0x2b3cd3e5b000)
libopen-pal.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40 (0x2b3cd411)
libudev.so.0 => /lib64/libudev.so.0 (0x2b3cd4425000)
libutil.so.1 => /lib64/libutil.so.1 (0x2b3cd4634000)
/lib64/ld-linux-x86-64.so.2 (0x2b3cd166a000)

and ran it like this:

#!/bin/bash
#PBS -l nodes=1:ppn=32
#PBS -N mc_cond_0_h3
#PBS -o mc_cond_0_h3.o
#PBS -e mc_cond_0_h3.e

PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
cd $PBS_O_WORKDIR
mpirun -np 32 ./flash4

and now I'm getting these error messages:

--------------------------------------------------------------------------
As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.

Efficient, high-speed same-node shared memory communication support in
Open MPI is available in the "vader" BTL. To use the vader BTL, you
can re-run your job with:

mpirun --mca btl vader,self,... your_mpi_application
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-34.local
Framework: btl
Component: sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

mca_bml_base_open() failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[compute-0-34:16915] *** An error occurred in MPI_Init
[compute-0-34:16915] *** reported by process [3776708609,5]
[compute-0-34:16915] *** on a NULL communicator
[compute-0-34:16915] *** Unknown error
[compute-0-34:16915] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute-0-34:16915] *** and potentially your MPI job)
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
[compute-0-34.local:16902]