Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-11 Thread Gilles Gouaillardet via users
Guido,

This error message is from MPICH and not Open MPI.

Make sure your environment is correct and the shared filesystem is mounted on 
the compute nodes.
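
A quick sanity check inside the job script, right before the mpirun line, could look like this (just a sketch, reusing the 4.0.2 install prefix quoted below; adjust as needed):

ls $HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin/mpirun
type mpirun
mpirun --version    # should report "mpirun (Open MPI) 4.0.2", not an MPICH/Hydra banner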


Cheers,

Gilles

Sent from my iPod

> On Dec 12, 2019, at 1:44, Guido granda muñoz via users 
>  wrote:
> 
> Hi, 
> after following the instructions in the error message, in other words running
> it like this:
> 
> #!/bin/bash
> #PBS -l nodes=1:ppn=32
> #PBS -N mc_cond_0_h3
> #PBS -o mc_cond_0_h3.o
> #PBS -e mc_cond_0_h3.e
> 
> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun --mca btl vader,self -np 32 ./flash4 
> 
> I get the following error messages:
> 
> [mpiexec@compute-0-34.local] match_arg (utils/args/args.c:159): unrecognized 
> argument mca
> [mpiexec@compute-0-34.local] HYDU_parse_array (utils/args/args.c:174): 
> argument matching returned error
> [mpiexec@compute-0-34.local] parse_args (ui/mpich/utils.c:1596): error 
> parsing input array
> [mpiexec@compute-0-34.local] HYD_uii_mpx_get_parameters 
> (ui/mpich/utils.c:1648): unable to parse user arguments
> [mpiexec@compute-0-34.local] main (ui/mpich/mpiexec.c:149): error parsing 
> parameters
> 
> Am I running it incorrectly?
> Cheers,
> 
> On Tue, Dec 10, 2019 at 15:40, Guido granda muñoz () wrote:
>> Hello,
>> I compiled the application now using  openmpi-4.0.2:
>> 
>>  linux-vdso.so.1 =>  (0x7fffb23ff000)
>> libhdf5.so.103 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103
>>  (0x2b3cd188c000)
>> libz.so.1 => /lib64/libz.so.1 (0x2b3cd1e74000)
>> libmpi_usempif08.so.40 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40
>>  (0x2b3cd208a000)
>> libmpi_usempi_ignore_tkr.so.40 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40
>>  (0x2b3cd22c)
>> libmpi_mpifh.so.40 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40
>>  (0x2b3cd24c7000)
>> libmpi.so.40 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40 
>> (0x2b3cd2723000)
>> libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4 
>> (0x2b3cd2a55000)
>> libm.so.6 => /lib64/libm.so.6 (0x2b3cd2dc3000)
>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b3cd3047000)
>> libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0 
>> (0x2b3cd325e000)
>> libpthread.so.0 => /lib64/libpthread.so.0 (0x2b3cd349c000)
>> libc.so.6 => /lib64/libc.so.6 (0x2b3cd36b9000)
>> librt.so.1 => /lib64/librt.so.1 (0x2b3cd3a4e000)
>> libdl.so.2 => /lib64/libdl.so.2 (0x2b3cd3c56000)
>> libopen-rte.so.40 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40
>>  (0x2b3cd3e5b000)
>> libopen-pal.so.40 => 
>> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40
>>  (0x2b3cd411)
>> libudev.so.0 => /lib64/libudev.so.0 (0x2b3cd4425000)
>> libutil.so.1 => /lib64/libutil.so.1 (0x2b3cd4634000)
>> /lib64/ld-linux-x86-64.so.2 (0x2b3cd166a000)
>> 
>> and ran it like this:
>> 
>> #!/bin/bash
>> #PBS -l nodes=1:ppn=32
>> #PBS -N mc_cond_0_h3 
>> #PBS -o mc_cond_0_h3.o
>> #PBS -e mc_cond_0_h3.e
>> 
>> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
>> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
>> cd $PBS_O_WORKDIR
>> mpirun -np 32 ./flash4 
>> 
>> and now I'm getting these error messages:
>> 
>> --
>> As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.
>> 
>> Efficient, high-speed same-node shared memory communication support in
>> Open MPI is available in the "vader" BTL. To use the vader BTL, you
>> can re-run your job with:
>> 
>> mpirun --mca btl vader,self,... your_mpi_application
>> --
>> --
>> A requested component was not found, or was unable to be opened. This
>> means that this component is either not installed or is unable to be
>> used on your system (e.g., sometimes this means that shared libraries
>> that the component requires are unable to be found/loaded). Note that
>> Open MPI stopped checking at the first component that it did not find.
>> 
>> Host: compute-0-34.local
>> Framework: btl
>> Component: sm
>> --
>> --
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an 

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-11 Thread Guido granda muñoz via users
Hi,
after following the instructions in the error message, in other words running
it like this:

#!/bin/bash
#PBS -l nodes=1:ppn=32
#PBS -N mc_cond_0_h3
#PBS -o mc_cond_0_h3.o
#PBS -e mc_cond_0_h3.e

PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
cd $PBS_O_WORKDIR
mpirun --mca btl vader,self -np 32 ./flash4

I get the following error messages:

[mpiexec@compute-0-34.local] match_arg (utils/args/args.c:159):
unrecognized argument mca
[mpiexec@compute-0-34.local] HYDU_parse_array (utils/args/args.c:174):
argument matching returned error
[mpiexec@compute-0-34.local] parse_args (ui/mpich/utils.c:1596): error
parsing input array
[mpiexec@compute-0-34.local] HYD_uii_mpx_get_parameters
(ui/mpich/utils.c:1648): unable to parse user arguments
[mpiexec@compute-0-34.local] main (ui/mpich/mpiexec.c:149): error parsing
parameters

Am I running it incorrectly?
Cheers,

On Tue, Dec 10, 2019 at 15:40, Guido granda muñoz (<guidogra...@gmail.com>) wrote:

> Hello,
> I compiled the application now using  openmpi-4.0.2:
>
>  linux-vdso.so.1 =>  (0x7fffb23ff000)
> libhdf5.so.103 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103
> (0x2b3cd188c000)
> libz.so.1 => /lib64/libz.so.1 (0x2b3cd1e74000)
> libmpi_usempif08.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40
> (0x2b3cd208a000)
> libmpi_usempi_ignore_tkr.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40
> (0x2b3cd22c)
> libmpi_mpifh.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40
> (0x2b3cd24c7000)
> libmpi.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40
> (0x2b3cd2723000)
> libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4
> (0x2b3cd2a55000)
> libm.so.6 => /lib64/libm.so.6 (0x2b3cd2dc3000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b3cd3047000)
> libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0
> (0x2b3cd325e000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x2b3cd349c000)
> libc.so.6 => /lib64/libc.so.6 (0x2b3cd36b9000)
> librt.so.1 => /lib64/librt.so.1 (0x2b3cd3a4e000)
> libdl.so.2 => /lib64/libdl.so.2 (0x2b3cd3c56000)
> libopen-rte.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40
> (0x2b3cd3e5b000)
> libopen-pal.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40
> (0x2b3cd411)
> libudev.so.0 => /lib64/libudev.so.0 (0x2b3cd4425000)
> libutil.so.1 => /lib64/libutil.so.1 (0x2b3cd4634000)
> /lib64/ld-linux-x86-64.so.2 (0x2b3cd166a000)
>
> and ran it like this:
>
> #!/bin/bash
> #PBS -l nodes=1:ppn=32
> #PBS -N mc_cond_0_h3
> #PBS -o mc_cond_0_h3.o
> #PBS -e mc_cond_0_h3.e
>
> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun -np 32 ./flash4
>
> and now I'm getting these error messages:
>
> --
>
> As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.
>
>
> Efficient, high-speed same-node shared memory communication support in
>
> Open MPI is available in the "vader" BTL. To use the vader BTL, you
>
> can re-run your job with:
>
>
> mpirun --mca btl vader,self,... your_mpi_application
>
> --
>
> --
>
> A requested component was not found, or was unable to be opened. This
>
> means that this component is either not installed or is unable to be
>
> used on your system (e.g., sometimes this means that shared libraries
>
> that the component requires are unable to be found/loaded). Note that
>
> Open MPI stopped checking at the first component that it did not find.
>
>
> Host: compute-0-34.local
>
> Framework: btl
>
> Component: sm
>
> --
>
> --
>
> It looks like MPI_INIT failed for some reason; your parallel process is
>
> likely to abort. There are many reasons that a parallel process can
>
> fail during MPI_INIT; some of which are due to configuration or environment
>
> problems. This failure appears to be an internal failure; here's some
>
> additional information (which may only be relevant to an Open MPI
>
> developer):
>
>
> mca_bml_base_open() failed
>
> --> Returned "Not found" (-13) instead of "Success" (0)
>
> --
>
> [compute-0-34:16915] *** An error occurred in MPI_Init
>
> [compute-0-34:16915] *** 

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-10 Thread Gus Correa via users
Hi Guido

Your PATH and LD_LIBRARY_PATH seem to be inconsistent with each other:

PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
Hence, you may be mixing different versions of Open MPI.

It looks like you installed Open MPI 4.0.2 here:
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/

Have you tried this instead?
LD_LIBRARY_PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib:$LD_LIBRARY_PATH
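
For example, a consistent pair would look something like this (only a sketch, reusing your install prefix; the gcc lib64 entry is kept for libgfortran):

OMPI=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2
export PATH=$OMPI/bin:$PATH
export LD_LIBRARY_PATH=$OMPI/lib:/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH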

I hope this helps,
Gus Correa

On Tue, Dec 10, 2019 at 4:40 PM Guido granda muñoz via users <
users@lists.open-mpi.org> wrote:

> Hello,
> I compiled the application now using  openmpi-4.0.2:
>
>  linux-vdso.so.1 =>  (0x7fffb23ff000)
> libhdf5.so.103 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103
> (0x2b3cd188c000)
> libz.so.1 => /lib64/libz.so.1 (0x2b3cd1e74000)
> libmpi_usempif08.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40
> (0x2b3cd208a000)
> libmpi_usempi_ignore_tkr.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40
> (0x2b3cd22c)
> libmpi_mpifh.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40
> (0x2b3cd24c7000)
> libmpi.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40
> (0x2b3cd2723000)
> libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4
> (0x2b3cd2a55000)
> libm.so.6 => /lib64/libm.so.6 (0x2b3cd2dc3000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b3cd3047000)
> libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0
> (0x2b3cd325e000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x2b3cd349c000)
> libc.so.6 => /lib64/libc.so.6 (0x2b3cd36b9000)
> librt.so.1 => /lib64/librt.so.1 (0x2b3cd3a4e000)
> libdl.so.2 => /lib64/libdl.so.2 (0x2b3cd3c56000)
> libopen-rte.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40
> (0x2b3cd3e5b000)
> libopen-pal.so.40 =>
> /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40
> (0x2b3cd411)
> libudev.so.0 => /lib64/libudev.so.0 (0x2b3cd4425000)
> libutil.so.1 => /lib64/libutil.so.1 (0x2b3cd4634000)
> /lib64/ld-linux-x86-64.so.2 (0x2b3cd166a000)
>
> and ran it like this:
>
> #!/bin/bash
> #PBS -l nodes=1:ppn=32
> #PBS -N mc_cond_0_h3
> #PBS -o mc_cond_0_h3.o
> #PBS -e mc_cond_0_h3.e
>
> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun -np 32 ./flash4
>
> and now I'm getting these error messages:
>
> --
>
> As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.
>
>
> Efficient, high-speed same-node shared memory communication support in
>
> Open MPI is available in the "vader" BTL. To use the vader BTL, you
>
> can re-run your job with:
>
>
> mpirun --mca btl vader,self,... your_mpi_application
>
> --
>
> --
>
> A requested component was not found, or was unable to be opened. This
>
> means that this component is either not installed or is unable to be
>
> used on your system (e.g., sometimes this means that shared libraries
>
> that the component requires are unable to be found/loaded). Note that
>
> Open MPI stopped checking at the first component that it did not find.
>
>
> Host: compute-0-34.local
>
> Framework: btl
>
> Component: sm
>
> --
>
> --
>
> It looks like MPI_INIT failed for some reason; your parallel process is
>
> likely to abort. There are many reasons that a parallel process can
>
> fail during MPI_INIT; some of which are due to configuration or environment
>
> problems. This failure appears to be an internal failure; here's some
>
> additional information (which may only be relevant to an Open MPI
>
> developer):
>
>
> mca_bml_base_open() failed
>
> --> Returned "Not found" (-13) instead of "Success" (0)
>
> --
>
> [compute-0-34:16915] *** An error occurred in MPI_Init
>
> [compute-0-34:16915] *** reported by process [3776708609,5]
>
> [compute-0-34:16915] *** on a NULL communicator
>
> [compute-0-34:16915] *** Unknown error
>
> [compute-0-34:16915] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
>
> [compute-0-34:16915] *** and potentially your MPI job)
>
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
> server/pmix_server.c at line 2147
>
> [compute-0-34.local:16902] 

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-10 Thread Guido granda muñoz via users
Hello,
I compiled the application now using  openmpi-4.0.2:

 linux-vdso.so.1 =>  (0x7fffb23ff000)
libhdf5.so.103 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103
(0x2b3cd188c000)
libz.so.1 => /lib64/libz.so.1 (0x2b3cd1e74000)
libmpi_usempif08.so.40 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40
(0x2b3cd208a000)
libmpi_usempi_ignore_tkr.so.40 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40
(0x2b3cd22c)
libmpi_mpifh.so.40 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40
(0x2b3cd24c7000)
libmpi.so.40 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40
(0x2b3cd2723000)
libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4
(0x2b3cd2a55000)
libm.so.6 => /lib64/libm.so.6 (0x2b3cd2dc3000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b3cd3047000)
libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0
(0x2b3cd325e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b3cd349c000)
libc.so.6 => /lib64/libc.so.6 (0x2b3cd36b9000)
librt.so.1 => /lib64/librt.so.1 (0x2b3cd3a4e000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b3cd3c56000)
libopen-rte.so.40 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40
(0x2b3cd3e5b000)
libopen-pal.so.40 =>
/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40
(0x2b3cd411)
libudev.so.0 => /lib64/libudev.so.0 (0x2b3cd4425000)
libutil.so.1 => /lib64/libutil.so.1 (0x2b3cd4634000)
/lib64/ld-linux-x86-64.so.2 (0x2b3cd166a000)

and ran it like this:

#!/bin/bash
#PBS -l nodes=1:ppn=32
#PBS -N mc_cond_0_h3
#PBS -o mc_cond_0_h3.o
#PBS -e mc_cond_0_h3.e

PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
cd $PBS_O_WORKDIR
mpirun -np 32 ./flash4

and now I'm getting these error messages:

--

As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.


Efficient, high-speed same-node shared memory communication support in

Open MPI is available in the "vader" BTL. To use the vader BTL, you

can re-run your job with:


mpirun --mca btl vader,self,... your_mpi_application

--

--

A requested component was not found, or was unable to be opened. This

means that this component is either not installed or is unable to be

used on your system (e.g., sometimes this means that shared libraries

that the component requires are unable to be found/loaded). Note that

Open MPI stopped checking at the first component that it did not find.


Host: compute-0-34.local

Framework: btl

Component: sm

--

--

It looks like MPI_INIT failed for some reason; your parallel process is

likely to abort. There are many reasons that a parallel process can

fail during MPI_INIT; some of which are due to configuration or environment

problems. This failure appears to be an internal failure; here's some

additional information (which may only be relevant to an Open MPI

developer):


mca_bml_base_open() failed

--> Returned "Not found" (-13) instead of "Success" (0)

--

[compute-0-34:16915] *** An error occurred in MPI_Init

[compute-0-34:16915] *** reported by process [3776708609,5]

[compute-0-34:16915] *** on a NULL communicator

[compute-0-34:16915] *** Unknown error

[compute-0-34:16915] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,

[compute-0-34:16915] *** and potentially your MPI job)

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file
server/pmix_server.c at line 2147

[compute-0-34.local:16902] 

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-06 Thread Jeff Squyres (jsquyres) via users
On Dec 6, 2019, at 1:03 PM, Jeff Squyres (jsquyres) via users 
 wrote:
> 
>> I get the same error when running on a single node. I will try to use the
>> latest version. Is there a way to check if different versions of Open MPI were
>> used on different nodes?
> 
> mpirun -np 2 ompi_info | head
> 
> Or something like that.  With 1.10, I don't know/remember the mpirun CLI 
> option to make one process per node (when ppn>1); you may have to check that. 
>  Or just "mpirun -np 33 ompi_info | head" and examine the output carefully to 
> find the 33rd output and see if it's different.

Poor quoting on my part.  The intent was to see just the first few lines from 
running `ompi_info` on each node.

So maybe something like:

--
$ cat foo.sh
#!/bin/sh
ompi_info | head
$ mpirun -np 2 foo.sh
--

Or "mprun -np 33 foo.sh", etc.

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-06 Thread Jeff Squyres (jsquyres) via users
On Dec 6, 2019, at 12:40 PM, Guido granda muñoz  wrote:
> 
> I get the same error when running on a single node. I will try to use the
> latest version. Is there a way to check if different versions of Open MPI were
> used on different nodes?

mpirun -np 2 ompi_info | head

Or something like that.  With 1.10, I don't know/remember the mpirun CLI option 
to make one process per node (when ppn>1); you may have to check that.  Or just 
"mpirun -np 33 ompi_info | head" and examine the output carefully to find the 
33rd output and see if it's different.

That being said, thinking a little deeper on this: with your ldd output, it's 
probably kinda unlikely that you have mismatched versions (because you're 
linking against very specific OMPI library versions) -- particularly if you're 
getting the same error when running on a single node.

Open MPI is 100% user space code -- you can just install Open MPI v4.0.2 under 
your $HOME and run it there.

You might want to see how your system-level Open MPI is installed (e.g., use 
the right ./configure option to get the PBS/TM integration, if you have a 
high-speed/HPC-class network support API library, ...etc.), and mimic all of 
that in your own personal Open MPI install.
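
For example, something along these lines (only a sketch; the grep just pulls out ompi_info's "Configure command line" field, and the --with-tm path is a placeholder for wherever Torque/PBS lives on your cluster):

# see how the system-wide Open MPI was configured
/usr/mpi/intel/openmpi-1.10.3/bin/ompi_info | grep -i "configure command"

# then mimic the relevant options in your own 4.0.2 build
./configure --prefix=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2 \
    --with-tm=/opt/torque
make -j 8 all && make install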

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-06 Thread Guido granda muñoz via users
Hello,
I get the same error when running on a single node. I will try to use the
latest version. Is there a way to check if different versions of Open MPI were
used on different nodes?
Cheers,

On Thu, Dec 5, 2019 at 19:10, Jeff Squyres (jsquyres) (<jsquy...@cisco.com>) wrote:

> Are you able to run on a single node?
>
> Is there any chance you can upgrade your Open MPI?  1.10 is ancient and
> isn't really supported any more.  4.0.2 is the current version.
>
>
> On Dec 5, 2019, at 7:15 PM, Guido granda muñoz 
> wrote:
>
> Hello Jeff,
>
> Thank you for replying. I ran it using PBS like this:
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=32
> #PBS -N cond_0_h3
> #PBS -o cond_0_h3.o
> #PBS -e cond_0_h3.e
>
>
> PATH=$PATH:/usr/mpi/intel/openmpi-1.10.3/bin
>
> LD_LIBRARY_PATH=/share/apps/composerxe-2011.2.137/lib/intel64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun -np 64 ./flash4
>
> Besides that, my .bashrc has the following lines:
> # Source global definitions
> if [ -f /etc/bashrc ]; then
> . /etc/bashrc
> fi
> #irya.guido.intel
> export PATH=$PATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/bin
> export
> INCLUDE=$INCLUDE:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/include
> export
> CPATH=$CPATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/include
> export
> LIBRARY_PATH=$LIBRARY_PATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib
> export
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib
> ###intel open-mpi system###
> export PATH=$PATH:/usr/mpi/intel/openmpi-1.10.3/bin
> export INCLUDE=$INCLUDE:/usr/mpi/intel/openmpi-1.10.3/include
> export CPATH=$CPATH:/usr/mpi/intel/openmpi-1.10.3/include
> export LIBRARY_PATH=$LIBRARY_PATH:/usr/mpi/intel/openmpi-1.10.3/lib
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/mpi/intel/openmpi-1.10.3/lib
> ##anaconda 3 ##
> export PATH="$PATH:/home/guido/anaconda3/bin"  # commented out by conda
> initialize
> # Intel
> export
> PATH=$PATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
> export
> INCLUDE=$INCLUDE:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
> export
> CPATH=$CPATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
> export
> LIBRARY_PATH=$LIBRARY_PATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
> export
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
>
> How can I check whether what you suggested was the reason for this error?
> Cheers,
>
> On Thu, Dec 5, 2019 at 18:02, Jeff Squyres (jsquyres) (<jsquy...@cisco.com>) wrote:
>
>> How did you try to execute your application?
>>
>> An error message like this can mean that you accidentally mixed versions
>> of Open MPI within your run (e.g., used Open MPI va.b.c on node A but used
>> Open MPI vx.y.z on node B).
>>
>>
>> > On Dec 5, 2019, at 5:28 PM, Guido granda muñoz via users <
>> users@lists.open-mpi.org> wrote:
>> >
>> > Hello open-mpi users,
>> > I'm getting some problem while using openmpi-1.10.3. The executable was
>> compiled using : (ldd output)
>> >
>> > linux-vdso.so.1 =>  (0x7fffd9e8b000)
>> > libhdf5.so.10 =>
>> /home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5.so.10
>> (0x2ac4313c4000)
>> > libhdf5_fortran.so.10 =>
>> /home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5_fortran.so.10
>> (0x2ac4319ca000)
>> > libz.so.1 => /lib64/libz.so.1 (0x2ac431c2d000)
>> > libmpi_usempif08.so.11 =>
>> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempif08.so.11
>> (0x2ac431e44000)
>> > libmpi_usempi_ignore_tkr.so.6 =>
>> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempi_ignore_tkr.so.6
>> (0x2ac432077000)
>> > libmpi_mpifh.so.12 =>
>> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_mpifh.so.12 (0x2ac43228)
>> > libmpi.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libmpi.so.12
>> (0x2ac4324df000)
>> > libm.so.6 => /lib64/libm.so.6 (0x2ac4327e7000)
>> > libpthread.so.0 => /lib64/libpthread.so.0 (0x2ac432a6b000)
>> > libc.so.6 => /lib64/libc.so.6 (0x2ac432c88000)
>> > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2ac43301d000)
>> > libdl.so.2 => /lib64/libdl.so.2 (0x2ac433233000)
>> > libimf.so =>
>> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libimf.so
>> (0x2ac433437000)
>> > libsvml.so =>
>> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libsvml.so
>> (0x2ac43381b000)
>> > libintlc.so.5 =>
>> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libintlc.so.5
>> (0x2ac433ec3000)
>> > libifport.so.5 =>
>> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifport.so.5
>> (0x2ac434013000)
>> > libifcore.so.5 =>
>> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcore.so.5
>> (0x2ac43414c000)
>> > libopen-rte.so.12 =>
>> 

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-05 Thread Guido granda muñoz via users
Hello Jeff,

Thank you for replying. I ran it using PBS like this:

#!/bin/bash
#PBS -l nodes=2:ppn=32
#PBS -N cond_0_h3
#PBS -o cond_0_h3.o
#PBS -e cond_0_h3.e


PATH=$PATH:/usr/mpi/intel/openmpi-1.10.3/bin
LD_LIBRARY_PATH=/share/apps/composerxe-2011.2.137/lib/intel64:$LD_LIBRARY_PATH
cd $PBS_O_WORKDIR
mpirun -np 64 ./flash4

Besides that, my .bashrc has the following lines:
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
#irya.guido.intel
export PATH=$PATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/bin
export
INCLUDE=$INCLUDE:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/include
export
CPATH=$CPATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/include
export
LIBRARY_PATH=$LIBRARY_PATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib
export
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib
###intel open-mpi system###
export PATH=$PATH:/usr/mpi/intel/openmpi-1.10.3/bin
export INCLUDE=$INCLUDE:/usr/mpi/intel/openmpi-1.10.3/include
export CPATH=$CPATH:/usr/mpi/intel/openmpi-1.10.3/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/mpi/intel/openmpi-1.10.3/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/mpi/intel/openmpi-1.10.3/lib
##anaconda 3 ##
export PATH="$PATH:/home/guido/anaconda3/bin"  # commented out by conda
initialize
# Intel
export
PATH=$PATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
export
INCLUDE=$INCLUDE:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
export
CPATH=$CPATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
export
LIBRARY_PATH=$LIBRARY_PATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/
export
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/bin/intel64/

How can I check whether what you suggested was the reason for this error?
Cheers,

On Thu, Dec 5, 2019 at 18:02, Jeff Squyres (jsquyres) (<jsquy...@cisco.com>) wrote:

> How did you try to execute your application?
>
> An error message like this can mean that you accidentally mixed versions
> of Open MPI within your run (e.g., used Open MPI va.b.c on node A but used
> Open MPI vx.y.z on node B).
>
>
> > On Dec 5, 2019, at 5:28 PM, Guido granda muñoz via users <
> users@lists.open-mpi.org> wrote:
> >
> > Hello open-mpi users,
> > I'm getting some problem while using openmpi-1.10.3. The executable was
> compiled using : (ldd output)
> >
> > linux-vdso.so.1 =>  (0x7fffd9e8b000)
> > libhdf5.so.10 =>
> /home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5.so.10
> (0x2ac4313c4000)
> > libhdf5_fortran.so.10 =>
> /home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5_fortran.so.10
> (0x2ac4319ca000)
> > libz.so.1 => /lib64/libz.so.1 (0x2ac431c2d000)
> > libmpi_usempif08.so.11 =>
> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempif08.so.11
> (0x2ac431e44000)
> > libmpi_usempi_ignore_tkr.so.6 =>
> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempi_ignore_tkr.so.6
> (0x2ac432077000)
> > libmpi_mpifh.so.12 =>
> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_mpifh.so.12 (0x2ac43228)
> > libmpi.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libmpi.so.12
> (0x2ac4324df000)
> > libm.so.6 => /lib64/libm.so.6 (0x2ac4327e7000)
> > libpthread.so.0 => /lib64/libpthread.so.0 (0x2ac432a6b000)
> > libc.so.6 => /lib64/libc.so.6 (0x2ac432c88000)
> > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2ac43301d000)
> > libdl.so.2 => /lib64/libdl.so.2 (0x2ac433233000)
> > libimf.so =>
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libimf.so
> (0x2ac433437000)
> > libsvml.so =>
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libsvml.so
> (0x2ac43381b000)
> > libintlc.so.5 =>
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libintlc.so.5
> (0x2ac433ec3000)
> > libifport.so.5 =>
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifport.so.5
> (0x2ac434013000)
> > libifcore.so.5 =>
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcore.so.5
> (0x2ac43414c000)
> > libopen-rte.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libopen-rte.so.12
> (0x2ac4343ad000)
> > libopen-pal.so.13 => /usr/mpi/intel/openmpi-1.10.3/lib/libopen-pal.so.13
> (0x2ac43464e000)
> > librt.so.1 => /lib64/librt.so.1 (0x2ac43496a000)
> > libutil.so.1 => /lib64/libutil.so.1 (0x2ac434b72000)
> > libifcoremt.so.5 =>
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcoremt.so.5
> (0x2ac434d76000)
> > /lib64/ld-linux-x86-64.so.2 (0x2ac4311a2000)
> >
> > When I run it, I get the following message:
> >
> > [compute-0-34.local:17553] [[5279,0],0] mca_oob_tcp_recv_handler:
> invalid message type: 15
> >
> --
> > 

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-05 Thread Jeff Squyres (jsquyres) via users
How did you try to execute your application?

An error message like this can mean that you accidentally mixed versions of 
Open MPI within your run (e.g., used Open MPI va.b.c on node A but used Open 
MPI vx.y.z on node B).
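
One quick way to rule that out is to check what each node in the job actually has installed, e.g. (a sketch; compute-0-34 is taken from your output, the second hostname is just illustrative):

ssh compute-0-34 /usr/mpi/intel/openmpi-1.10.3/bin/mpirun --version
ssh compute-0-35 /usr/mpi/intel/openmpi-1.10.3/bin/mpirun --version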


> On Dec 5, 2019, at 5:28 PM, Guido granda muñoz via users 
>  wrote:
> 
> Hello open-mpi users,
> I'm getting a problem while using openmpi-1.10.3. The executable was
> compiled against the following libraries (ldd output):
> 
> linux-vdso.so.1 =>  (0x7fffd9e8b000)
> libhdf5.so.10 => 
> /home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5.so.10 
> (0x2ac4313c4000)
> libhdf5_fortran.so.10 => 
> /home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5_fortran.so.10
>  (0x2ac4319ca000)
> libz.so.1 => /lib64/libz.so.1 (0x2ac431c2d000)
> libmpi_usempif08.so.11 => 
> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempif08.so.11 (0x2ac431e44000)
> libmpi_usempi_ignore_tkr.so.6 => 
> /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempi_ignore_tkr.so.6 
> (0x2ac432077000)
> libmpi_mpifh.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_mpifh.so.12 
> (0x2ac43228)
> libmpi.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libmpi.so.12 
> (0x2ac4324df000)
> libm.so.6 => /lib64/libm.so.6 (0x2ac4327e7000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x2ac432a6b000)
> libc.so.6 => /lib64/libc.so.6 (0x2ac432c88000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2ac43301d000)
> libdl.so.2 => /lib64/libdl.so.2 (0x2ac433233000)
> libimf.so => 
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libimf.so
>  (0x2ac433437000)
> libsvml.so => 
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libsvml.so
>  (0x2ac43381b000)
> libintlc.so.5 => 
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libintlc.so.5
>  (0x2ac433ec3000)
> libifport.so.5 => 
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifport.so.5
>  (0x2ac434013000)
> libifcore.so.5 => 
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcore.so.5
>  (0x2ac43414c000)
> libopen-rte.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libopen-rte.so.12 
> (0x2ac4343ad000)
> libopen-pal.so.13 => /usr/mpi/intel/openmpi-1.10.3/lib/libopen-pal.so.13 
> (0x2ac43464e000)
> librt.so.1 => /lib64/librt.so.1 (0x2ac43496a000)
> libutil.so.1 => /lib64/libutil.so.1 (0x2ac434b72000)
> libifcoremt.so.5 => 
> /share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcoremt.so.5
>  (0x2ac434d76000)
> /lib64/ld-linux-x86-64.so.2 (0x2ac4311a2000)
> 
> When I run it, I get the following message:
> 
> [compute-0-34.local:17553] [[5279,0],0] mca_oob_tcp_recv_handler: invalid 
> message type: 15
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> 
> The executable was also compiled using hdf5-1.8.20
> I really don't know what this error means; could you please help me?
> Cheers,
> -- 
> Guido


-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-05 Thread Guido granda muñoz via users
Hello open-mpi users,
I'm getting a problem while using openmpi-1.10.3. The executable was
compiled against the following libraries (ldd output):

linux-vdso.so.1 =>  (0x7fffd9e8b000)
libhdf5.so.10 =>
/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5.so.10
(0x2ac4313c4000)
libhdf5_fortran.so.10 =>
/home/guido/libraries/compiled_with_intel/hdf5-1.8.20/lib/libhdf5_fortran.so.10
(0x2ac4319ca000)
libz.so.1 => /lib64/libz.so.1 (0x2ac431c2d000)
libmpi_usempif08.so.11 =>
/usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempif08.so.11
(0x2ac431e44000)
libmpi_usempi_ignore_tkr.so.6 =>
/usr/mpi/intel/openmpi-1.10.3/lib/libmpi_usempi_ignore_tkr.so.6
(0x2ac432077000)
libmpi_mpifh.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libmpi_mpifh.so.12
(0x2ac43228)
libmpi.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libmpi.so.12
(0x2ac4324df000)
libm.so.6 => /lib64/libm.so.6 (0x2ac4327e7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2ac432a6b000)
libc.so.6 => /lib64/libc.so.6 (0x2ac432c88000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2ac43301d000)
libdl.so.2 => /lib64/libdl.so.2 (0x2ac433233000)
libimf.so =>
/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libimf.so
(0x2ac433437000)
libsvml.so =>
/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libsvml.so
(0x2ac43381b000)
libintlc.so.5 =>
/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libintlc.so.5
(0x2ac433ec3000)
libifport.so.5 =>
/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifport.so.5
(0x2ac434013000)
libifcore.so.5 =>
/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcore.so.5
(0x2ac43414c000)
libopen-rte.so.12 => /usr/mpi/intel/openmpi-1.10.3/lib/libopen-rte.so.12
(0x2ac4343ad000)
libopen-pal.so.13 => /usr/mpi/intel/openmpi-1.10.3/lib/libopen-pal.so.13
(0x2ac43464e000)
librt.so.1 => /lib64/librt.so.1 (0x2ac43496a000)
libutil.so.1 => /lib64/libutil.so.1 (0x2ac434b72000)
libifcoremt.so.5 =>
/share/apps/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libifcoremt.so.5
(0x2ac434d76000)
/lib64/ld-linux-x86-64.so.2 (0x2ac4311a2000)

When I run it, I get the following message:

[compute-0-34.local:17553] [[5279,0],0] mca_oob_tcp_recv_handler: invalid
message type: 15
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

The executable was also compiled using hdf5-1.8.20
I really don't know what this error means; could you please help me?
Cheers,
-- 
Guido