Re: [OMPI users] symbol lookup error for a "hello world" fortran script

2016-11-15 Thread Gilles Gouaillardet
Julien,

the Fortran lib is /usr/lib/libmpi_mpifh.so.12, but the C lib (libmpi.so.12) is
the one from Intel MPI.

i guess the Open MPI C lib lives in /usr/lib, and not in /usr/lib/openmpi/lib, so
the Intel MPI directory in your LD_LIBRARY_PATH is searched first.
prepending /usr/lib is never recommended, so i suggest you simply remove
/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib
from your LD_LIBRARY_PATH
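
for illustration, here is a minimal sketch of one way to strip just that entry in
your .bashrc (assuming bash and GNU coreutils; the directory name is the one from
the path you posted):

# drop only the Intel MPI lib dir from LD_LIBRARY_PATH (sketch)
intel_mpi_lib=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib
LD_LIBRARY_PATH=$(printf '%s' "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -vx "$intel_mpi_lib" | paste -sd: -)
export LD_LIBRARY_PATH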

Cheers,

Gilles

On Tue, Nov 15, 2016 at 10:33 PM, Julien de Troullioud de Lanversin
 wrote:
> Gilles,
>
>
> Thank you for your fast reply.
>
> When I type whereis mpifort I have the following: mpifort:
> /usr/bin/mpifort.openmpi /usr/bin/mpifort /usr/share/man/man1/mpifort.1.gz
>
> I made sure I exported the LD_LIBRARY_PATH after I prepended
> /usr/lib/openmpi/lib. The same error is produced.
>
> If I type ldd ./test I get the following:
>
> linux-vdso.so.1 =>  (0x7ffde08ef000)
> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x2b5a2132c000)
> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3
> (0x2b5a21585000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x2b5a218b)
> libmpi.so.12 =>
> /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib/libmpi.so.12
> (0x2b5a21c7a000)
> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x2b5a2243c000)
> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x2b5a226d9000)
> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0
> (0x2b5a228f7000)
> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x2b5a22b36000)
> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> (0x2b5a22e3f000)
> /lib64/ld-linux-x86-64.so.2 (0x55689175a000)
> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x2b5a23056000)
> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x2b5a2325e000)
> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x2b5a23462000)
> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5
> (0x2b5a23666000)
> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1
> (0x2b5a238a)
> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7
> (0x2b5a23aac000)
>
>
> Thanks.
>
>
>
> Julien
>
>
>
> 2016-11-15 23:52 GMT-05:00 Gilles Gouaillardet
> :
>>
>> Julien,
>>
>> first, make sure you are using the Open MPI wrapper
>> which mpifort
>> should be /usr/lib/openmpi/bin if i understand correctly
>> then make sure you exported your LD_LIBRARY_PATH *after* you prepended
>> the path to Open MPI lib
>> in your .bashrc you can either
>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
>> export LD_LIBRARY_PATH
>> or directly
>> export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
>>
>> then you can
>> ldd ./test
>> and confirm all MPI libs (both C and Fortran) are pointing to Open MPI
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tue, Nov 15, 2016 at 9:41 PM, Julien de Troullioud de Lanversin
>>  wrote:
>> > Hi all,
>> >
>> >
>> > I am completely new to MPI (and relatively new to linux). I am sorry if
>> > the
>> > problem I encountered is obvious to solve.
>> >
>> > When I run the following simple test with mpirun:
>> >
>> > program hello_world
>> >
>> >   use mpi
>> >   integer ierr
>> >
>> >   call MPI_INIT ( ierr )
>> >   print *, "Hello world"
>> >   call MPI_FINALIZE ( ierr )
>> >
>> > end program hello_world
>> >
>> >
>> > I get the following error :
>> > ./test: symbol lookup error: /usr/lib/libmpi_mpifh.so.12: undefined
>> > symbol:
>> > mpi_fortran_weights_empty
>> >
>> > I compiled the source code like this: mpifort -o test test.f90
>> >
>> > I looked it up on the internet and I understand that it is a problem with the
>> > shared library of open mpi. But I think I correctly added the open mpi
>> > lib
>> > to ld_library_path (I added the first directory -- /usr/lib/openmpi/lib
>> > --
>> > via .bashrc). Here is an echo $LD_LIBRARY_PATH:
>> >
>> >
>> > $/usr/lib/openmpi/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel
>>
>> /compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/

Re: [OMPI users] symbol lookup error for a "hello world" fortran script

2016-11-15 Thread Julien de Troullioud de Lanversin
Gilles,


Thank you for your fast reply.

When I type whereis mpifort I have the following:
mpifort: /usr/bin/mpifort.openmpi /usr/bin/mpifort /usr/share/man/man1/mpifort.1.gz
I made sure I exported the LD_LIBRARY_PATH after I prepended
/usr/lib/openmpi/lib.
The same error is produced.

If I type ldd ./test I get the following:

linux-vdso.so.1 =>  (0x7ffde08ef000)
libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x2b5a2132c000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x2b5a21585000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x2b5a218b)
libmpi.so.12 => /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib/libmpi.so.12 (0x2b5a21c7a000)
libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x2b5a2243c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x2b5a226d9000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x2b5a228f7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x2b5a22b36000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x2b5a22e3f000)
/lib64/ld-linux-x86-64.so.2 (0x55689175a000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x2b5a23056000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x2b5a2325e000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x2b5a23462000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x2b5a23666000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x2b5a238a)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x2b5a23aac000)


Thanks.



Julien



2016-11-15 23:52 GMT-05:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:

> Julien,
>
> first, make sure you are using the Open MPI wrapper
> which mpifort
> should be /usr/lib/openmpi/bin if i understand correctly
> then make sure you exported your LD_LIBRARY_PATH *after* you prepended
> the path to Open MPI lib
> in your .bashrc you can either
> LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
> export LD_LIBRARY_PATH
> or directly
> export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
>
> then you can
> ldd ./test
> and confirm all MPI libs (both C and Fortran) are pointing to Open MPI
>
> Cheers,
>
> Gilles
>
> On Tue, Nov 15, 2016 at 9:41 PM, Julien de Troullioud de Lanversin
>  wrote:
> > Hi all,
> >
> >
> > I am completely new to MPI (and relatively new to linux). I am sorry if
> the
> > problem I encountered is obvious to solve.
> >
> > When I run the following simple test with mpirun:
> >
> > program hello_world
> >
> >   use mpi
> >   integer ierr
> >
> >   call MPI_INIT ( ierr )
> >   print *, "Hello world"
> >   call MPI_FINALIZE ( ierr )
> >
> > end program hello_world
> >
> >
> > I get the following error :
> > ./test: symbol lookup error: /usr/lib/libmpi_mpifh.so.12: undefined
> symbol:
> > mpi_fortran_weights_empty
> >
> > I compiled the source code like this: mpifort -o test test.f90
> >
> > I looked it up on the internet and I understand that it is a problem with the
> > shared library of open mpi. But I think I correctly added the open mpi
> lib
> > to ld_library_path (I added the first directory -- /usr/lib/openmpi/lib
> --
> > via .bashrc). Here is an echo $LD_LIBRARY_PATH:
> >
> > $/usr/lib/openmpi/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin
> >
> > The path that my system uses at run time (/usr/lib/) is not in the
> library
> > path. I thus don't know w

Re: [OMPI users] symbol lookup error for a "hello world" fortran script

2016-11-15 Thread Gilles Gouaillardet
Julien,

first, make sure you are using the Open MPI wrapper:
which mpifort
should point to /usr/lib/openmpi/bin if i understand correctly.
then make sure you exported your LD_LIBRARY_PATH *after* you prepended
the path to the Open MPI lib.
in your .bashrc you can either
LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
or directly
export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH

then you can
ldd ./test
and confirm that all MPI libs (both C and Fortran) point to Open MPI
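
for illustration, a quick check along those lines (this sketch assumes the
Debian/Ubuntu layout where the Open MPI libraries live under /usr/lib/openmpi/lib):

# both the C and the Fortran MPI libs should resolve under the Open MPI install
ldd ./test | grep -E 'libmpi(_mpifh)?\.so'
# any hit under the Intel MPI directory means Intel MPI is still resolved first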

Cheers,

Gilles

On Tue, Nov 15, 2016 at 9:41 PM, Julien de Troullioud de Lanversin
 wrote:
> Hi all,
>
>
> I am completely new to MPI (and relatively new to linux). I am sorry if the
> problem I encountered is obvious to solve.
>
> When I run the following simple test with mpirun:
>
> program hello_world
>
>   use mpi
>   integer ierr
>
>   call MPI_INIT ( ierr )
>   print *, "Hello world"
>   call MPI_FINALIZE ( ierr )
>
> end program hello_world
>
>
> I get the following error :
> ./test: symbol lookup error: /usr/lib/libmpi_mpifh.so.12: undefined symbol:
> mpi_fortran_weights_empty
>
> I compiled the source code like this: mpifort -o test test.f90
>
> I looked it up on the internet and I understand that it is a problem with the
> shared library of open mpi. But I think I correctly added the open mpi lib
> to ld_library_path (I added the first directory -- /usr/lib/openmpi/lib --
> via .bashrc). Here is an echo $LD_LIBRARY_PATH:
>
> $/usr/lib/openmpi/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel
 
/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin
>
> The path that my system uses at run time (/usr/lib/) is not in the library
> path. I thus don't know why this path is used.
>
> I would be thankful if anyone could help me with this issue.
>
>
> Best,
>
>
> Julien de Troullioud de Lanversin
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


[OMPI users] symbol lookup error for a "hello world" fortran script

2016-11-15 Thread Julien de Troullioud de Lanversin
Hi all,


I am completely new to MPI (and relatively new to linux). I am sorry if the
problem I encountered is obvious to solve.

When I run the following simple test with mpirun:

program hello_world

  use mpi
  integer ierr

  call MPI_INIT ( ierr )
  print *, "Hello world"
  call MPI_FINALIZE ( ierr )

end program hello_world


I get the following error:
./test: symbol lookup error: /usr/lib/libmpi_mpifh.so.12: undefined symbol: mpi_fortran_weights_empty

I compiled the source code like this: mpifort -o test test.f90

I looked it up on the internet and I understand that it is a problem with the
shared library of Open MPI. But I think I correctly added the Open MPI lib
to LD_LIBRARY_PATH (I added the first directory -- /usr/lib/openmpi/lib -- via
.bashrc). Here is an echo $LD_LIBRARY_PATH:

$/usr/lib/openmpi/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/debugger_2016/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/daal/../compiler/lib/intel64_lin

The path that my system uses at run time (/usr/lib/) is not in the library
path. I thus don't know why this path is used.

I would be thankful if anyone could help me with this issue.


Best,


Julien de Troullioud de Lanversin
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-11-15 Thread Graham, Nathaniel Richard
Hello Gundram,


This seems to be an issue with psm.  I have been communicating with Matias at 
Intel who has forwarded the problem to the correct team so they can investigate 
it.  There is a new version of psm available at the link below.  It will fix 
the issue you are currently seeing; however, there are other issues you will 
most likely run into (this is what the team at Intel will be looking into).


https://github.com/01org/psm​


I'd recommend updating your psm and seeing whether you get the issues I was
seeing (hangs and crashes). If so, you will probably need to disable psm until
a fix is implemented.
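
(One possible way to do that, as a sketch: psm is normally pulled in through Open
MPI's mtl framework, so excluding it there makes Open MPI fall back to another
transport. The java invocation below is only an illustration.)

# exclude the psm MTL (sketch)
export OMPI_MCA_mtl=^psm
mpirun -np 2 java WinAllocate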


-Nathan


--
Nathaniel Graham
HPC-DES
Los Alamos National Laboratory

From: users  on behalf of Gundram Leifert 

Sent: Tuesday, September 20, 2016 2:13 AM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] Java-OpenMPI returns with SIGSEGV


Sorry for the delay...

we applied

 ./configure --enable-debug --with-psm --enable-mpi-java 
--with-jdk-dir=/cluster/libraries/java/jdk1.8.0_102/ 
--prefix=/cluster/mpi/gcc/openmpi/2.0.x_nightly
make -j 8 all
make install

Java-test-suite

export OMPI_MCA_osc=pt2pt

./make_onesided &> make_onesided.out
Output: https://gist.github.com/anonymous/f8c6837b6a6d40c806cec9458dfcc1ab


we still sometimes get the SIGSEGV:

WinAllocate with -np = 2:
Exception in thread "main" Exception in thread "main" mpi.MPIException: 
MPI_ERR_INTERN: internal errormpi.MPIException: MPI_ERR_INTERN: internal error

at mpi.Win.allocateSharedWin(Native Method)at 
mpi.Win.allocateSharedWin(Native Method)

at mpi.Win.(Win.java:110)at mpi.Win.(Win.java:110)

at WinAllocate.main(WinAllocate.java:42)at 
WinAllocate.main(WinAllocate.java:42)


WinName with -np = 2:
mpiexec has exited due to process rank 1 with PID 0 on
node node160 exiting improperly. There are three reasons this could occur:



CCreateInfo and Cput with -np 8:
sometimes end with SIGSEGV (see
https://gist.github.com/anonymous/605c19422fd00bdfc4d1ea0151a1f34c for a
detailed view).

I hope this information is helpful...

Best Regards,
Gundram


On 09/14/2016 08:18 PM, Nathan Hjelm wrote:
We have a new high-speed component for RMA in 2.0.x called osc/rdma. Since the 
component is doing direct rdma on the target we are much more strict about the 
ranges. osc/pt2pt doesn't bother checking at the moment.

Can you build Open MPI with --enable-debug and add -mca osc_base_verbose 100 to 
the mpirun command-line? Please upload the output as a gist 
(https://gist.github.com/) and send a link so we can take a look.

-Nathan

On Sep 14, 2016, at 04:26 AM, Gundram Leifert 
 wrote:


In short words: yes, we compiled with mpijavac and mpicc and run with mpirun 
-np 2.


In long words: we tested the following setups

a) without Java, with mpi 2.0.1 the C-test

[mw314@titan01 mpi_test]$ module list
Currently Loaded Modulefiles:
  1) openmpi/gcc/2.0.1

[mw314@titan01 mpi_test]$ mpirun -np 2 ./a.out
[titan01:18460] *** An error occurred in MPI_Compare_and_swap
[titan01:18460] *** reported by process [3535667201,1]
[titan01:18460] *** on win rdma window 3
[titan01:18460] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[titan01:18460] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[titan01:18460] ***and potentially your MPI job)
[titan01.service:18454] 1 more process has sent help message 
help-mpi-errors.txt / mpi_errors_are_fatal
[titan01.service:18454] Set MCA parameter "orte_base_help_aggregate" to 0 to 
see all help / error messages

b) without Java with mpi 1.8.8 the C-test

[mw314@titan01 mpi_test2]$ module list
Currently Loaded Modulefiles:
  1) openmpi/gcc/1.8.8

[mw314@titan01 mpi_test2]$ mpirun -np 2 ./a.out
 No Errors
[mw314@titan01 mpi_test2]$

c) with java 1.8.8 with jdk and Java-Testsuite

[mw314@titan01 onesided]$ mpijavac TestMpiRmaCompareAndSwap.java
TestMpiRmaCompareAndSwap.java:49: error: cannot find symbol
win.compareAndSwap(next, iBuffer, result, MPI.INT, 
rank, 0);
   ^
  symbol:   method 
compareAndSwap(IntBuffer,IntBuffer,IntBuffer,Datatype,int,int)
  location: variable win of type Win
TestMpiRmaCompareAndSwap.java:53: error: cannot find symbol

>> these java methods are not supported in 1.8.8

d) ompi 2.0.1 and jdk and Testsuite

[mw314@titan01 ~]$ module list
Currently Loaded Modulefiles:
  1) openmpi/gcc/2.0.1   2) java/jdk1.8.0_102

[mw314@titan01 ~]$ cd ompi-java-test/
[mw314@titan01 ompi-java-test]$ ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force
autoreconf: configure.ac: tracing
autoreconf: configure.ac: not using Libtool
autoreconf: running: /usr/bin/autoconf --force
autoreconf: configure.ac: not using Autoheader
autoreconf: running: automake --add-missing --copy --force-missing
autoreconf: Leaving directory `.'
[mw314@titan01 ompi-java-test]$ ./configure
Confi

Re: [OMPI users] MPI_ABORT was invoked on rank 0 in communicator compute with errorcode 59

2016-11-15 Thread Gus Correa

Hi Mohammadali

"Signal number 11 SEGV", is the Unix/Linux signal for a memory
violation (a.k.a. segmentation violation or segmentation fault).
This normally happens when the program tries to read
or write in a memory area that it did not allocate, already
freed, or belongs to another process.
That is most likely a programming error on the FEM code,
probably not an MPI error, probably not a PETSC error either.

The "errorcode 59" seems to be the PETSC error message
issued when it receives a signal (in this case a
segmentation fault signal, I guess) from the operational
system (Linux, probably).
Apparently it simply throws the error message and
calls MPI_Abort, and the program stops.
This is what petscerror.h include file has about error code 59:

#define PETSC_ERR_SIG  59   /* signal received */

**

One suggestion is to compile the code with debugging flags (-g)
and attach a debugger to it. That is not an easy task if you have many
processes/ranks in your program and your debugger is the default
Linux gdb, but it is not impossible either.
Depending on the computer you have, you may have a parallel debugger,
such as TotalView or DDT, which is more user friendly.

You could also compile it with the flag -traceback
(or -fbacktrace; the syntax depends on the compiler, check the compiler
man page).
This will at least tell you the location in the program where the
segmentation fault happened (in the STDERR file of your job).
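
(A minimal sketch of both approaches, assuming a gfortran-based mpifort wrapper
and a hypothetical source file fem_solver.f90; Intel compilers use -g -traceback
instead of -fbacktrace.)

# rebuild with debug symbols and a backtrace on crashes (gfortran flags)
mpifort -g -O0 -fbacktrace -o fem_solver fem_solver.f90
# for a small test case, one gdb per rank in its own xterm is the classic option
mpirun -np 4 xterm -e gdb --args ./fem_solver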


I hope this helps.
Gus Correa

PS - The zip attachment with your "myjob.sh" script
was removed from the email.
Many email server programs remove zip for safety.
Files with ".sh" suffix are also removed in general.
You could compress it with gzip or bzip2 instead.

On 11/15/2016 02:40 PM, Beheshti, Mohammadali wrote:

Hi,



I am running simulations in a software package which uses Open MPI to solve an FEM
problem. From time to time I receive the error
"MPI_ABORT was invoked on rank 0 in communicator compute with errorcode 59"
in the output file, and for the larger simulations (with a larger FEM
mesh) I almost always get this error. I don't have any idea what the
cause of this error is. The error file contains a PETSc error: "caught
signal number 11 SEGV". I am running my jobs on an HPC system which has
Open MPI version 2.0.0. I am also using a bash file (myjob.sh), which is
attached. The outputs of the ompi_info --all command and of the ifconfig
command are also attached. I appreciate any help in this regard.



Thanks



Ali





**

Mohammadali Beheshti

Post-Doctoral Fellow

Department of Medicine (Cardiology)

Toronto General Research Institute

University Health Network

Tel: 416-340-4800  ext. 6837



**




This e-mail may contain confidential and/or privileged information for
the sole use of the intended recipient.
Any review or distribution by anyone other than the person for whom it
was originally intended is strictly prohibited.
If you have received this e-mail in error, please contact the sender and
delete all copies.
Opinions, conclusions or other information contained in this e-mail may
not be that of the organization.



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Open MPI State of the Union BOF at SC'16 next week

2016-11-15 Thread Sean Ahern
Makes sense. Thanks for making the slides available. Would you mind posting
to the list when the rest of us can get them?

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883

On Tue, Nov 15, 2016 at 10:53 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Nov 10, 2016, at 9:31 AM, Jeff Squyres (jsquyres) 
> wrote:
> >
> > The slides will definitely be available afterwards.  We'll see if we can
> make some flavor of recording available as well.
>
> After poking around a bit, it looks like the SC rules prohibit us from
> recording the BOF (which is not unreasonable, actually).
>
> The slides will definitely be available on the Open MPI web site.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Open MPI State of the Union BOF at SC'16 next week

2016-11-15 Thread Jeff Squyres (jsquyres)
On Nov 10, 2016, at 9:31 AM, Jeff Squyres (jsquyres)  wrote:
> 
> The slides will definitely be available afterwards.  We'll see if we can make 
> some flavor of recording available as well.

After poking around a bit, it looks like the SC rules prohibit us from 
recording the BOF (which is not unreasonable, actually).

The slides will definitely be available on the Open MPI web site.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Valgrind errors related to MPI_Win_allocate_shared

2016-11-15 Thread Gilles Gouaillardet
Joseph,

thanks for the report, this is a real memory leak.
i fixed it in master, and the fix is now being reviewed.
meanwhile, you can manually apply the patch available at
https://github.com/open-mpi/ompi/pull/2418.patch
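
a minimal sketch of applying it, assuming you build Open MPI 2.0.1 from an
extracted source tree (the patch is against master, so it may need minor
adjustment before it applies cleanly):

cd openmpi-2.0.1
wget https://github.com/open-mpi/ompi/pull/2418.patch
patch -p1 < 2418.patch
make && make install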

Cheers,

Gilles


On Tue, Nov 15, 2016 at 1:52 AM, Joseph Schuchart  wrote:
> Hi Luke,
>
> Thanks for your reply. From my understanding, the wrappers mainly help catch
> errors on the MPI API level. The errors I reported are well below the API
> layer (please correct me if I'm wrong here). However, I re-ran the code with
> the wrapper loaded via LD_PRELOAD and without the suppression file, and the
> warnings issued by Valgrind for the shmem segment handling code and leaking
> memory from MPI_Win_allocate_shared are basically the same. Nevertheless, I
> am attaching the full log of that run as well.
>
> Cheers
> Joseph
>
>
>
> On 11/14/2016 05:07 PM, D'Alessandro, Luke K wrote:
>>
>> Hi Joesph,
>>
>> I don’t have a solution to your issue, but I’ve found that the valgrind
>> mpi wrapper is necessary to eliminate many of the false positives that the
>> suppressions file can’t.
>>
>>
>> http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap.gettingstarted
>>
>> You should LD_PRELOAD the libmpiwrap from your installation. If it’s not
>> there then you can rebuild valgrind with CC=mpicc to have it built.
>>
>> Hope this helps move you towards a solution.
>>
>> Luke
>>
>>> On Nov 14, 2016, at 5:49 AM, Joseph Schuchart  wrote:
>>>
>>> All,
>>>
>>> I am investigating an MPI application using Valgrind and see a load of
>>> memory leaks reported in MPI-related code. Please find the full log
>>> attached. Some observations/questions:
>>>
>>> 1) According to the information available at
>>> https://www.open-mpi.org/faq/?category=debugging#valgrind_clean the
>>> suppression file should help get a clean run of an MPI application despite
>>> several buffers not being free'd by MPI_Finalize. Is this assumption still
>>> valid? If so, maybe the suppression file needs an update as I still see
>>> reports on leaked memory allocated in MPI_Init?
>>>
>>> 2) There seem to be several invalid reads and writes in the
>>> opal_shmem_segment_* functions. Are they significant or can we regard them
>>> as false positives?
>>>
>>> 3) The code example attached allocates memory using
>>> MPI_Win_allocate_shared and frees it using MPI_Win_free. However, Valgrind
>>> reports some memory to be leaking, e.g.:
>>>
>>> ==4020== 16 bytes in 1 blocks are definitely lost in loss record 21 of
>>> 234
>>> ==4020==at 0x4C2DB8F: malloc (in
>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==4020==by 0xCFDCD47: component_select (osc_sm_component.c:277)
>>> ==4020==by 0x4F39FC3: ompi_osc_base_select (osc_base_init.c:73)
>>> ==4020==by 0x4E945DC: ompi_win_allocate_shared (win.c:272)
>>> ==4020==by 0x4EF6576: PMPI_Win_allocate_shared
>>> (pwin_allocate_shared.c:80)
>>> ==4020==by 0x400E96: main (mpi_dynamic_win_free.c:48)
>>>
>>> Can someone please confirm that the way the shared window memory is
>>> free'd is actually correct? I noticed that the amount of memory that is
>>> reported to be leaking scales with the number of windows that are allocated
>>> and free'd. In our case this happens in a set of unit tests that all
>>> allocate their own shared memory windows and thus the amount of leaked
>>> memory piles up quite a bit.
>>>
>>> I built the code using GCC 5.4.0 and Open MPI 2.0.1 and ran it on a
>>> single node. How to reproduce:
>>>
>>> $ mpicc -Wall -ggdb mpi_dynamic_win_free.c -o mpi_dynamic_win_free
>>>
>>> $ mpirun -n 2 valgrind --leak-check=full
>>> --suppressions=$HOME/opt/openmpi-2.0.1/share/openmpi/openmpi-valgrind.supp
>>> ./mpi_dynamic_win_free
>>>
>>> Best regards,
>>> Joseph
>>>
>>> --
>>> Dipl.-Inf. Joseph Schuchart
>>> High Performance Computing Center Stuttgart (HLRS)
>>> Nobelstr. 19
>>> D-70569 Stuttgart
>>>
>>> Tel.: +49(0)711-68565890
>>> Fax: +49(0)711-6856832
>>> E-Mail: schuch...@hlrs.de
>>>
>>>
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> --
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
>
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] An old code compatibility

2016-11-15 Thread Dave Love
Mahmood Naderan  writes:

> Hi,
> The following mpifort command fails with a syntax error. It seems that the
> code is compatible with old gfortran, but I am not aware of that. Any idea
> about that?
>
> mpifort -ffree-form -ffree-line-length-0 -ff2c -fno-second-underscore
> -I/opt/fftw-3.3.5/include  -O3  -c xml.f90
> xml.F:641.46:
>
>CALL XML_TAG("set", comment="spin "
>   1
> Error: Syntax error in argument list at (1)
>
>
>
>
> In the source code, that line is
>
> CALL XML_TAG("set", comment="spin "//TRIM(ADJUSTL(strcounter)))

Apparently that mpifort is running cpp on .f90 files, and without
--traditional.  I've no idea how it could be set up to do that; gfortran
itself won't do it.
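
For what it is worth, a quick way to see exactly what the wrapper invokes is the
Open MPI --showme option (it prints the underlying command line without compiling
anything):

mpifort --showme -ffree-form -ffree-line-length-0 -c xml.f90
mpifort --showme:command
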
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Valgrind errors related to MPI_Win_allocate_shared

2016-11-15 Thread Joseph Schuchart

Hi Luke,

Thanks for your reply. From my understanding, the wrappers mainly help 
catch errors on the MPI API level. The errors I reported are well below 
the API layer (please correct me if I'm wrong here). However, I re-ran
the code with the wrapper loaded via LD_PRELOAD and without the
suppression file, and the warnings issued by Valgrind for the shmem
segment handling code and leaking memory from MPI_Win_allocate_shared 
are basically the same. Nevertheless, I am attaching the full log of 
that run as well.
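
(For reference, a sketch of such a run with the wrapper preloaded; the libmpiwrap
path varies by distribution and Valgrind version:)

$ export LD_PRELOAD=/usr/lib/valgrind/libmpiwrap-amd64-linux.so
$ mpirun -n 2 -x LD_PRELOAD valgrind --leak-check=full ./mpi_dynamic_win_free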


Cheers
Joseph


On 11/14/2016 05:07 PM, D'Alessandro, Luke K wrote:

Hi Joesph,

I don’t have a solution to your issue, but I’ve found that the valgrind mpi 
wrapper is necessary to eliminate many of the false positives that the 
suppressions file can’t.

http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap.gettingstarted

You should LD_PRELOAD the libmpiwrap from your installation. If it’s not there 
then you can rebuild valgrind with CC=mpicc to have it built.

Hope this helps move you towards a solution.

Luke


On Nov 14, 2016, at 5:49 AM, Joseph Schuchart  wrote:

All,

I am investigating an MPI application using Valgrind and see a load of memory 
leaks reported in MPI-related code. Please find the full log attached. Some 
observations/questions:

1) According to the information available at 
https://www.open-mpi.org/faq/?category=debugging#valgrind_clean the suppression 
file should help get a clean run of an MPI application despite several buffers 
not being free'd by MPI_Finalize. Is this assumption still valid? If so, maybe 
the suppression file needs an update as I still see reports on leaked memory 
allocated in MPI_Init?

2) There seem to be several invalid reads and writes in the 
opal_shmem_segment_* functions. Are they significant or can we regard them as 
false positives?

3) The code example attached allocates memory using MPI_Win_allocate_shared and 
frees it using MPI_Win_free. However, Valgrind reports some memory to be 
leaking, e.g.:

==4020== 16 bytes in 1 blocks are definitely lost in loss record 21 of 234
==4020==at 0x4C2DB8F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4020==by 0xCFDCD47: component_select (osc_sm_component.c:277)
==4020==by 0x4F39FC3: ompi_osc_base_select (osc_base_init.c:73)
==4020==by 0x4E945DC: ompi_win_allocate_shared (win.c:272)
==4020==by 0x4EF6576: PMPI_Win_allocate_shared (pwin_allocate_shared.c:80)
==4020==by 0x400E96: main (mpi_dynamic_win_free.c:48)

Can someone please confirm that the way the shared window memory is free'd 
is actually correct? I noticed that the amount of memory that is reported to be 
leaking scales with the number of windows that are allocated and free'd. In our 
case this happens in a set of unit tests that all allocate their own shared 
memory windows and thus the amount of leaked memory piles up quite a bit.

I built the code using GCC 5.4.0 and Open MPI 2.0.1 and ran it on a single 
node. How to reproduce:

$ mpicc -Wall -ggdb mpi_dynamic_win_free.c -o mpi_dynamic_win_free

$ mpirun -n 2 valgrind --leak-check=full 
--suppressions=$HOME/opt/openmpi-2.0.1/share/openmpi/openmpi-valgrind.supp 
./mpi_dynamic_win_free

Best regards,
Joseph

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de

==27025== Memcheck, a memory error detector
==27025== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==27025== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==27025== Command: ./mpi_dynamic_win_free
==27025== 
==27026== Memcheck, a memory error detector
==27026== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==27026== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==27026== Command: ./mpi_dynamic_win_free
==27026== 
valgrind MPI wrappers 27026: Active for pid 27026
valgrind MPI wrappers 27026: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers 27025: Active for pid 27025
valgrind MPI wrappers 27025: Try MPIWRAP_DEBUG=help for possible options
==27026== Invalid read of size 4
==27026==at 0x731B166: segment_attach (shmem_mmap_module.c:494)
==27026==by 0x5DA0D3E: opal_shmem_segment_attach (shmem_base_wrappers.c:61)
==27026==by 0xD221D14: component_select (osc_sm_component.c:272)
==27026==by 0x517EFC3: ompi_osc_base_select (osc_base_init.c:73)
==27026==