Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-25 Thread Ralph Castain
Looks to me like a false positive: we do malloc some space and access 
different parts of it, but as far as I can tell we stay inside the allocated 
space at all times.

I’d suppress it.
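One way to act on this is a small extra Valgrind suppression file. The sketch below is written as a shell script that creates it. The file name `ompi-extra.supp` and the entry name are hypothetical. The entry matches Memcheck's `Addr4` kind (an invalid 4-byte read) when `opal_os_dirpath_create` is called from `orte_session_dir`, which are the two innermost frames in the trace quoted below. Matching only those two frames is a deliberate choice to keep the entry narrow, so the same function reached through other paths would still be reported.

```shell
#!/bin/sh
# Writes a hypothetical extra suppression file ("ompi-extra.supp" is an assumed name).
# It supplements the default file in $PREFIX/share/openmpi, which the original
# poster reports does not cover this error.
# Frames match innermost-first: opal_os_dirpath_create, called from orte_session_dir.
cat > ompi-extra.supp <<'EOF'
{
   ompi_opal_os_dirpath_create_addr4
   Memcheck:Addr4
   fun:opal_os_dirpath_create
   fun:orte_session_dir
}
EOF
```

Valgrind accepts `--suppressions=` more than once, so this file can be used together with the one Open MPI installs, e.g. `valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp --suppressions=ompi-extra.supp ./zfs_gnu_production`. Running once with `--gen-suppressions=all` makes Valgrind print a ready-to-paste entry for every reported error, which is a safer way to get the exact frame list than writing it by hand.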


> On Jul 23, 2015, at 12:47 AM, Schlottke-Lakemper, Michael wrote:
> 
> Hi folks,
> 
> Recently we’ve been getting a Valgrind error in PMPI_Init for our suite of 
> regression tests:
> 
> ==5922== Invalid read of size 4
> ==5922==    at 0x61CC5C0: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
> ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
> ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
> ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
> ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
> ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
> ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
> ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
> ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
> ==5922==  Address 0x710f670 is 48 bytes inside a block of size 51 alloc'd
> ==5922==    at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5922==    by 0x61CC572: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
> ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
> ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
> ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
> ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
> ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
> ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
> ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
> ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
> ==5922==
> 
> What is weird is that it seems to depend on the PBS/Torque session we’re in: 
> sometimes the error does not occur at all and all tests run fine (this is in 
> fact the only Valgrind error we’re having at the moment). Other times every 
> single test we’re running has this error.
> 
> Has anyone seen this or might be able to offer an explanation? If it is a 
> false positive, I’d be happy to suppress it :)
> 
> Thanks a lot in advance
> 
> Michael
> 
> P.S.: This error is not covered/suppressed by the default ompi suppression 
> file in $PREFIX/share/openmpi.
> 
> 
> --
> Michael Schlottke-Lakemper
> 
> SimLab Highly Scalable Fluids & Solids Engineering
> Jülich Aachen Research Alliance (JARA-HPC)
> RWTH Aachen University
> Wüllnerstraße 5a
> 52062 Aachen
> Germany
> 
> Phone: +49 (241) 80 95188
> Fax: +49 (241) 80 92257
> Mail: m.schlottke-lakem...@aia.rwth-aachen.de 
> 
> Web: http://www.jara.org/jara-hpc 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/07/27303.php



Re: [OMPI users] Building OpenMPI 1.8.7 on XC30

2015-07-25 Thread Mark Santcroos
Hi Erik,

Do you really need 1.8.7? If not, you might want to give the latest master a try. 
Others, including myself, have had more luck with it on Crays, including Edison.
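Before digging into MPI itself, it can help to check what the launcher actually hands to each process. The sketch below is a tiny non-MPI probe; the script name `rankcheck.sh` and the function `report_rank` are hypothetical. It prints the node name plus the `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_SIZE` variables that Open MPI's `mpirun` exports to the processes it starts, or `unset` when nothing provided them. That processes launched by `aprun` each fall back to being a lone rank 0 is an assumption consistent with the symptom, not something confirmed in this thread.

```shell
#!/bin/sh
# rankcheck.sh (hypothetical name): non-MPI probe for launcher wire-up.
# mpirun exports OMPI_COMM_WORLD_RANK / OMPI_COMM_WORLD_SIZE to each process;
# "unset" means the launcher did not provide them.
report_rank() {
    printf '%s: rank=%s size=%s\n' \
        "$(uname -n)" \
        "${OMPI_COMM_WORLD_RANK:-unset}" \
        "${OMPI_COMM_WORLD_SIZE:-unset}"
}

report_rank
```

After `chmod +x rankcheck.sh`, running `mpirun -np 2 --map-by node ./rankcheck.sh` should print two distinct ranks from two different hosts. If that also hangs with no output, the stall is in starting processes on the remote nodes rather than in the application. Under `aprun -n 2 -N 1 ./rankcheck.sh` the Open MPI variables are expected to be unset, since `aprun` is not Open MPI's launcher, so the probe is mainly useful for the `mpirun` path.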

Mark

> On 25 Jul 2015, at 1:35 , Erik Schnetter  wrote:
> 
> I want to build OpenMPI 1.8.7 on a Cray XC30 (Edison at NERSC). I've tried 
> various configuration options, but I am always encountering either OpenMPI 
> build errors, application build errors, or run-time errors.
> 
> I'm currently looking at 
> , which 
> seems to describe my case. I'm now configuring OpenMPI without any options, 
> except setting compilers to clang/gfortran and pointing it to a self-built 
> hwloc. For completeness, here are my configure options as recorded by 
> config.status:
> 
> '/project/projectdirs/m152/schnette/edison/software/src/openmpi-1.8.7/src/openmpi-1.8.7/configure' \
>   '--prefix=/project/projectdirs/m152/schnette/edison/software/openmpi-1.8.7' \
>   '--with-hwloc=/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0' \
>   '--disable-vt' \
>   'CC=/project/projectdirs/m152/schnette/edison/software/llvm-3.6.2/bin/wrap-clang' \
>   'CXX=/project/projectdirs/m152/schnette/edison/software/llvm-3.6.2/bin/wrap-clang++' \
>   'FC=/project/projectdirs/m152/schnette/edison/software/gcc-5.2.0/bin/wrap-gfortran' \
>   'CFLAGS=-I/opt/ofed/include -I/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0/include' \
>   'CXXFLAGS=-I/opt/ofed/include -I/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0/include' \
>   'LDFLAGS=-L/opt/ofed/lib64 -L/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0/lib -Wl,-rpath,/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0/lib' \
>   'LIBS=-lhwloc -lpthread -lpthread' \
>   '--with-wrapper-ldflags=-L/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0/lib -Wl,-rpath,/project/projectdirs/m152/schnette/edison/software/hwloc-1.11.0/lib' \
>   '--with-wrapper-libs=-lhwloc -lpthread'
> 
> This builds and installs fine, and works when running on a single node. 
> However, multi-node runs stall: the queue starts the job, but mpirun 
> produces no output. The "-v" option to mpirun doesn't help.
> 
> When I use aprun instead of mpirun to start my application, then all 
> processes think they are rank 0.
> 
> Do you have any pointers for how to debug this?
> 
> -erik
> 
> -- 
> Erik Schnetter  
> http://www.perimeterinstitute.ca/personal/eschnetter/
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/07/27324.php