[OMPI devel] many-to-one test

2014-11-13 Thread Ralph Castain
Hey folks

Given that the many-to-one test will always fail on our MTT runs, can people 
just “skip” that test? It keeps showing up as a false positive in the reports, 
and I end up reviewing it for nothing.

Thanks
Ralph



Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-13 Thread Howard Pritchard
Hi Adrian,

Please do include your PSM results in the database.  It would be
very much appreciated.

Howard


2014-11-13 7:46 GMT-07:00 Adrian Reber :

> I applied the fix committed on master and described in
>
> https://github.com/open-mpi/ompi/issues/268
>
> on 1.8.3 and 1.8.4rc1 and this seems to have fixed my problems. I can
> include my PSM based mtt results in the main mtt database if desired.
>
> Adrian
>
>
> On Tue, Nov 11, 2014 at 07:42:24PM +0100, Adrian Reber wrote:
> > Using the intel test suite I can reproduce it for example with:
> >
> > $ mpirun --np 2 --map-by ppr:1:node   `pwd`/src/MPI_Allgatherv_c
> > MPITEST info  (0): Starting MPI_Allgatherv() test
> > MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
> > MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
> > MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
> >
> > MPI_Allgatherv_c:9230 terminated with signal 11 at PC=7fc4ced4b150 SP=7fff45aa2fb0.  Backtrace:
> > /lib64/libpsm_infinipath.so.1(ips_proto_connect+0x630)[0x7fc4ced4b150]
> > /lib64/libpsm_infinipath.so.1(ips_ptl_connect+0x3a)[0x7fc4ced4219a]
> > /lib64/libpsm_infinipath.so.1(__psm_ep_connect+0x3e7)[0x7fc4ced3a727]
> > /opt/bwhpc/common/mpi/openmpi/1.8.4-gnu-4.8/lib/libmpi.so.1(ompi_mtl_psm_add_procs+0x1f3)[0x7fc4cf902303]
> > /opt/bwhpc/common/mpi/openmpi/1.8.4-gnu-4.8/lib/libmpi.so.1(ompi_comm_get_rprocs+0x49a)[0x7fc4cf7cbc2a]
> > /opt/bwhpc/common/mpi/openmpi/1.8.4-gnu-4.8/lib/libmpi.so.1(PMPI_Intercomm_create+0x2f2)[0x7fc4cf7fb602]
> > /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x40f5bf]
> > /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x40edf4]
> > /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x401c80]
> > /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc4cf1a8af5]
> > /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x401a89]
> > ---
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > ---
> >
> >
> > On Tue, Nov 11, 2014 at 10:26:52AM -0800, Ralph Castain wrote:
> > > I think it would help understand this if you isolated it down to a single test that is failing, rather than just citing an entire test suite. For example, we know that the many-to-one test is never going to pass, regardless of transport. We also know that the dynamic tests will fail with PSM as they are not supported by that transport.
> > >
> > > So could you find one test that doesn’t pass, and give us some info on that one?
> > >
> > >
> > > > On Nov 11, 2014, at 10:04 AM, Adrian Reber  wrote:
> > > >
> > > > Some more information about our PSM troubles.
> > > >
> > > > Using 1.6.5 the test suite still works. It fails with 1.8.3 and
> > > > 1.8.4rc1. As long as all processes are running on one node it also
> > > > works. As soon as one process is running on a second node it fails with
> > > > the previously described errors. I also tried the 1.8 release and it has
> > > > the same error. Another way to trigger it with only two processes is:
> > > >
> > > > mpirun --np 2 --map-by ppr:1:node   mpi_test_suite -t "environment"
> > > >
> > > > Some change introduced between 1.6.5 and 1.8 broke this test case with
> > > > PSM. I have not yet been able to upgrade PSM to 3.3 but it seems more
> > > > Open MPI related than PSM.
> > > >
> > > > Intel MPI (4.1.1) has also no troubles running the test cases.
> > > >
> > > >   Adrian
> > > >
> > > > On Mon, Nov 10, 2014 at 09:12:41PM +, Friedley, Andrew wrote:
> > > >> Hi Adrian,
> > > >>
> > > >> Yes, I suggest trying either RH support or Intel's support at ibsupp...@intel.com.  They might have seen this problem before.  Since you're running the RHEL versions of PSM and related software, one thing you could try is IFS.  I think I was running IFS 7.3.0, so that's a difference between your setup and mine.  At the least, it may help support nail down the issue.
> > > >>
> > > >> Andrew
> > > >>
> > > >>> -Original Message-
> > > >>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian
> > > >>> Reber
> > > >>> Sent: Monday, November 10, 2014 12:39 PM
> > > >>> To: Open MPI Developers
> > > >>> Subject: Re: [OMPI devel] 1.8.3 and PSM errors
> > > >>>
> > > >>> Andrew,
> > > >>>
> > > >>> thanks for looking into this. I was able to reproduce this error on RHEL 7 with
> > > >>> PSM provided by RHEL:
> > > >>>
> > > >>> infinipath-psm-3.2-2_ga8c3e3e_open.2.el7.x86_64
> > > >>> infinipath-psm-devel-3.2-2_ga8c3e3e_open.2.el7.x86_64
> > > >>>
> > > >>> $ mpirun -np 32 mpi_test_suite -t "environment"
> > > >>>
> > > >>> mpi_test_suite:4877 

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-13 Thread Adrian Reber
I applied the fix committed on master and described in

https://github.com/open-mpi/ompi/issues/268

on 1.8.3 and 1.8.4rc1 and this seems to have fixed my problems. I can
include my PSM based mtt results in the main mtt database if desired.

Adrian


On Tue, Nov 11, 2014 at 07:42:24PM +0100, Adrian Reber wrote:
> Using the intel test suite I can reproduce it for example with:
> 
> $ mpirun --np 2 --map-by ppr:1:node   `pwd`/src/MPI_Allgatherv_c
> MPITEST info  (0): Starting MPI_Allgatherv() test
> MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
> MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
> MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
> 
> MPI_Allgatherv_c:9230 terminated with signal 11 at PC=7fc4ced4b150 
> SP=7fff45aa2fb0.  Backtrace:
> /lib64/libpsm_infinipath.so.1(ips_proto_connect+0x630)[0x7fc4ced4b150]
> /lib64/libpsm_infinipath.so.1(ips_ptl_connect+0x3a)[0x7fc4ced4219a]
> /lib64/libpsm_infinipath.so.1(__psm_ep_connect+0x3e7)[0x7fc4ced3a727]
> /opt/bwhpc/common/mpi/openmpi/1.8.4-gnu-4.8/lib/libmpi.so.1(ompi_mtl_psm_add_procs+0x1f3)[0x7fc4cf902303]
> /opt/bwhpc/common/mpi/openmpi/1.8.4-gnu-4.8/lib/libmpi.so.1(ompi_comm_get_rprocs+0x49a)[0x7fc4cf7cbc2a]
> /opt/bwhpc/common/mpi/openmpi/1.8.4-gnu-4.8/lib/libmpi.so.1(PMPI_Intercomm_create+0x2f2)[0x7fc4cf7fb602]
> /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x40f5bf]
> /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x40edf4]
> /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x401c80]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc4cf1a8af5]
> /lustre/lxfs/work/ws/es_test01-open_mpi-0/ompi-tests/intel_tests/src/MPI_Allgatherv_c[0x401a89]
> ---
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> ---
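
The backtrace points at PMPI_Intercomm_create: ompi_comm_get_rprocs hands the
remote processes to the PSM MTL (ompi_mtl_psm_add_procs), and the segfault
happens inside libpsm_infinipath during the endpoint connect. For anyone who
wants something smaller than the intel test, a minimal sketch of that code path
might look like the following (a hypothetical program, not part of the test
suite):

program intercomm_repro
! Hypothetical sketch: split MPI_COMM_WORLD into an "even" and an "odd"
! group and join them with MPI_Intercomm_create, the MPI call visible in
! the backtrace above.
use mpi
implicit none
integer :: ierror, rank, color, local_comm, inter_comm, remote_leader

call mpi_init(ierror)
call mpi_comm_rank(MPI_COMM_WORLD, rank, ierror)

color = mod(rank, 2)
call mpi_comm_split(MPI_COMM_WORLD, color, rank, local_comm, ierror)

! world rank of the leader (local rank 0) of the other group
remote_leader = 1 - color
call mpi_intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader, 0, &
                          inter_comm, ierror)

call mpi_comm_free(inter_comm, ierror)
call mpi_comm_free(local_comm, ierror)
call mpi_finalize(ierror)
end program intercomm_repro

Running it with the same two-node layout, e.g.
mpirun --np 2 --map-by ppr:1:node ./intercomm_repro, should exercise the same
cross-node PSM endpoint connect.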
> 
> 
> On Tue, Nov 11, 2014 at 10:26:52AM -0800, Ralph Castain wrote:
> > I think it would help understand this if you isolated it down to a single 
> > test that is failing, rather than just citing an entire test suite. For 
> > example, we know that the many-to-one test is never going to pass, 
> > regardless of transport. We also know that the dynamic tests will fail with 
> > PSM as they are not supported by that transport.
> > 
> > So could you find one test that doesn’t pass, and give us some info on that 
> > one?
> > 
> > 
> > > On Nov 11, 2014, at 10:04 AM, Adrian Reber  wrote:
> > > 
> > > Some more information about our PSM troubles.
> > > 
> > > Using 1.6.5 the test suite still works. It fails with 1.8.3 and
> > > 1.8.4rc1. As long as all processes are running on one node it also
> > > works. As soon as one process is running on a second node it fails with
> > > the previously described errors. I also tried the 1.8 release and it has
> > > the same error. Another way to trigger it with only two processes is:
> > > 
> > > mpirun --np 2 --map-by ppr:1:node   mpi_test_suite -t "environment"
> > > 
> > > Some change introduced between 1.6.5 and 1.8 broke this test case with
> > > PSM. I have not yet been able to upgrade PSM to 3.3 but it seems more
> > > Open MPI related than PSM.
> > > 
> > > Intel MPI (4.1.1) has also no troubles running the test cases.
> > > 
> > >   Adrian
> > > 
> > > On Mon, Nov 10, 2014 at 09:12:41PM +, Friedley, Andrew wrote:
> > >> Hi Adrian,
> > >> 
> > >> Yes, I suggest trying either RH support or Intel's support at  
> > >> ibsupp...@intel.com.  They might have seen this problem before.  Since 
> > >> you're running the RHEL versions of PSM and related software, one thing 
> > >> you could try is IFS.  I think I was running IFS 7.3.0, so that's a 
> > >> difference between your setup and mine.  At the least, it may help 
> > >> support nail down the issue.
> > >> 
> > >> Andrew
> > >> 
> > >>> -Original Message-
> > >>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian
> > >>> Reber
> > >>> Sent: Monday, November 10, 2014 12:39 PM
> > >>> To: Open MPI Developers
> > >>> Subject: Re: [OMPI devel] 1.8.3 and PSM errors
> > >>> 
> > >>> Andrew,
> > >>> 
> > >>> thanks for looking into this. I was able to reproduce this error on 
> > >>> RHEL 7 with
> > >>> PSM provided by RHEL:
> > >>> 
> > >>> infinipath-psm-3.2-2_ga8c3e3e_open.2.el7.x86_64
> > >>> infinipath-psm-devel-3.2-2_ga8c3e3e_open.2.el7.x86_64
> > >>> 
> > >>> $ mpirun -np 32 mpi_test_suite -t "environment"
> > >>> 
> > >>> mpi_test_suite:4877 terminated with signal 11 at PC=7f5a2f4a2150
> > >>> SP=7fff9e0ce770.  Backtrace:
> > >>> /lib64/libpsm_infinipath.so.1(ips_proto_connect+0x630)[0x7f5a2f4a2150]
> > >>> /lib64/libpsm_infinipath.so.1(ips_ptl_connect+0x3a)[0x7f5a2f49919a]
> > >>> 

Re: [OMPI devel] Error in version 1.8.3?!

2014-11-13 Thread Gilles Gouaillardet
Hartmut,

This is a known bug.

In the meantime, could you give 1.8.4rc1 a try?
http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.4rc1.tar.gz

/* if I remember correctly, this is already fixed in the rc1 */

Cheers,

Gilles
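
As a stopgap until a fixed build is installed, the byte count can be obtained
without calling the missing MPI_Sizeof interface. The sketch below assumes a
compiler that supports the Fortran 2008 storage_size intrinsic (which reports
bits, not bytes); it is one possible workaround, not the fix itself:

program mpi_sizetest_workaround
! Workaround sketch: compute the size of a default logical with the
! Fortran 2008 storage_size intrinsic instead of MPI_Sizeof.
! storage_size returns bits, MPI_Sizeof reports bytes, hence the /8.
use mpi
implicit none

logical :: a
integer :: ierror, sz

a = .true.
call mpi_init(ierror)
sz = storage_size(a) / 8
print *, sz
call mpi_finalize(ierror)
end program mpi_sizetest_workaround

The real fix remains moving to a release that contains the repaired
mpi-module interfaces, as suggested above.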

On 2014/11/13 19:48, Hartmut Häfner (SCC) wrote:
> Dear developers,
> we have compiled Open MPI 1.8.3 with the Intel compiler, version 13.1.5 (and 
> 15.0.0). The configure command is:
> ./configure CC=icc CXX=icpc FC=ifort F90=ifort CFLAGS="-O2 -fno-strict-
> aliasing" CXXFLAGS="-O2" \
> FCFLAGS="-O2" --enable-shared --enable-static --enable-mpi-
> fortran=usempif08 --with-verbs --without-psm --with-slurm --
> prefix=/software/all/openmpi
> /1.8.3_intel_13.1
> We also tried the configure command without the option --enable-mpi-fortran
> and additionally with option  -D__INTEL_COMPILER in CFLAGS, CXXFLAGS and 
> FCFLAGS.
>
> If you try to use the subroutine MPI_Sizeof within a Fortran program, you 
> always get an undefined reference.
>
> We also have tried a test program:
> program testbcast
> use mpi
> implicit none
>  
> logical :: a
> integer :: ierror, rank, size
>  
> a = .true.
> call mpi_init(ierror)
> call mpi_sizeof(a, size, ierror)
> print *,size
> call mpi_finalize(ierror)
> end program testbcast
>
> Using 
> mpif90 -o mpi_sizetest mpi_sizetest.f90
> gives
> /scratch/ifortahgFcM.o: In function `MAIN__':
> mpi_sizetest.f90:(.text+0x4c): undefined reference to `mpi_sizeof0dl_'
>
> (Environment variable OMPI_FCFLAGS is unset)
>
> If we use the GNU compiler instead of the Intel compiler, it works! (But then 
> we run into trouble with the module "mpi" when using the Intel compiler for our 
> application.)
>
> We did not find any hints on this erroneous behaviour!
>
>
> Sincerely yours
>
>Hartmut Häfner
>
>
>  
> Hartmut Häfner
> Karlsruhe Institute of Technology (KIT)
> University Karlsruhe (TH)
> Steinbuch Centre for Computing (SCC)
> Scientific Computing and Simulation (SCS)
> Zirkel 2 (Campus Süd, Geb. 20.21, Raum 204)
> D-76128 Karlsruhe
>
> Fon +49(0)721 608 44869
> Fax +49(0)721 32550
> hartmut.haef...@kit.edu
>
> http://www.scc.kit.edu/personen/hartmut.haefner



[OMPI devel] Error in version 1.8.3?!

2014-11-13 Thread SCC
Dear developers,
we have compiled Open MPI 1.8.3 with the Intel compiler, version 13.1.5 (and 
15.0.0). The configure command is:
./configure CC=icc CXX=icpc FC=ifort F90=ifort CFLAGS="-O2 -fno-strict-
aliasing" CXXFLAGS="-O2" \
FCFLAGS="-O2" --enable-shared --enable-static --enable-mpi-
fortran=usempif08 --with-verbs --without-psm --with-slurm --
prefix=/software/all/openmpi
/1.8.3_intel_13.1
We also tried the configure command without the option --enable-mpi-fortran
and additionally with option  -D__INTEL_COMPILER in CFLAGS, CXXFLAGS and 
FCFLAGS.

If you try to use the subroutine MPI_Sizeof within a Fortran program, you 
always get an undefined reference.

We also have tried a test program:
program testbcast
use mpi
implicit none
 
logical :: a
integer :: ierror, rank, size
 
a = .true.
call mpi_init(ierror)
call mpi_sizeof(a, size, ierror)
print *,size
call mpi_finalize(ierror)
end program testbcast

Using 
mpif90 -o mpi_sizetest mpi_sizetest.f90
gives
/scratch/ifortahgFcM.o: In function `MAIN__':
mpi_sizetest.f90:(.text+0x4c): undefined reference to `mpi_sizeof0dl_'

(Environment variable OMPI_FCFLAGS is unset)

If we use the GNU compiler instead of the Intel compiler, it works! (But then 
we run into trouble with the module "mpi" when using the Intel compiler for our 
application.)

We did not find any hints on this erroneous behaviour!


Sincerely yours

   Hartmut Häfner


 
Hartmut Häfner
Karlsruhe Institute of Technology (KIT)
University Karlsruhe (TH)
Steinbuch Centre for Computing (SCC)
Scientific Computing and Simulation (SCS)
Zirkel 2 (Campus Süd, Geb. 20.21, Raum 204)
D-76128 Karlsruhe

Fon +49(0)721 608 44869
Fax +49(0)721 32550
hartmut.haef...@kit.edu

http://www.scc.kit.edu/personen/hartmut.haefner