On Sep 15, 2014, at 1:01 AM, Matthew Knepley <[email protected]> wrote:
> On Sun, Sep 14, 2014 at 5:56 PM, Christophe Prud'homme <[email protected]> wrote:
>> Hello,
>>
>> Pierre and I found this problem occurring with OpenMPI 1.8.1 and PETSc 3.5.1 (this was done on my side using Homebrew on OS X); we haven't checked MPICH. However, it is disturbing that removing the petsc.h header actually solves the problem.
>>
>> From the MPI standard, if we understand correctly, we shouldn't have to specify the data type for MPI_IN_PLACE scatter/gather operations. Yet to avoid a crash, a deadlock (scatter), or wrong results (gather) with boost::mpi, we need to specify the data type.
>>
>> In our code Feel++ we have added some tests to verify this behavior [1]. Basically:
>> - MPI alone: OK
>> - boost::mpi alone: OK (boost::mpi is just used as an alternative way to initialize MPI)
>> - MPI + petsc.h, with MPI_DATATYPE_NULL in scatter: crash
>> - MPI + petsc.h, with the proper datatype: OK
>> - boost::mpi + petsc.h, with MPI_DATATYPE_NULL: hangs in scatter
>> - boost::mpi + petsc.h, with the proper datatype: OK
>>
>> 1. https://github.com/feelpp/feelpp/blob/develop/testsuite/feelcore/test_gatherscatter.cpp
>
> 1) I think this is an OpenMPI bug.
>
> 2) I think this because I cannot reproduce it with MPICH, and it is not valgrind clean.
>
> 3) OpenMPI has lots of bugs. If you guys can reproduce it with MPICH, I will track it down and fix it.

True, but they fixed the similar wrong behavior (which was inherited from libNBC) when I pinged them (http://www.open-mpi.org/community/lists/users/2013/11/23034.php). BTW, I had the same problem with master and MPICH 3.1. Anyway, thanks Barry for the quick fix.

Pierre
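For reference, here is a minimal sketch of the reproducer pattern under discussion. It is reconstructed from the transcript, not the literal in-place.cpp (which was attached to the original mail): the chunk size, the buffer setup, and the exact meaning of -DPETSC_BUG are assumptions; here the flag switches the root's arguments to MPI_IN_PLACE with MPI_DATATYPE_NULL, as the transcript suggests.

// in-place-sketch.cpp -- a reconstruction of the reproducer discussed
// above, not the literal in-place.cpp. Build and run as in the quoted
// commands below, with or without -DPETSC_BUG.
#include <mpi.h>
#include <petsc.h>   // merely including this header is what changes the behavior
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int chunk = 4;  // entries per rank (illustrative)
  std::vector<int> buf(rank == 0 ? chunk * size : chunk);
  if (rank == 0)        // root holds the full array: chunk entries per destination rank
    for (int i = 0; i < chunk * size; ++i) buf[i] = i / chunk;

#ifdef PETSC_BUG
  // MPI-3 (Section 5.6): with MPI_IN_PLACE as recvbuf at the root, recvcount
  // and recvtype are ignored, so MPI_DATATYPE_NULL should be legal here.
  MPI_Datatype rtype = MPI_DATATYPE_NULL;
#else
  MPI_Datatype rtype = MPI_INT;  // the "proper datatype" workaround
#endif

  if (rank == 0)
    MPI_Scatter(buf.data(), chunk, MPI_INT, MPI_IN_PLACE, chunk, rtype, 0, MPI_COMM_WORLD);
  else
    MPI_Scatter(nullptr, 0, MPI_INT, buf.data(), chunk, MPI_INT, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("Done with the scatter !\n");
  printf("%d %d %d %d (this line should be filled with %d)\n",
         buf[0], buf[1], buf[2], buf[3], rank);

  // Same rule on the gather side (MPI-3, Section 5.5): with MPI_IN_PLACE as
  // sendbuf at the root, sendcount and sendtype are ignored.
  if (rank == 0)
    MPI_Gather(MPI_IN_PLACE, chunk, rtype, buf.data(), chunk, MPI_INT, 0, MPI_COMM_WORLD);
  else
    MPI_Gather(buf.data(), chunk, MPI_INT, nullptr, 0, MPI_INT, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("Done with the gather !\n");

  MPI_Finalize();
  return 0;
}

With a standard-conforming MPI this should print each rank's chunk correctly in both builds; per the thread, the -DPETSC_BUG build aborts in MPI_Type_size once petsc.h is included (see the error trace quoted below).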
> Matt
>
>> Best regards,
>> C.
>>
>> On Mon, Sep 15, 2014 at 12:37 AM, Matthew Knepley <[email protected]> wrote:
>>> On Sun, Sep 14, 2014 at 4:16 PM, Pierre Jolivet <[email protected]> wrote:
>>>> Hello,
>>>> Could you please explain to me why the following example is not working properly when <petsc.h> (from master, with OpenMPI 1.8.1) is included?
>>>>
>>>> $ mpicxx in-place.cpp -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include -L$PETSC_DIR/$PETSC_ARCH/lib -lpetsc
>>>> $ mpirun -np 2 ./a.out
>>>> Done with the scatter !
>>>> 0 0 0 0 (this line should be filled with 0)
>>>> 1 1 1 1 (this line should be filled with 1)
>>>> Done with the gather !
>>>>
>>>> $ mpicxx in-place.cpp -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include -L$PETSC_DIR/$PETSC_ARCH/lib -lpetsc -DPETSC_BUG
>>>> $ mpirun -np 2 ./a.out
>>>> [:3367] *** An error occurred in MPI_Type_size
>>>> [:3367] *** reported by process [4819779585,140733193388032]
>>>> [:3367] *** on communicator MPI_COMM_WORLD
>>>> [:3367] *** MPI_ERR_TYPE: invalid datatype
>>>> [:3367] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> [:3367] *** and potentially your MPI job)
>>>
>>> I just built this with MPICH and it runs fine:
>>>
>>> master:/PETSc3/petsc/petsc-pylith$ /PETSc3/petsc/petsc-pylith/arch-pylith-cxx-debug/bin/mpiexec -host localhost -n 2 /PETSc3/petsc/petsc-pylith/arch-pylith-cxx-debug/lib/in-place-obj/in-place
>>> Done with the scatter !
>>> 0 0 0 0 (this line should be filled with 0)
>>> 1 1 1 1 (this line should be filled with 1)
>>> Done with the gather !
>>>
>>> Will valgrind.
>>>
>>> Matt
>>>
>>>> Thank you for looking,
>>>> Pierre
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>
>> --
>> Christophe Prud'homme
>> Feel++ Project Manager
>> Professor in Applied Mathematics
>> @ Université Joseph Fourier (Grenoble, France)
>> @ Université de Strasbourg (France)
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
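Why would merely including petsc.h change the behavior of a plain MPI call? The thread does not spell this out, but the error occurring inside MPI_Type_size is consistent with a header-level logging shim around the collectives. The following is a simplified illustration of that kind of shim, not PETSc's actual code; Logged_Scatter and logged_bytes are invented names for this sketch.

// NOT PETSc's actual code: a simplified illustration of how a header-level
// logging shim around MPI_Scatter can turn a standard-conforming
// MPI_DATATYPE_NULL argument into the MPI_ERR_TYPE abort quoted above.
#include <mpi.h>
#include <cstdio>

static long long logged_bytes = 0;  // byte accounting, as a logging layer might keep

static int Logged_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                          void *recvbuf, int recvcount, MPI_Datatype recvtype,
                          int root, MPI_Comm comm) {
  int size = 0;
  // The shim queries the datatype size for its byte counts. MPI_Scatter
  // itself ignores recvtype at the root when recvbuf == MPI_IN_PLACE, but
  // MPI_Type_size has no such exemption: MPI_DATATYPE_NULL is an error here.
  MPI_Type_size(recvtype, &size);
  logged_bytes += (long long)recvcount * size;
  return MPI_Scatter(sendbuf, sendcount, sendtype,
                     recvbuf, recvcount, recvtype, root, comm);
}

// From here on, every textual use of MPI_Scatter goes through the shim,
// which is roughly what happens once petsc.h is included with logging enabled.
#define MPI_Scatter Logged_Scatter

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, buf[8] = {0, 0, 0, 0, 1, 1, 1, 1}, out[4];
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)  // legal per MPI-3, but aborts inside the shim's MPI_Type_size
    MPI_Scatter(buf, 4, MPI_INT, MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, 0, MPI_COMM_WORLD);
  else
    MPI_Scatter(nullptr, 0, MPI_INT, out, 4, MPI_INT, 0, MPI_COMM_WORLD);
  printf("rank %d got through\n", rank);
  MPI_Finalize();
  return 0;
}

If this reading is right, the quick fix Pierre thanks Barry for presumably amounts to having the logging layer skip the size query when it sees MPI_IN_PLACE or MPI_DATATYPE_NULL before forwarding the call.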
