Just to "close" this thread, the offending bug has been found and corrected (was with MPI I/O implementation) (see https://github.com/open-mpi/ompi/issues/1875).

So with the forthcoming OpenMPI 2.0.1, everything is fine with PETSc for me.

Have a nice day!

Eric


On 25/07/16 03:53 PM, Matthew Knepley wrote:
On Mon, Jul 25, 2016 at 12:44 PM, Eric Chamberland
<[email protected]> wrote:

    Ok,

    here are the two points answered:

    #1) got valgrind output... here is the fatal free operation:


Okay, this is not the MatMult scatter; this is for local representations
of ghosted vectors. However, to me it looks like OpenMPI mistakenly
frees its built-in type for MPI_DOUBLE.
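
For reference, the pattern at issue is essentially the persistent-request
idiom below (a minimal standalone sketch, not the actual PETSc source):
freeing the request should only release the request object, never the
built-in MPI_DOUBLE datatype it references, yet the valgrind output below
shows the free landing on the ompi_mpi_double symbol itself.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
      double      buf[16];
      MPI_Request req;
      int         rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (size > 1) {
        /* Persistent receive referencing the built-in MPI_DOUBLE type,
           like the requests VecScatter keeps for its communication */
        MPI_Recv_init(buf, 16, MPI_DOUBLE, (rank + 1) % size, 0,
                      MPI_COMM_WORLD, &req);
        /* This must free only the request object; the datatype it
           references must survive */
        MPI_Request_free(&req);
      }
      MPI_Finalize();
      return 0;
    }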


    ==107156== Invalid free() / delete / delete[] / realloc()
    ==107156==    at 0x4C2A37C: free (in
    /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==107156==    by 0x1E63CD5F: opal_free (malloc.c:184)
    ==107156==    by 0x27622627: mca_pml_ob1_recv_request_fini
    (pml_ob1_recvreq.h:133)
    ==107156==    by 0x27622C4F: mca_pml_ob1_recv_request_free
    (pml_ob1_recvreq.c:90)
    ==107156==    by 0x1D3EF9DC: ompi_request_free (request.h:362)
    ==107156==    by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
    ==107156==    by 0x14AE3B9C: VecScatterDestroy_PtoP (vpscat.c:219)
    ==107156==    by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
    ==107156==    by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
    ==107156==    by 0x14A33809: VecDestroy (vector.c:432)
    ==107156==    by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
    (girefConfigurationPETSc.h:115)
    ==107156==    by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
    (VecteurPETSc.cc:2292)
    ==107156==    by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
    (VecteurPETSc.cc:287)
    ==107156==    by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
    (VecteurPETSc.cc:281)
    ==107156==    by 0x1135A57B:
    PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
    (PPReactionsAppuiEL3D.cc:216)
    ==107156==    by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
    /home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
    ==107156==    by 0x435702: main (Test.ProblemeGD.icc:381)
    ==107156==  Address 0x1d6acbc0 is 0 bytes inside data symbol
    "ompi_mpi_double"
    --107156-- REDIR: 0x1dda2680 (libc.so.6:__GI_stpcpy) redirected to
    0x4c2f330 (__GI_stpcpy)
    ==107156==
    ==107156== Process terminating with default action of signal 6
    (SIGABRT): dumping core
    ==107156==    at 0x1DD520C7: raise (in /lib64/libc-2.19.so)
    ==107156==    by 0x1DD53534: abort (in /lib64/libc-2.19.so)
    ==107156==    by 0x1DD4B145: __assert_fail_base (in /lib64/libc-2.19.so)
    ==107156==    by 0x1DD4B1F1: __assert_fail (in /lib64/libc-2.19.so)
    ==107156==    by 0x27626D12: mca_pml_ob1_send_request_fini
    (pml_ob1_sendreq.h:221)
    ==107156==    by 0x276274C9: mca_pml_ob1_send_request_free
    (pml_ob1_sendreq.c:117)
    ==107156==    by 0x1D3EF9DC: ompi_request_free (request.h:362)
    ==107156==    by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
    ==107156==    by 0x14AE3C3C: VecScatterDestroy_PtoP (vpscat.c:225)
    ==107156==    by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
    ==107156==    by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
    ==107156==    by 0x14A33809: VecDestroy (vector.c:432)
    ==107156==    by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
    (girefConfigurationPETSc.h:115)
    ==107156==    by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
    (VecteurPETSc.cc:2292)
    ==107156==    by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
    (VecteurPETSc.cc:287)
    ==107156==    by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
    (VecteurPETSc.cc:281)
    ==107156==    by 0x1135A57B:
    PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
    (PPReactionsAppuiEL3D.cc:216)
    ==107156==    by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
    /home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
    ==107156==    by 0x435702: main (Test.ProblemeGD.icc:381)


    #2) For the run with -vecscatter_alltoall it works...!

    As an "end user", should I ever modify these VecScatterCreate
    options? How do they change the performances of the code on large
    problems?


Yep, those options are there because the different variants are better
on different architectures, and you can't know which one to pick until
runtime (and without experimentation).
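
If you want to hardwire a choice in code rather than pass it on the
command line, you can push the option into the options database before
the objects are created; a minimal sketch against the PETSc 3.7 API
(NULL selects the global options database):

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

      /* Same effect as -vecscatter_alltoall on the command line; it must
         be set before the scatters are built, since the variant is chosen
         at VecScatterCreate() time */
      ierr = PetscOptionsSetValue(NULL, "-vecscatter_alltoall", NULL);CHKERRQ(ierr);

      /* ... usual Mat/Vec/KSP setup and solve ... */

      ierr = PetscFinalize();
      return ierr;
    }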

  Thanks,

    Matt


    Thanks,

    Eric

    On 25/07/16 02:57 PM, Matthew Knepley wrote:

        On Mon, Jul 25, 2016 at 11:33 AM, Eric Chamberland
        <[email protected]> wrote:

            Hi,

            Has someone tried OpenMPI 2.0 with PETSc 3.7.2?

            I am having some errors with PETSc; maybe someone has them too?

            Here are the configure logs for PETSc:

            http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log

            http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log

            And for OpenMPI:

            http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log

            (in fact, I am testing the ompi-release branch, a sort of
            petsc-master equivalent, since I need commit 9ba6678156).

            For a set of parallel tests, I have 104 that work out of 124
            total tests.


        It appears that the fault happens when freeing the VecScatter we
        build for MatMult, which contains Request structures for the
        ISends and IRecvs. These look like internal OpenMPI errors to me
        since the Request should be opaque.
        I would try at least two things:

        1) Run under valgrind.

        2) Switch the VecScatter implementation. All the options are here,

        http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate

        but maybe use alltoall.
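
        To see where that choice takes effect, here is a minimal scatter
        sketch (sizes and indices are arbitrary, just for illustration).
        With -vecscatter_alltoall, VecScatterBegin/End communicate with
        all-to-all collectives instead of the persistent send/recv
        requests that VecScatterDestroy() later hands to
        MPI_Request_free() in the traces below:

            #include <petscvec.h>

            int main(int argc, char **argv)
            {
              Vec            x, y;
              IS             ix;
              VecScatter     ctx;
              PetscInt       n = 8;
              PetscErrorCode ierr;

              ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
              ierr = VecCreateMPI(PETSC_COMM_WORLD, n, PETSC_DETERMINE, &x);CHKERRQ(ierr);
              ierr = VecCreateSeq(PETSC_COMM_SELF, n, &y);CHKERRQ(ierr);
              /* every process gathers global entries 0..n-1 of x into y */
              ierr = ISCreateStride(PETSC_COMM_SELF, n, 0, 1, &ix);CHKERRQ(ierr);

              ierr = VecScatterCreate(x, ix, y, NULL, &ctx);CHKERRQ(ierr);
              ierr = VecScatterBegin(ctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
              ierr = VecScatterEnd(ctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

              /* the destroy path that shows up in the backtraces */
              ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);
              ierr = ISDestroy(&ix);CHKERRQ(ierr);
              ierr = VecDestroy(&x);CHKERRQ(ierr);
              ierr = VecDestroy(&y);CHKERRQ(ierr);
              ierr = PetscFinalize();
              return ierr;
            }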

          Thanks,

             Matt


            And the typical error:
            *** Error in `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev': free(): invalid pointer:
            ======= Backtrace: =========
            /lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
            /lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
            /lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
            /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60]
            /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
            /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
            /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
            /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]

            a similar one:
            *** Error in `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev': free(): invalid pointer: 0x00007f382a7c5bc0 ***
            ======= Backtrace: =========
            /lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
            /lib64/libc.so.6(+0x78026)[0x7f3829f22026]
            /lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
            /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60]
            /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
            /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
            /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
            /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x9f4755)[0x7f38322f3755]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f38323c8588]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x4e2)[0x7f383287f87a]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f383287ffd9]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f3832b20334]

            another one:

            *** Error in `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.MortierDiffusion.dev': free(): invalid pointer: 0x00007f67b6d37bc0 ***
            ======= Backtrace: =========
            /lib64/libc.so.6(+0x7277f)[0x7f67b648e77f]
            /lib64/libc.so.6(+0x78026)[0x7f67b6494026]
            /lib64/libc.so.6(+0x78d53)[0x7f67b6494d53]
            /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f67b5c71d60]
            /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1adae)[0x7f67aa4cddae]
            /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1b4ca)[0x7f67aa4ce4ca]
            /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f67b6a7a9dd]
            /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f67b6a7aad6]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adb09)[0x7f67be31eb09]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f67be319c45]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574f7)[0x7f67be2c84f7]
            /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f67be26e8da]

            I feel like I should wait until someone else from PETSc has
            tested it too...

            Thanks,

            Eric




        --
        What most experimenters take for granted before they begin their
        experiments is infinitely more interesting than any results to which
        their experiments lead.
        -- Norbert Wiener




--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
