Just to "close" this thread, the offending bug has been found and
corrected (was with MPI I/O implementation) (see
https://github.com/open-mpi/ompi/issues/1875).
So with forthcoming OpenMPI 2.0.1 everyhting is fine with PETSc for me.
have a nice day!
Eric
On 25/07/16 03:53 PM, Matthew Knepley wrote:
On Mon, Jul 25, 2016 at 12:44 PM, Eric Chamberland
<[email protected]
<mailto:[email protected]>> wrote:
OK,
here are the two points answered:
#1) Got the valgrind output... here is the fatal free operation:
Okay, this is not the MatMult scatter; this is for the local representations
of ghosted vectors. However, to me it looks like OpenMPI mistakenly frees
its built-in datatype for MPI_DOUBLE.
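To make that concrete, here is a minimal sketch (not PETSc source; the buffer
size, tag, and peer below are made up for illustration) of the pattern
involved: a persistent request built on the built-in MPI_DOUBLE datatype and
later released with MPI_Request_free, which is what VecScatterDestroy_PtoP is
doing in the trace below. Freeing the request should never call free() on
ompi_mpi_double.

/* Sketch only: persistent receive on the built-in MPI_DOUBLE datatype,
 * released with MPI_Request_free, as VecScatter does for its repeated
 * scatters. Buffer size, tag, and peer are arbitrary. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double     *buf  = malloc(16 * sizeof(double));
    int         peer = (rank + 1) % size;   /* arbitrary neighbour */
    MPI_Request req;

    /* Persistent receive request, set up once and reused. */
    MPI_Recv_init(buf, 16, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);

    /* ... MPI_Start()/MPI_Wait() would drive the communication here ... */

    /* Releasing the request must only free OpenMPI's internal request
     * object; in the trace below, free() lands on the data symbol
     * "ompi_mpi_double" instead. */
    MPI_Request_free(&req);

    free(buf);
    MPI_Finalize();
    return 0;
}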
==107156== Invalid free() / delete / delete[] / realloc()
==107156== at 0x4C2A37C: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==107156== by 0x1E63CD5F: opal_free (malloc.c:184)
==107156== by 0x27622627: mca_pml_ob1_recv_request_fini
(pml_ob1_recvreq.h:133)
==107156== by 0x27622C4F: mca_pml_ob1_recv_request_free
(pml_ob1_recvreq.c:90)
==107156== by 0x1D3EF9DC: ompi_request_free (request.h:362)
==107156== by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
==107156== by 0x14AE3B9C: VecScatterDestroy_PtoP (vpscat.c:219)
==107156== by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
==107156== by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
==107156== by 0x14A33809: VecDestroy (vector.c:432)
==107156== by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
(girefConfigurationPETSc.h:115)
==107156== by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
(VecteurPETSc.cc:2292)
==107156== by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:287)
==107156== by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:281)
==107156== by 0x1135A57B:
PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
(PPReactionsAppuiEL3D.cc:216)
==107156== by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
/home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
==107156== by 0x435702: main (Test.ProblemeGD.icc:381)
==107156== Address 0x1d6acbc0 is 0 bytes inside data symbol
"ompi_mpi_double"
--107156-- REDIR: 0x1dda2680 (libc.so.6:__GI_stpcpy) redirected to
0x4c2f330 (__GI_stpcpy)
==107156==
==107156== Process terminating with default action of signal 6
(SIGABRT): dumping core
==107156== at 0x1DD520C7: raise (in /lib64/libc-2.19.so)
==107156== by 0x1DD53534: abort (in /lib64/libc-2.19.so)
==107156== by 0x1DD4B145: __assert_fail_base (in /lib64/libc-2.19.so)
==107156== by 0x1DD4B1F1: __assert_fail (in /lib64/libc-2.19.so)
==107156== by 0x27626D12: mca_pml_ob1_send_request_fini
(pml_ob1_sendreq.h:221)
==107156== by 0x276274C9: mca_pml_ob1_send_request_free
(pml_ob1_sendreq.c:117)
==107156== by 0x1D3EF9DC: ompi_request_free (request.h:362)
==107156== by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
==107156== by 0x14AE3C3C: VecScatterDestroy_PtoP (vpscat.c:225)
==107156== by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
==107156== by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
==107156== by 0x14A33809: VecDestroy (vector.c:432)
==107156== by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
(girefConfigurationPETSc.h:115)
==107156== by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
(VecteurPETSc.cc:2292)
==107156== by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:287)
==107156== by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:281)
==107156== by 0x1135A57B:
PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
(PPReactionsAppuiEL3D.cc:216)
==107156== by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
/home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
==107156== by 0x435702: main (Test.ProblemeGD.icc:381)
#2) The run with -vecscatter_alltoall works...!
As an "end user", should I ever modify these VecScatterCreate
options? How do they change the performance of the code on large
problems?
Yep, those options are there because the different variants are better
on different architectures, and you can't know which one to pick until
runtime (and without experimentation).
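For reference, here is a minimal sketch of how this looks from the application
side (the sizes and index sets below are arbitrary, just for illustration): the
code only calls VecScatterCreate/VecScatterBegin/VecScatterEnd/VecScatterDestroy,
and the implementation behind it is chosen from the options database at run
time, e.g. with "mpirun -n 4 ./app -vecscatter_alltoall", so nothing in the
source has to change.

/* Sketch only: a generic scatter of a parallel vector to a sequential copy.
 * The scatter implementation (send/recv, alltoall, ...) is selected at run
 * time via the VecScatterCreate options, e.g. -vecscatter_alltoall. */
#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x, y;
  IS             ix, iy;
  VecScatter     ctx;
  PetscInt       N = 128;                 /* arbitrary global size */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &x);CHKERRQ(ierr);
  ierr = VecCreateSeq(PETSC_COMM_SELF, N, &y);CHKERRQ(ierr);

  /* Scatter the whole parallel vector into the local sequential vector. */
  ierr = ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &ix);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &iy);CHKERRQ(ierr);
  ierr = VecScatterCreate(x, ix, y, iy, &ctx);CHKERRQ(ierr);

  ierr = VecScatterBegin(ctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(ctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

  /* VecScatterDestroy is where the MPI_Request_free calls in the traces
   * above originate. */
  ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);
  ierr = ISDestroy(&ix);CHKERRQ(ierr);
  ierr = ISDestroy(&iy);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}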
Thanks,
Matt
Thanks,
Eric
On 25/07/16 02:57 PM, Matthew Knepley wrote:
On Mon, Jul 25, 2016 at 11:33 AM, Eric Chamberland
<[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>> wrote:
Hi,
has someone tried OpenMPI 2.0 with PETSc 3.7.2?
I am having some errors with PETSc; maybe someone else has them too?
Here are the configure logs for PETSc:
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log
And for OpenMPI:
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log
(In fact, I am testing the ompi-release branch, a sort of petsc-master
equivalent, since I need commit 9ba6678156.)
For a set of parallel tests, I have 104 that pass out of 124 total tests.
It appears that the fault happens when freeing the VecScatter we build
for MatMult, which contains Request structures for the ISends and IRecvs.
These look like internal OpenMPI errors to me, since the Requests should
be opaque.
I would try at least two things:
1) Run under valgrind.
2) Switch the VecScatter implementation. All the options are here,
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate
but maybe use alltoall.
Thanks,
Matt
And the typical error:
*** Error in
`/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev':
free(): invalid pointer:
======= Backtrace: =========
/lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
/lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
/lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
/opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
/opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
/opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]
a similar one:
*** Error in
`/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev':
free(): invalid pointer: 0x00007f382a7c5bc0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
/lib64/libc.so.6(+0x78026)[0x7f3829f22026]
/lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
/opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
/opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
/opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x9f4755)[0x7f38322f3755]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f38323c8588]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x4e2)[0x7f383287f87a]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f383287ffd9]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f3832b20334]
another one:
*** Error in
`/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.MortierDiffusion.dev':
free(): invalid pointer: 0x00007f67b6d37bc0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7277f)[0x7f67b648e77f]
/lib64/libc.so.6(+0x78026)[0x7f67b6494026]
/lib64/libc.so.6(+0x78d53)[0x7f67b6494d53]
/opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f67b5c71d60]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1adae)[0x7f67aa4cddae]
/opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1b4ca)[0x7f67aa4ce4ca]
/opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f67b6a7a9dd]
/opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f67b6a7aad6]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adb09)[0x7f67be31eb09]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f67be319c45]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574f7)[0x7f67be2c84f7]
/opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f67be26e8da]
I feel like I should wait until someone else from PETSc has tested
it too...
Thanks,
Eric
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener