Are you using complex numbers?
> On May 4, 2021, at 2:55 AM, Deij-van Rijswijk, Menno <[email protected]> wrote:
>
> Hi Barry,
>
> Thank you for this message about finalisation. I have checked that
> PetscFinalize is called after the problematic call to MatDestroy, and that is
> indeed the case. Furthermore, the module does not use "final".
>
> Menno
>
> dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development
> MARIN | T +31 317 49 35 06 | [email protected] | www.marin.nl
>
> From: Barry Smith <[email protected]>
> Sent: Sunday, May 2, 2021 6:30 PM
> To: Deij-van Rijswijk, Menno <[email protected]>
> Cc: [email protected]
> Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and
> SUPERLU_DIST
>
> ==1026905==    by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297)
> ==1026905==  Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client"
>
> Is it possible that this __fsi_MOD_fem_constructmatricespetscexit is
> being called AFTER PetscFinalize()? Perhaps it is defined with a "final" and
> the compiler/linker schedules it to be called after the program has
> "completed".
>
> This would explain the crash, the valgrind stack frames, and why it does not
> even crash with MPICH. This can happen with C++ destructors in code such as
>
>   int main(int argc, char **argv)
>   {
>     MyClass my;        // has a destructor that destroys PETSc objects
>     PetscInitialize(&argc, &argv, NULL, NULL);
>     /* .... */
>     PetscFinalize();
>     return 0;
>   }                    // the destructor gets called here and messes with
>                        // MPI data that no longer exists
>
> The fix is to force the destructor to be called before PetscFinalize(), and
> this can be done with
>
>   int main(int argc, char **argv)
>   {
>     PetscInitialize(&argc, &argv, NULL, NULL);
>     {
>       MyClass my;      // has a destructor that destroys PETSc objects
>       /* .... */
>     }                  // the destructor gets called here and everything is fine
>     PetscFinalize();
>     return 0;
>   }
>
> I don't know the details of how Fortran's final is implemented, but this is my
> current guess as to what is happening in your code, and you need to somehow
> arrange for the module final to be called before PetscFinalize().
>
> Barry
>
> On Apr 28, 2021, at 7:22 AM, Deij-van Rijswijk, Menno <[email protected]> wrote:
>
> The modules have automatic freeing only in the sense that when a variable
> local to a subroutine is ALLOCATE'd, it is automatically freed when the
> subroutine returns. I don't think that is problematic, as MatDestroy is used
> a lot in the code and normally executes just fine.
>
> As far as I can see, no specific new communicators are created; MatCreateAIJ
> and MatCreateSeqAIJ are called with PETSC_COMM_WORLD and PETSC_COMM_SELF,
> respectively, as first argument.
>
> We also run this with the Intel MPI library, which is based on MPICH. There
> this problem does not occur.
>
> The Valgrind run did not produce any new insights (at least not for me); I
> have pasted the relevant bits at the end of this message. I did a run on
> debug versions of PETSc (v3.14.5) and OpenMPI (v3.1.2) and I find the
> following stack trace with line numbers for each frame. Maybe that helps in
> further pinpointing the problem.
>
> 0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
> 1470        if ( ! OMPI_COMM_IS_INTRINSIC((*comm)->c_local_comm)) {
> Missing separate debuginfos, use: yum debuginfo-install
> libgcc-8.3.1-5.el8.0.2.x86_64 libgfortran-8.3.1-5.el8.0.2.x86_64
> libibumad-47mlnx1-1.47329.x86_64 libibverbs-47mlnx1-1.47329.x86_64
> libnl3-3.5.0-1.el8.x86_64 libquadmath-8.3.1-5.el8.0.2.x86_64
> librdmacm-47mlnx1-1.47329.x86_64 libstdc++-8.3.1-5.el8.0.2.x86_64
> libxml2-2.9.7-7.el8.x86_64 numactl-libs-2.0.12-9.el8.x86_64
> opensm-libs-5.5.1.MLNX20191120.0c8dde0-0.1.47329.x86_64
> openssl-libs-1.1.1c-15.el8.x86_64 python3-libs-3.6.8-23.el8.x86_64
> sssd-client-2.2.3-20.el8.x86_64 ucx-cma-1.7.0-1.47329.x86_64
> ucx-ib-1.7.0-1.47329.x86_64 xz-libs-5.2.4-3.el8.x86_64
> zlib-1.2.11-16.el8_2.x86_64
> (gdb) bt
> #0  0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
> #1  0x0000155540d4f1af in PMPI_Comm_free (comm=0x483f4e0) at pcomm_free.c:62
> #2  0x000015555346329a in superlu_gridexit (grid=0x483f4e0) at /home/mdeij/install-gnu/extLibs/Linux-x86_64-Intel/superlu_dist-6.3.0/SRC/superlu_grid.c:174
> #3  0x0000155553ca2ff1 in Petsc_Superlu_dist_keyval_Delete_Fn (comm=0x3921b10, keyval=16, attr_val=0x483f4d0, extra_state=0x0) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:97
> #4  0x0000155540d0baa1 in ompi_attr_delete_impl (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0, key=16, predefined=true) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1062
> #5  0x0000155540d0c039 in ompi_attr_delete_all (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1166
> #6  0x0000155540d11676 in ompi_comm_free (comm=0x7fffffffc5c0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1462
> #7  0x0000155540d4f1af in PMPI_Comm_free (comm=0x7fffffffc5c0) at pcomm_free.c:62
> #8  0x000015555393fb68 in PetscCommDestroy (comm=0x3943a60) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/tagm.c:217
> #9  0x0000155553941e07 in PetscHeaderDestroy_Private (h=0x3943a20) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/inherit.c:121
> #10 0x000015555408edfe in MatDestroy (A=0x3558c18) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/matrix.c:1306
> #11 0x00001555540cb5fa in matdestroy_ (A=0x3558c18, __ierr=0x7fffffffc73c) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/ftn-auto/matrixf.c:770
>
> Valgrind output:
>
> ==1026905== Invalid read of size 1
> ==1026905==    at 0x19184538: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297)
> ==1026905==  Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client"
> ==1026905==
> ==1026905== Invalid read of size 8
> ==1026905==    at 0x1912AC9A: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==  Address 0x2ce673c0 is 11,152 bytes inside an unallocated block of size 11,232 in arena "client"
> ==1026905==
> ==1026905== Invalid read of size 8
> ==1026905==    at 0x19126E5B: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==  Address 0x91 is not stack'd, malloc'd or (recently) free'd
> ==1026905==
> ==1026905==
> ==1026905== Process terminating with default action of signal 11 (SIGSEGV)
> ==1026905==  Access not within mapped region at address 0x91
> ==1026905==    at 0x19126E5B: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==  If you believe this happened as a result of a stack
> ==1026905==  overflow in your program's main thread (unlikely but
> ==1026905==  possible), you can try to increase the size of the
> ==1026905==  main thread stack using the --main-stacksize= flag.
> ==1026905==  The main thread stack size used in this run was 16777216.
>
> From: Barry Smith <[email protected]>
> Sent: Friday, April 23, 2021 7:09 PM
> To: Deij-van Rijswijk, Menno <[email protected]>
> Cc: [email protected]
> Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and
> SUPERLU_DIST
>
> Thanks for looking. Do these modules have any "automatic freeing" when
> variables go out of scope (like C++ classes do)?
>
> Do you make specific new MPI communicators to create the matrices?
>
> Have you tried MPICH or a different version of OpenMPI?
>
> Maybe run the program with valgrind. The stack frames you sent look
> "funny"; that is, I would not normally expect them to be in such an order.
>
> Barry
