The modules have automatic freeing only insofar as a variable that is local to
a subroutine and has been ALLOCATE'd is automatically deallocated when the
subroutine returns. I don't think that is problematic, as MatDestroy is used a
lot in the code and normally executes just fine.

As far as I can see, no specific new communicators are created; MatCreateAIJ
and MatCreateSeqAIJ are called with PETSC_COMM_WORLD and PETSC_COMM_SELF,
respectively, as the first argument.

We also run this with the Intel MPI library, which is based on MPICH. The
problem does not occur there.

The Valgrind run did not produce any new insights (at least not for me); I have
pasted the relevant bits at the end of this message. I also did a run with debug
versions of PETSc (v3.14.5) and OpenMPI (v3.1.2), which gives the following
stack trace with line numbers for each frame. Maybe that helps in further
pinpointing the problem.
0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
1470 if ( ! OMPI_COMM_IS_INTRINSIC((*comm)->c_local_comm)) {
Missing separate debuginfos, use: yum debuginfo-install
libgcc-8.3.1-5.el8.0.2.x86_64 libgfortran-8.3.1-5.el8.0.2.x86_64
libibumad-47mlnx1-1.47329.x86_64 libibverbs-47mlnx1-1.47329.x86_64
libnl3-3.5.0-1.el8.x86_64 libquadmath-8.3.1-5.el8.0.2.x86_64
librdmacm-47mlnx1-1.47329.x86_64 libstdc++-8.3.1-5.el8.0.2.x86_64
libxml2-2.9.7-7.el8.x86_64 numactl-libs-2.0.12-9.el8.x86_64
opensm-libs-5.5.1.MLNX20191120.0c8dde0-0.1.47329.x86_64
openssl-libs-1.1.1c-15.el8.x86_64 python3-libs-3.6.8-23.el8.x86_64
sssd-client-2.2.3-20.el8.x86_64 ucx-cma-1.7.0-1.47329.x86_64
ucx-ib-1.7.0-1.47329.x86_64 xz-libs-5.2.4-3.el8.x86_64
zlib-1.2.11-16.el8_2.x86_64
(gdb) bt
#0 0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
#1 0x0000155540d4f1af in PMPI_Comm_free (comm=0x483f4e0) at pcomm_free.c:62
#2 0x000015555346329a in superlu_gridexit (grid=0x483f4e0) at /home/mdeij/install-gnu/extLibs/Linux-x86_64-Intel/superlu_dist-6.3.0/SRC/superlu_grid.c:174
#3 0x0000155553ca2ff1 in Petsc_Superlu_dist_keyval_Delete_Fn (comm=0x3921b10, keyval=16, attr_val=0x483f4d0, extra_state=0x0) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:97
#4 0x0000155540d0baa1 in ompi_attr_delete_impl (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0, key=16, predefined=true) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1062
#5 0x0000155540d0c039 in ompi_attr_delete_all (type=COMM_ATTR, object=0x3921b10, attr_hash=0x377efe0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1166
#6 0x0000155540d11676 in ompi_comm_free (comm=0x7fffffffc5c0) at /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1462
#7 0x0000155540d4f1af in PMPI_Comm_free (comm=0x7fffffffc5c0) at pcomm_free.c:62
#8 0x000015555393fb68 in PetscCommDestroy (comm=0x3943a60) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/tagm.c:217
#9 0x0000155553941e07 in PetscHeaderDestroy_Private (h=0x3943a20) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/inherit.c:121
#10 0x000015555408edfe in MatDestroy (A=0x3558c18) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/matrix.c:1306
#11 0x00001555540cb5fa in matdestroy_ (A=0x3558c18, __ierr=0x7fffffffc73c) at /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/ftn-auto/matrixf.c:770
Valgrind output:
==1026905== Invalid read of size 1
==1026905== at 0x19184538: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit (fsi.F90:2297)
==1026905== Address 0x2ce67398 is 11,112 bytes inside an unallocated block of size 11,232 in arena "client"
==1026905==
==1026905== Invalid read of size 8
==1026905== at 0x1912AC9A: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x5336E58: matdestroy_ (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== Address 0x2ce673c0 is 11,152 bytes inside an unallocated block of size 11,232 in arena "client"
==1026905==
==1026905== Invalid read of size 8
==1026905== at 0x19126E5B: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== Address 0x91 is not stack'd, malloc'd or (recently) free'd
==1026905==
==1026905==
==1026905== Process terminating with default action of signal 11 (SIGSEGV)
==1026905== Access not within mapped region at address 0x91
==1026905== at 0x19126E5B: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x6943B61: superlu_gridexit (in /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
==1026905== by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x1912447B: ompi_attr_delete_impl (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19126FFE: ompi_attr_delete_all (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x1912ACC6: ompi_comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x19184555: PMPI_Comm_free (in /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
==1026905== by 0x4FEE49D: PetscCommDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x4FF0EE1: PetscHeaderDestroy_Private (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== by 0x5317899: MatDestroy (in /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
==1026905== If you believe this happened as a result of a stack
==1026905== overflow in your program's main thread (unlikely but
==1026905== possible), you can try to increase the size of the
==1026905== main thread stack using the --main-stacksize= flag.
==1026905== The main thread stack size used in this run was 16777216.
dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development
MARIN | T +31 317 49 35 06 | [email protected] | www.marin.nl
From: Barry Smith <[email protected]>
Sent: Friday, April 23, 2021 7:09 PM
To: Deij-van Rijswijk, Menno <[email protected]>
Cc: [email protected]
Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and
SUPERLU_DIST
Thanks for looking. Do these modules have any "automatic freeing" when
variables go out of scope (like C++ classes do)?
Do you make specific new MPI communicators to use to create the matrices?
Have you tried MPICH or a different version of OpenMPI?
Maybe run the program with valgrind. The stack frames you sent look
"funny"; that is, I would not normally expect them to be in such an order.
Barry