Are you using complex numbers?


> On May 4, 2021, at 2:55 AM, Deij-van Rijswijk, Menno <[email protected]> wrote:
> 
> 
> Hi Barry,
>  
> Thank you for this message about finalisation. I have checked that 
> PetscFinalize is called after the problematic call to MatDestroy, and that is 
> indeed the case. Furthermore, the module does not use "final".
>  
> Menno
>  
> 
> dr. ir. Menno A. Deij-van Rijswijk | Researcher | Research & Development
> MARIN | T +31 317 49 35 06 | [email protected] <mailto:[email protected]> | 
> www.marin.nl <http://www.marin.nl/>
> 
> 
> 
> 
> From: Barry Smith <[email protected] <mailto:[email protected]>> 
> Sent: Sunday, May 2, 2021 6:30 PM
> To: Deij-van Rijswijk, Menno <[email protected] <mailto:[email protected]>>
> Cc: [email protected] <mailto:[email protected]>
> Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and 
> SUPERLU_DIST
>  
>  
> ==1026905==    by 0x5317899: MatDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5336E58: matdestroy_ (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit 
> (fsi.F90:2297)
> ==1026905==  Address 0x2ce67398 is 11,112 bytes inside an unallocated block 
> of size 11,232 in arena "client"
>  
>    Is it possible that this __fsi_MOD_fem_constructmatricespetscexit is 
> being called AFTER PetscFinalize()? Perhaps it is defined with a "final" and 
> the compiler/linker schedules it to be called after the program has 
> "completed".
>  
>    This would explain the crash, the valgrind stack frames, and why it does 
> not even crash with MPICH. This can happen with C++ destructors in code such 
> as
>  
>    int main(int argc, char **argv)
>    {
>    MyC++Class my;  <-- has a destructor that destroys PETSc objects
>    PetscInitialize()
>    ....
>    PetscFinalize()
>    return 0;
>    }   <--  the destructor gets called here, after PetscFinalize(), and 
> messes with MPI data that no longer exists.
>  
> The fix is to force the destructor to be called before PetscFinalize(), and 
> this can be done with 
>  
>    int main(int argc, char **argv)
>    {
>    PetscInitialize()
>    {
>         MyC++Class my;  <-- has a destructor that destroys PETSc objects
>         ....
>    }    <--  the destructor gets called here and everything is fine
>    PetscFinalize()
>    return 0;
>    }   
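
The scoping fix quoted above can be tried without PETSc or MPI. The following is only a minimal sketch under stated assumptions: `FakeRuntime` and `Handle` are hypothetical stand-ins (they are not PETSc types), and "finalized" is modelled as a plain flag. Each function returns whether the destructor ran while the stand-in runtime was still alive.

```cpp
#include <cassert>

// Hypothetical stand-in for a library with explicit initialize/finalize calls.
// This is NOT PETSc; it only tracks whether the "runtime" is currently alive.
struct FakeRuntime {
    static bool& alive() { static bool flag = false; return flag; }
    static void initialize() { alive() = true; }
    static void finalize() { alive() = false; }
};

// Hypothetical RAII wrapper. Its destructor records whether the runtime was
// still alive at the moment the destructor ran.
class Handle {
public:
    explicit Handle(bool* destroyed_while_alive) : out_(destroyed_while_alive) {}
    ~Handle() { *out_ = FakeRuntime::alive(); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
private:
    bool* out_;
};

// Problematic order: the object is declared outside the initialize/finalize
// pair, so it outlives finalize().
bool destructor_after_finalize() {
    bool ok = true;
    {
        Handle h(&ok);
        FakeRuntime::initialize();
        FakeRuntime::finalize();
    }  // ~Handle runs here, after finalize()
    return ok;
}

// Fixed order: an inner scope forces the destructor to run before finalize().
bool destructor_before_finalize() {
    bool ok = false;
    FakeRuntime::initialize();
    {
        Handle h(&ok);
    }  // ~Handle runs here, while the runtime is still alive
    FakeRuntime::finalize();
    return ok;
}
```

Calling `destructor_after_finalize()` should return `false` and `destructor_before_finalize()` should return `true`. In a real program, the inner braces play the same role around any object whose destructor calls into PETSc.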
>  
> I don't know the details of how Fortran's final is implemented, but this is my 
> current guess as to what is happening in your code. If so, you need to somehow 
> arrange for the module final to be called before PetscFinalize().
>  
>   Barry
>  
>  
> On Apr 28, 2021, at 7:22 AM, Deij-van Rijswijk, Menno <[email protected] 
> <mailto:[email protected]>> wrote:
>  
>  
> The modules have automatic freeing only inasmuch as a variable that is 
> local to a subroutine and is ALLOCATE'd is automatically freed when the 
> subroutine returns. I don't think that is problematic, as MatDestroy is used 
> a lot in the code and normally executes just fine.
>  
> As far as I can see, no specific new communicators are created; MatCreateAIJ 
> and MatCreateSeqAIJ are called with PETSC_COMM_WORLD and PETSC_COMM_SELF, 
> respectively, as first argument.
>  
> We also run this with the Intel MPI library, which is based on MPICH. There 
> this problem does not occur.
>  
> The Valgrind run did not produce any new insights (at least not for me); I 
> have pasted the relevant bits at the end of this message. I did a run on 
> debug versions of PETSc (v3.14.5) and OpenMPI (v3.1.2), and I find the 
> following stack trace with line numbers for each frame. Maybe that helps in 
> further pinpointing the problem.
>  
> 0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at 
> /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
> 1470            if ( ! OMPI_COMM_IS_INTRINSIC((*comm)->c_local_comm)) {
> Missing separate debuginfos, use: yum debuginfo-install 
> libgcc-8.3.1-5.el8.0.2.x86_64 libgfortran-8.3.1-5.el8.0.2.x86_64 
> libibumad-47mlnx1-1.47329.x86_64 libibverbs-47mlnx1-1.47329.x86_64 
> libnl3-3.5.0-1.el8.x86_64 libquadmath-8.3.1-5.el8.0.2.x86_64 
> librdmacm-47mlnx1-1.47329.x86_64 libstdc++-8.3.1-5.el8.0.2.x86_64 
> libxml2-2.9.7-7.el8.x86_64 numactl-libs-2.0.12-9.el8.x86_64 
> opensm-libs-5.5.1.MLNX20191120.0c8dde0-0.1.47329.x86_64 
> openssl-libs-1.1.1c-15.el8.x86_64 python3-libs-3.6.8-23.el8.x86_64 
> sssd-client-2.2.3-20.el8.x86_64 ucx-cma-1.7.0-1.47329.x86_64 
> ucx-ib-1.7.0-1.47329.x86_64 xz-libs-5.2.4-3.el8.x86_64 
> zlib-1.2.11-16.el8_2.x86_64
> (gdb) bt
> #0  0x0000155540d11719 in ompi_comm_free (comm=0x483f4e0) at 
> /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1470
> #1  0x0000155540d4f1af in PMPI_Comm_free (comm=0x483f4e0) at pcomm_free.c:62
> #2  0x000015555346329a in superlu_gridexit (grid=0x483f4e0) at 
> /home/mdeij/install-gnu/extLibs/Linux-x86_64-Intel/superlu_dist-6.3.0/SRC/superlu_grid.c:174
> #3  0x0000155553ca2ff1 in Petsc_Superlu_dist_keyval_Delete_Fn 
> (comm=0x3921b10, keyval=16, attr_val=0x483f4d0, extra_state=0x0) at 
> /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:97
> #4  0x0000155540d0baa1 in ompi_attr_delete_impl (type=COMM_ATTR, 
> object=0x3921b10, attr_hash=0x377efe0, key=16, predefined=true) at 
> /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1062
> #5  0x0000155540d0c039 in ompi_attr_delete_all (type=COMM_ATTR, 
> object=0x3921b10, attr_hash=0x377efe0) at 
> /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/attribute/attribute.c:1166
> #6  0x0000155540d11676 in ompi_comm_free (comm=0x7fffffffc5c0) at 
> /home/mdeij/build-libs-gnu/superbuild/openmpi/src/ompi/communicator/comm.c:1462
> #7  0x0000155540d4f1af in PMPI_Comm_free (comm=0x7fffffffc5c0) at 
> pcomm_free.c:62
> #8  0x000015555393fb68 in PetscCommDestroy (comm=0x3943a60) at 
> /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/tagm.c:217
> #9  0x0000155553941e07 in PetscHeaderDestroy_Private (h=0x3943a20) at 
> /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/sys/objects/inherit.c:121
> #10 0x000015555408edfe in MatDestroy (A=0x3558c18) at 
> /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/matrix.c:1306
> #11 0x00001555540cb5fa in matdestroy_ (A=0x3558c18, __ierr=0x7fffffffc73c) at 
> /home/mdeij/build-libs-gnu/superbuild/petsc/src/src/mat/interface/ftn-auto/matrixf.c:770
>  
> Valgrind output:
>  
> ==1026905== Invalid read of size 1
> ==1026905==    at 0x19184538: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in 
> /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5336E58: matdestroy_ (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1528710: __fsi_MOD_fem_constructmatricespetscexit 
> (fsi.F90:2297)
> ==1026905==  Address 0x2ce67398 is 11,112 bytes inside an unallocated block 
> of size 11,232 in arena "client"
> ==1026905==
> ==1026905== Invalid read of size 8
> ==1026905==    at 0x1912AC9A: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in 
> /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5336E58: matdestroy_ (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==  Address 0x2ce673c0 is 11,152 bytes inside an unallocated block 
> of size 11,232 in arena "client"
> ==1026905==
> ==1026905== Invalid read of size 8
> ==1026905==    at 0x19126E5B: ompi_attr_delete_all (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in 
> /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==  Address 0x91 is not stack'd, malloc'd or (recently) free'd
> ==1026905==
> ==1026905==
> ==1026905== Process terminating with default action of signal 11 (SIGSEGV)
> ==1026905==  Access not within mapped region at address 0x91
> ==1026905==    at 0x19126E5B: ompi_attr_delete_all (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x6943B61: superlu_gridexit (in 
> /home/mdeij/install-gnu/extLibs/lib/libsuperlu_dist.so.6.3.0)
> ==1026905==    by 0x56F398E: Petsc_Superlu_dist_keyval_Delete_Fn (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x1912447B: ompi_attr_delete_impl (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19126FFE: ompi_attr_delete_all (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x1912ACC6: ompi_comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x19184555: PMPI_Comm_free (in 
> /home/mdeij/install-gnu/extLibs/lib/libmpi.so.40.10.2)
> ==1026905==    by 0x4FEE49D: PetscCommDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x4FF0EE1: PetscHeaderDestroy_Private (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==    by 0x5317899: MatDestroy (in 
> /home/mdeij/install-gnu/extLibs/lib/libpetsc.so.3.14.5)
> ==1026905==  If you believe this happened as a result of a stack
> ==1026905==  overflow in your program's main thread (unlikely but
> ==1026905==  possible), you can try to increase the size of the
> ==1026905==  main thread stack using the --main-stacksize= flag.
> ==1026905==  The main thread stack size used in this run was 16777216.
>  
> 
>  
>  
>  
> From: Barry Smith <[email protected] <mailto:[email protected]>> 
> Sent: Friday, April 23, 2021 7:09 PM
> To: Deij-van Rijswijk, Menno <[email protected] <mailto:[email protected]>>
> Cc: [email protected] <mailto:[email protected]>
> Subject: Re: [petsc-users] MatDestroy problem with multiple matrices and 
> SUPERLU_DIST
>  
>  
>    Thanks for looking. Do these modules have any "automatic freeing" when 
> variables go out of scope (like C++ classes do)? 
>  
>     Do you make specific new MPI communicators to use when creating the 
> matrices? 
>  
>     Have you tried MPICH or a different version of OpenMPI? 
>  
>     Maybe run the program with valgrind. The stack frames you sent look 
> "funny"; that is, I would not normally expect them to be in such an order.
>  
>    Barry
>  
>  
> 
> 
