Mark,

   Can you run in valgrind? 

   Exactly what BLAS are you using? 

   Barry


> On Aug 24, 2020, at 7:54 AM, Mark Lohry <[email protected]> wrote:
> 
> Reran in debug mode and got a stack trace for this bus error; it looks like 
> it's happening in BLASgemv (trace pasted below). I did take care of the 
> ISColoring leak mentioned previously, although that was a very small amount 
> of data and I don't think it is relevant here.
> 
> At this point it's happily run 222 timesteps prior to this, so I'm a little 
> mystified. Any ideas?
> 
> Thanks,
> Mark
> 
> 
> 222 TS dt 0.03 time 6.66
>     0 SNES Function norm 4.124287265556e+02 
>       0 KSP Residual norm 4.124287265556e+02 
>       1 KSP Residual norm 4.123248052318e+02 
>       2 KSP Residual norm 4.123173350456e+02 
>       3 KSP Residual norm 4.118769044110e+02 
>       4 KSP Residual norm 4.094856150740e+02 
>       5 KSP Residual norm 4.006000788078e+02 
>       6 KSP Residual norm 3.787922969183e+02 
> [clip]
>     Linear solve converged due to CONVERGED_RTOL iterations 9
>         Line search: Using full step: fnorm 4.015236590684e+01 gnorm 
> 3.173434863784e+00
>     2 SNES Function norm 3.173434863784e+00 
>   Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>     0 SNES Function norm 5.842010710080e+02 
>       0 KSP Residual norm 5.842010710080e+02 
>       1 KSP Residual norm 5.840526408234e+02 
>       2 KSP Residual norm 5.840431857354e+02 
>       3 KSP Residual norm 5.834351392302e+02 
>       4 KSP Residual norm 5.800901047861e+02 
>       5 KSP Residual norm 5.675562288567e+02 
>       6 KSP Residual norm 5.366287895681e+02 
>       7 KSP Residual norm 4.725811521866e+02 
> [911]PETSC ERROR: 
> ------------------------------------------------------------------------
> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal 
> memory access
> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [911]PETSC ERROR: or see 
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [911]PETSC ERROR: or try http://valgrind.org on 
> GNU/linux and Apple Mac OS X to find memory corruption errors
> [911]PETSC ERROR: likely location of problem given in stack below
> [911]PETSC ERROR: ---------------------  Stack Frames 
> ------------------------------------
> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [911]PETSC ERROR:       INSTEAD the line number of the start of the function
> [911]PETSC ERROR:       is given.
> [911]PETSC ERROR: [911] BLASgemv line 1393 
> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c
> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 
> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c
> [911]PETSC ERROR: [911] MatSolve line 3354 
> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> [911]PETSC ERROR: [911] PCApply_ILU line 201 
> /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c
> [911]PETSC ERROR: [911] PCApply line 426 
> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c
> [911]PETSC ERROR: [911] KSP_PCApply line 279 
> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h
> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 
> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c
> [911]PETSC ERROR: [911] KSPSolve_Private line 590 
> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
> [911]PETSC ERROR: [911] KSPSolve line 848 
> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
> [911]PETSC ERROR: [911] PCApply_ASM line 441 
> /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c
> [911]PETSC ERROR: [911] PCApply line 426 
> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c
> [911]PETSC ERROR: [911] KSP_PCApply line 279 
> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h
> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 
> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 
> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
> [911]PETSC ERROR: [911] KSPSolve_Private line 590 
> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
> [911]PETSC ERROR: [911] KSPSolve line 848 
> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 
> /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c
> [911]PETSC ERROR: [911] SNESSolve line 4403 
> /home/mlohry/build/external/petsc/src/snes/interface/snes.c
> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 
> /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c
> [911]PETSC ERROR: [911] TSStep line 3682 
> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> [911]PETSC ERROR: [911] TSSolve line 4005 
> /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> [911]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [911]PETSC ERROR: Signal received
> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
> for trouble shooting.
> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 
> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry 
> Sun Aug 23 19:54:21 2020
> [911]PETSC ERROR: Configure options 
> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt 
> --with-cc=/usr/local/openmpi/3.1.3/gcc/x8
> [911]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD
> 
> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry <[email protected]> wrote:
>    Perhaps you are calling ISColoringGetIS() and not calling 
> ISColoringRestoreIS()? 
> 
> I have matching ISColoringGet/Restore here, and it's only used prior to the 
> first iteration so at least it doesn't seem to be growing. At the bottom I 
> pasted the malloc_view and malloc_debug output from running 1 time step.
> 
> I'm sort of thinking this might be a red herring -- is it possible the rank 0 
> process is chewing up dramatically more memory than others, like with logging 
> or something? Like I mentioned earlier the total memory usage is well under 
> the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at 
> every time step and try to get a big job going again.
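A minimal sketch of the per-step memory logging idea mentioned above, assuming a POSIX system. It uses `getrusage` as a stand-in for `PetscMemoryGetMaximumUsage`, so it is not the PETSc call itself; `peak_rss_kb` and `log_peak_memory` are hypothetical helper names, and the rank and step would come from the application.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Hypothetical helper: peak resident set size of the calling process.
// On Linux ru_maxrss is reported in kilobytes (macOS reports bytes).
// Returns -1 if the query fails.
inline long peak_rss_kb() {
  struct rusage usage{};
  if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
  return usage.ru_maxrss;
}

// Hypothetical helper: print one line per rank per time step so that a rank
// whose memory grows faster than the others stands out in the job output.
inline void log_peak_memory(int rank, int step) {
  std::fprintf(stderr, "[%d] step %d peak RSS %ld kB\n", rank, step,
               peak_rss_kb());
}
```

Calling `log_peak_memory(rank, step)` once per time step on every rank gives a per-process history, which the job-wide total reported by the scheduler cannot show.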
> 
> 
> 
>    Are you using Fortran? 
> 
> C++ 
> 
> 
> 
> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
> [ 0]80 bytes PetscSplitReductionCreate() line 57 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c
> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]80 bytes PetscLayoutCreate() line 55 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
> [ 0]16 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]32 bytes PetscStrallocpy() line 187 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
> [ 0]16 bytes ISCreate_General() line 647 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
> [ 0]896 bytes ISCreate() line 37 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
> [ 0]64 bytes ISColoringGetIS() line 266 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c
> [ 0]32 bytes PetscCommDuplicate() line 129 in 
> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c
> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 
> 719073280
> [0] Memory usage sorted by function
> [0] 6 192 DMCoarsenHookAdd()
> [0] 2 9984 DMCreate()
> [0] 2 128 DMCreate_Shell()
> [0] 2 64 DMDSEnlarge_Static()
> [0] 1 672 DMKSPCreate()
> [0] 3 96 DMRefineHookAdd()
> [0] 3 2064 DMSNESCreate()
> [0] 4 128 DMSubDomainHookAdd()
> [0] 1 768 DMTSCreate()
> [0] 2 96 ISColoringCreate()
> [0] 8 12608 ISColoringGetIS()
> [0] 1 307200 ISConcatenate()
> [0] 29 25984 ISCreate()
> [0] 25 400 ISCreate_General()
> [0] 4 64 ISCreate_Stride()
> [0] 20 338016 ISGeneralSetIndices_General()
> [0] 3 921600 ISGetIndices_Stride()
> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic()
> [0] 1 6144 ISInvertPermutation_General()
> [0] 3 308576 ISLocalToGlobalMappingCreate()
> [0] 2 32 KSPConvergedDefaultCreate()
> [0] 2 2816 KSPCreate()
> [0] 1 224 KSPCreate_FGMRES()
> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization()
> [0] 2 16032 KSPSetUp_FGMRES()
> [0] 4 16084160 KSPSetUp_GMRES()
> [0] 2 36864 MatColoringApply_SL()
> [0] 1 656 MatColoringCreate()
> [0] 6 17088 MatCreate()
> [0] 1 16 MatCreateMFFD_WP()
> [0] 1 16 MatCreateSubMatrices_SeqBAIJ()
> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ()
> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private()
> [0] 2 1472 MatCreate_MFFD()
> [0] 1 416 MatCreate_SeqAIJ()
> [0] 3 864 MatCreate_SeqBAIJ()
> [0] 2 416 MatCreate_Shell()
> [0] 1 784 MatFDColoringCreate()
> [0] 2 12288 MatFDColoringDegreeSequence_Minpack()
> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ()
> [0] 3 42512 MatGetColumnIJ_SeqAIJ()
> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color()
> [0] 1 6144 MatGetOrdering_Natural()
> [0] 2 36384 MatGetRowIJ_SeqAIJ()
> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ()
> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ()
> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N()
> [0] 1 6144 MatMarkDiagonal_SeqAIJ()
> [0] 1 6144 MatMarkDiagonal_SeqBAIJ()
> [0] 8 256 MatRegisterRootName()
> [0] 1 6160 MatSeqAIJCheckInode()
> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ()
> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ()
> [0] 13 576 MatSolverTypeRegister()
> [0] 1 16 PCASMCreateSubdomains()
> [0] 2 1664 PCCreate()
> [0] 1 160 PCCreate_ASM()
> [0] 1 192 PCCreate_ILU()
> [0] 5 307264 PCSetUp_ASM()
> [0] 2 416 PetscBTCreate()
> [0] 2 3216 PetscClassPerfLogCreate()
> [0] 2 1616 PetscClassRegLogCreate()
> [0] 2 32 PetscCommBuildTwoSided_Allreduce()
> [0] 2 64 PetscCommDuplicate()
> [0] 2 1888 PetscDSCreate()
> [0] 2 26416 PetscEventPerfLogCreate()
> [0] 2 158400 PetscEventPerfLogEnsureSize()
> [0] 2 1616 PetscEventRegLogCreate()
> [0] 2 9600 PetscEventRegLogRegister()
> [0] 8 102400 PetscFreeSpaceGet()
> [0] 474 15168 PetscFunctionListAdd_Private()
> [0] 2 528 PetscIntStackCreate()
> [0] 142 11360 PetscLayoutCreate()
> [0] 56 896 PetscLayoutSetUp()
> [0] 59 9440 PetscObjectComposedDataIncreaseReal()
> [0] 2 576 PetscObjectListAdd()
> [0] 33 768 PetscOptionsGetEList()
> [0] 1 16 PetscOptionsHelpPrintedCreate()
> [0] 1 32 PetscPushSignalHandler()
> [0] 7 6944 PetscSFCreate()
> [0] 3 432 PetscSFCreate_Basic()
> [0] 2 1472 PetscSFLinkCreate()
> [0] 11 1229040 PetscSFSetUpRanks()
> [0] 7 614512 PetscSFSetUp_Basic()
> [0] 4 20096 PetscSegBufferCreate()
> [0] 2 1488 PetscSplitReductionCreate()
> [0] 2 3008 PetscStageLogCreate()
> [0] 1148 23872 PetscStrallocpy()
> [0] 6 13056 PetscStrreplace()
> [0] 9 3456 PetscTableCreate()
> [0] 1 16 PetscViewerASCIIOpen()
> [0] 6 96 PetscViewerAndFormatCreate()
> [0] 1 752 PetscViewerCreate()
> [0] 1 96 PetscViewerCreate_ASCII()
> [0] 2 1424 SNESCreate()
> [0] 1 16 SNESCreate_NEWTONLS()
> [0] 1 1008 SNESLineSearchCreate()
> [0] 1 16 SNESLineSearchCreate_BT()
> [0] 16 1824 SNESMSRegister()
> [0] 46 9056 TSARKIMEXRegister()
> [0] 1 1264 TSAdaptCreate()
> [0] 8 384 TSBasicSymplecticRegister()
> [0] 1 2160 TSCreate()
> [0] 1 224 TSCreate_Theta()
> [0] 48 5968 TSGLEERegister()
> [0] 41 7728 TSRKRegister()
> [0] 89 14736 TSRosWRegister()
> [0] 71 110192 VecCreate()
> [0] 1 307200 VecCreateGhostWithArray()
> [0] 123 36874080 VecCreate_MPI_Private()
> [0] 7 4300800 VecCreate_Seq()
> [0] 8 256 VecCreate_Seq_Private()
> [0] 6 400 VecDuplicateVecs_Default()
> [0] 3 2352 VecScatterCreate()
> [0] 7 1843296 VecScatterSetUp_SF()
> [0] 126 2016 VecStashCreate_Private()
> [0] 1 3072 mapBlockColoringToJacobian()
> 
> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith <[email protected]> wrote:
> 
>    Yes, there are some PETSc objects or arrays that you are not freeing, so 
> they are printed at the end of the run. For small runs this is harmless, but 
> if new objects or memory are allocated at each iteration and not suitably 
> freed, it will eventually add up.
> 
>     Run with -malloc_view (on a small problem with, say, 2 iterations); it 
> will print everything allocated and might be helpful.
> 
>    Perhaps you are calling ISColoringGetIS() and not calling 
> ISColoringRestoreIS()? 
> 
>    It is also possible it is a leak in PETSc, but that is unlikely since we 
> test for them.
> 
>    Are you using Fortran? 
> 
>   Barry
> 
> 
>> On Aug 12, 2020, at 1:29 PM, Mark Lohry <[email protected]> wrote:
>> 
>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative 
>> case with valgrind and didn't see anything alarming (apart from a small leak 
>> in an older boost version I was using: 
>> https://github.com/boostorg/serialization/issues/104 although I don't 
>> think this was causing the issue).
>> 
>> -malloc_debug dumps quite a lot; this is supposed to be empty, right? Output 
>> pasted below. It looks like the same sequence of calls is repeated 8 times, 
>> which is how many nonlinear solves occurred in this particular run. Thoughts?
>> 
>> 
>> 
>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c
>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c
>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]80 bytes PetscLayoutCreate() line 55 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>> [ 0]16 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>> [ 0]16 bytes ISCreate_General() line 647 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>> [ 0]896 bytes ISCreate() line 37 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>> [ 0]64 bytes ISColoringGetIS() line 266 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c
>> [ 0]32 bytes PetscCommDuplicate() line 129 in 
>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c
>> 
>> 
>> 
>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith <[email protected]> wrote:
>> 
>>    Mark.
>> 
>>     When valgrind is not feasible (like on many centrally controlled batch 
>> systems) you can run PETSc with an extra flag to do some memory error checks
>>  -malloc_debug
>> 
>>  this 
>> 
>> 1) fills all malloced memory with NaN so if the code is using uninitialized 
>> memory it may be detected and 
>> 2) checks the beginning and end of each allocated memory region for 
>> out-of-bounds writes at each malloc and free.
>> 
>> it will slow the code down a little bit but generally not a huge amount.
>> 
>> It is nowhere near as good as valgrind or other memory corruption tools, but 
>> it has the advantage that you can run it anywhere on any size job.
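
The two checks described above (NaN fill and guard regions at both ends of each allocation) can be illustrated with a small self-contained sketch. This is not PETSc's implementation; the namespace, the sentinel value, and every function name below are hypothetical, and unlike -malloc_debug the check here runs only when called explicitly or on free.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>

// Hypothetical names throughout; an illustration of the technique only.
namespace guard_sketch {

constexpr std::uint64_t kGuard = 0xDEADBEEFCAFEF00DULL;  // arbitrary sentinel

struct Header {
  std::size_t   count;  // number of doubles requested by the caller
  std::uint64_t guard;  // sentinel stored just before the user data
};
static_assert(sizeof(Header) % alignof(double) == 0,
              "user data must stay aligned for double");

// Allocate `count` doubles, filled with NaN, with a sentinel on each side.
inline double* alloc_doubles(std::size_t count) {
  const std::size_t bytes =
      sizeof(Header) + count * sizeof(double) + sizeof(kGuard);
  auto* raw = static_cast<unsigned char*>(std::malloc(bytes));
  if (raw == nullptr) return nullptr;
  const Header h{count, kGuard};
  std::memcpy(raw, &h, sizeof h);
  auto* user = reinterpret_cast<double*>(raw + sizeof(Header));
  for (std::size_t i = 0; i < count; ++i) {
    // Reads of never-written entries propagate NaN into the results.
    user[i] = std::numeric_limits<double>::quiet_NaN();
  }
  std::memcpy(raw + sizeof(Header) + count * sizeof(double), &kGuard,
              sizeof kGuard);
  return user;
}

// True if neither sentinel has been overwritten.
inline bool intact(const double* user) {
  const auto* raw =
      reinterpret_cast<const unsigned char*>(user) - sizeof(Header);
  Header h;
  std::memcpy(&h, raw, sizeof h);
  if (h.guard != kGuard) return false;  // header damaged: count is untrusted
  std::uint64_t tail = 0;
  std::memcpy(&tail, raw + sizeof(Header) + h.count * sizeof(double),
              sizeof tail);
  return tail == kGuard;
}

// Check the sentinels, release the block, and report whether it was intact.
inline bool free_doubles(double* user) {
  const bool ok = intact(user);
  std::free(reinterpret_cast<unsigned char*>(user) - sizeof(Header));
  return ok;
}

}  // namespace guard_sketch
```

A write one element past the end of a block lands on the trailing sentinel, so `intact` and `free_doubles` report it. A wild write far from any sentinel goes unnoticed, which is one reason this kind of check is weaker than valgrind.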
>> 
>> 
>>   Barry
>> 
>> 
>> 
>> 
>> 
>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley <[email protected]> wrote:
>>> 
>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <[email protected]> wrote:
>>> I'm getting seemingly random failures of late:
>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access
>>> 
>>> The first thing I would do is run valgrind on as wide an array of tests as 
>>> you can. This will find problems
>>> on things that run completely fine.
>>> 
>>>   Thanks,
>>> 
>>>      Matt
>>>  
>>> Symptoms:
>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores
>>> 2) It doesn't happen right away -- this was running happily for several 
>>> hours over several hundred time steps with no indication of bad health in 
>>> the numerics
>>> 3) At least the total memory consumption seems to be within bounds, though 
>>> I'm not sure about individual processes. e.g. slurm here reported Memory 
>>> Efficiency: 75.23% of 1.76 TB (180.00 GB/node)
>>> 4) running the same setup twice it fails at different points
>>> 
>>> Any suggestions on what to look for? This is a bit painful to work on as I 
>>> can only reproduce it on large runs and then it's seemingly random.
>>> 
>>> 
>>> Thanks,
>>> Mark
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
>> 
> 
