> Do you ever use regular malloc()? PETSc malloc aligns automatically, but
> the system one does not.

Indirectly via new, yes.

On Mon, Aug 24, 2020 at 11:10 AM Matthew Knepley <[email protected]> wrote:

> On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry <[email protected]> wrote:
>
>> Thanks Barry, I'll give -malloc_debug a shot.
>>
>>> I know this is not necessarily a reasonable test but if you run the
>>> exact same thing twice does it crash at the same location in terms of
>>> iterations or does it seem to crash eventually "randomly" just after a long
>>> time?
>>
>> Crashes after a different number of iterations, seemingly random.
>>
>>> I understand the frustration with this kind of crash, it just
>>> shouldn't happen because the same BLAS calls have been made in the same way
>>> thousands of times and yet suddenly trouble and very hard to debug.
>>
>> Eventually makes for a good war story.
>>
>> Thinking back, I have seen some disturbing memory behavior that I think
>> falls back to my use of eigen... e.g. in the past when running my full test
>> suite a particular case would fail with NaNs, but if I ran that case in
>> isolation it passes. I wonder if some object isn't getting properly aligned
>> and at some point some kind of corruption occurs?
>
> Do you ever use regular malloc()? PETSc malloc aligns automatically, but
> the system one does not.
>
> Thanks,
>
> Matt
>
>> On Mon, Aug 24, 2020 at 10:35 AM Barry Smith <[email protected]> wrote:
>>
>>> Mark,
>>>
>>> Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS.
>>>
>>> Since valgrind is not viable have you tried with -malloc_debug with
>>> the bad case it will be a little bit slower but not to bad and can find
>>> some memory corruption issues.
>>>
>>> It might be useful to get the stack trace inside the BLAS to see
>>> exactly where it crashes.
>>> If you ./configure with debugging and use
>>> --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS
>>> with debugging, but just running a batch job still won't display the stack
>>> frames inside the BLAS call.
>>>
>>> We have an option -on_error_attach_debugger which is useful for longer
>>> many rank runs that attaches the debugger ONLY when the error is detected
>>> but it may not play well with batch systems. But if you can make your run
>>> on a non-batch system it might be able, along with the
>>> --download-fblaslapack or --download-f2cblaslapack to get the exact stack
>>> frames. And in the debugger look at the variables and address points to try
>>> to determine how it could have gone wrong.
>>>
>>> I know this is not necessarily a reasonable test but if you run the
>>> exact same thing twice does it crash at the same location in terms of
>>> iterations or does it seem to crash eventually "randomly" just after a long
>>> time?
>>>
>>> I understand the frustration with this kind of crash, it just
>>> shouldn't happen because the same BLAS calls have been made in the same way
>>> thousands of times and yet suddenly trouble and very hard to debug.
>>>
>>> Barry
>>>
>>> On Aug 24, 2020, at 9:15 AM, Mark Lohry <[email protected]> wrote:
>>>
>>> valgrind: I ran a much smaller case and didn't see any issues in
>>> valgrind. I'm only seeing this bus error on several hundred cores a few
>>> hours wallclock in, so it might not be feasible to run that in valgrind.
>>>
>>> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red
>>> hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that
>>> fails use the openblas downloaded via petsc and see if it alleviates itself.
>>>
>>> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith <[email protected]> wrote:
>>>
>>>> Mark,
>>>>
>>>> Can you run in valgrind?
>>>>
>>>> Exactly what BLAS are you using?
>>>>
>>>> Barry
>>>>
>>>> On Aug 24, 2020, at 7:54 AM, Mark Lohry <[email protected]> wrote:
>>>>
>>>> Reran with debug mode and got a stack trace for this bus error, looks
>>>> like it's happening in BLASgemv, see pasted below. I did take care of the
>>>> ISColoring leak mentioned previously, although that was a very small amount
>>>> of data and I don't think is relevant here.
>>>>
>>>> At this point it's happily run 222 timesteps prior to this, so I'm a
>>>> little mystified. Any ideas?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> 222 TS dt 0.03 time 6.66
>>>> 0 SNES Function norm 4.124287265556e+02
>>>> 0 KSP Residual norm 4.124287265556e+02
>>>> 1 KSP Residual norm 4.123248052318e+02
>>>> 2 KSP Residual norm 4.123173350456e+02
>>>> 3 KSP Residual norm 4.118769044110e+02
>>>> 4 KSP Residual norm 4.094856150740e+02
>>>> 5 KSP Residual norm 4.006000788078e+02
>>>> 6 KSP Residual norm 3.787922969183e+02
>>>> [clip]
>>>> Linear solve converged due to CONVERGED_RTOL iterations 9
>>>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00
>>>> 2 SNES Function norm 3.173434863784e+00
>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
>>>> 0 SNES Function norm 5.842010710080e+02
>>>> 0 KSP Residual norm 5.842010710080e+02
>>>> 1 KSP Residual norm 5.840526408234e+02
>>>> 2 KSP Residual norm 5.840431857354e+02
>>>> 3 KSP Residual norm 5.834351392302e+02
>>>> 4 KSP Residual norm 5.800901047861e+02
>>>> 5 KSP Residual norm 5.675562288567e+02
>>>> 6 KSP Residual norm 5.366287895681e+02
>>>> 7 KSP Residual norm 4.725811521866e+02
>>>> [911]PETSC ERROR: ------------------------------------------------------------------------
>>>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access
>>>> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [911]PETSC ERROR: likely location of problem given in stack below
>>>> [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>> [911]PETSC ERROR: INSTEAD the line number of the start of the function
>>>> [911]PETSC ERROR: is given.
>>>> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c
>>>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c
>>>> [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
>>>> [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c
>>>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c
>>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h
>>>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c
>>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c
>>>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c
>>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h
>>>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
>>>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
>>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c
>>>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c
>>>> [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c
>>>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c
>>>> [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>>> [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c
>>>> [911]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [911]PETSC ERROR: Signal received
>>>> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020
>>>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020
>>>> [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8
>>>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file
>>>> --------------------------------------------------------------------------
>>>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD
>>>>
>>>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry <[email protected]> wrote:
>>>>
>>>>>> Perhaps you are calling ISColoringGetIS() and not calling
>>>>>> ISColoringRestoreIS()?
>>>>>
>>>>> I have matching ISColoringGet/Restore here, and it's only used prior
>>>>> to the first iteration so at least it doesn't seem to be growing. At the
>>>>> bottom I pasted the malloc_view and malloc_debug output from running 1
>>>>> time step.
>>>>>
>>>>> I'm sort of thinking this might be a red herring -- is it possible the
>>>>> rank 0 process is chewing up dramatically more memory than others, like
>>>>> with logging or something? Like I mentioned earlier the total memory usage
>>>>> is well under the machine limits. I'll spring in some
>>>>> PetscMemoryGetMaximumUsage logging at every time step and try to get a big
>>>>> job going again.
>>>>>
>>>>>> Are you using Fortran?
>>>>>> >>>>> >>>>> C++ >>>>> >>>>> >>>>> >>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c 
>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c 
>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire >>>>> process 719073280 >>>>> [0] Memory usage sorted by function >>>>> [0] 6 192 DMCoarsenHookAdd() >>>>> [0] 2 9984 DMCreate() >>>>> [0] 2 128 DMCreate_Shell() >>>>> [0] 2 64 DMDSEnlarge_Static() >>>>> [0] 1 672 DMKSPCreate() >>>>> [0] 3 96 DMRefineHookAdd() >>>>> [0] 3 2064 DMSNESCreate() >>>>> [0] 4 128 DMSubDomainHookAdd() >>>>> [0] 1 768 DMTSCreate() >>>>> [0] 2 96 ISColoringCreate() >>>>> [0] 8 12608 ISColoringGetIS() >>>>> [0] 1 307200 ISConcatenate() >>>>> [0] 29 25984 ISCreate() >>>>> [0] 25 400 ISCreate_General() >>>>> [0] 4 64 ISCreate_Stride() >>>>> [0] 20 338016 ISGeneralSetIndices_General() >>>>> [0] 3 921600 ISGetIndices_Stride() >>>>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>>>> [0] 1 6144 ISInvertPermutation_General() >>>>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>>>> [0] 2 32 
KSPConvergedDefaultCreate() >>>>> [0] 2 2816 KSPCreate() >>>>> [0] 1 224 KSPCreate_FGMRES() >>>>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>>>> [0] 2 16032 KSPSetUp_FGMRES() >>>>> [0] 4 16084160 KSPSetUp_GMRES() >>>>> [0] 2 36864 MatColoringApply_SL() >>>>> [0] 1 656 MatColoringCreate() >>>>> [0] 6 17088 MatCreate() >>>>> [0] 1 16 MatCreateMFFD_WP() >>>>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>>>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>>>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>>>> [0] 2 1472 MatCreate_MFFD() >>>>> [0] 1 416 MatCreate_SeqAIJ() >>>>> [0] 3 864 MatCreate_SeqBAIJ() >>>>> [0] 2 416 MatCreate_Shell() >>>>> [0] 1 784 MatFDColoringCreate() >>>>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>>>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>>>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>>>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>>>> [0] 1 6144 MatGetOrdering_Natural() >>>>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>>>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>>>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>>>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>>>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>>>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>>>> [0] 8 256 MatRegisterRootName() >>>>> [0] 1 6160 MatSeqAIJCheckInode() >>>>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>>>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>>>> [0] 13 576 MatSolverTypeRegister() >>>>> [0] 1 16 PCASMCreateSubdomains() >>>>> [0] 2 1664 PCCreate() >>>>> [0] 1 160 PCCreate_ASM() >>>>> [0] 1 192 PCCreate_ILU() >>>>> [0] 5 307264 PCSetUp_ASM() >>>>> [0] 2 416 PetscBTCreate() >>>>> [0] 2 3216 PetscClassPerfLogCreate() >>>>> [0] 2 1616 PetscClassRegLogCreate() >>>>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>>>> [0] 2 64 PetscCommDuplicate() >>>>> [0] 2 1888 PetscDSCreate() >>>>> [0] 2 26416 PetscEventPerfLogCreate() >>>>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>>>> [0] 2 1616 PetscEventRegLogCreate() >>>>> [0] 2 9600 
PetscEventRegLogRegister() >>>>> [0] 8 102400 PetscFreeSpaceGet() >>>>> [0] 474 15168 PetscFunctionListAdd_Private() >>>>> [0] 2 528 PetscIntStackCreate() >>>>> [0] 142 11360 PetscLayoutCreate() >>>>> [0] 56 896 PetscLayoutSetUp() >>>>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>>>> [0] 2 576 PetscObjectListAdd() >>>>> [0] 33 768 PetscOptionsGetEList() >>>>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>>>> [0] 1 32 PetscPushSignalHandler() >>>>> [0] 7 6944 PetscSFCreate() >>>>> [0] 3 432 PetscSFCreate_Basic() >>>>> [0] 2 1472 PetscSFLinkCreate() >>>>> [0] 11 1229040 PetscSFSetUpRanks() >>>>> [0] 7 614512 PetscSFSetUp_Basic() >>>>> [0] 4 20096 PetscSegBufferCreate() >>>>> [0] 2 1488 PetscSplitReductionCreate() >>>>> [0] 2 3008 PetscStageLogCreate() >>>>> [0] 1148 23872 PetscStrallocpy() >>>>> [0] 6 13056 PetscStrreplace() >>>>> [0] 9 3456 PetscTableCreate() >>>>> [0] 1 16 PetscViewerASCIIOpen() >>>>> [0] 6 96 PetscViewerAndFormatCreate() >>>>> [0] 1 752 PetscViewerCreate() >>>>> [0] 1 96 PetscViewerCreate_ASCII() >>>>> [0] 2 1424 SNESCreate() >>>>> [0] 1 16 SNESCreate_NEWTONLS() >>>>> [0] 1 1008 SNESLineSearchCreate() >>>>> [0] 1 16 SNESLineSearchCreate_BT() >>>>> [0] 16 1824 SNESMSRegister() >>>>> [0] 46 9056 TSARKIMEXRegister() >>>>> [0] 1 1264 TSAdaptCreate() >>>>> [0] 8 384 TSBasicSymplecticRegister() >>>>> [0] 1 2160 TSCreate() >>>>> [0] 1 224 TSCreate_Theta() >>>>> [0] 48 5968 TSGLEERegister() >>>>> [0] 41 7728 TSRKRegister() >>>>> [0] 89 14736 TSRosWRegister() >>>>> [0] 71 110192 VecCreate() >>>>> [0] 1 307200 VecCreateGhostWithArray() >>>>> [0] 123 36874080 VecCreate_MPI_Private() >>>>> [0] 7 4300800 VecCreate_Seq() >>>>> [0] 8 256 VecCreate_Seq_Private() >>>>> [0] 6 400 VecDuplicateVecs_Default() >>>>> [0] 3 2352 VecScatterCreate() >>>>> [0] 7 1843296 VecScatterSetUp_SF() >>>>> [0] 126 2016 VecStashCreate_Private() >>>>> [0] 1 3072 mapBlockColoringToJacobian() >>>>> >>>>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith <[email protected]> wrote: >>>>> 
>>>>>>
>>>>>> Yes, there are some PETSc objects or arrays that you are not
>>>>>> freeing so they are printed at the end of the run. For small runs this
>>>>>> harmless but if new objects/memory is allocated at each iteration and not
>>>>>> suitably freed it will eventually add up.
>>>>>>
>>>>>> Run with -malloc_view (small problem with say 2 iterations) it
>>>>>> will print everything allocated and might be helpful.
>>>>>>
>>>>>> Perhaps you are calling ISColoringGetIS() and not calling
>>>>>> ISColoringRestoreIS()?
>>>>>>
>>>>>> It is also possible it is a leak in PETSc, but that is unlikely
>>>>>> since we test for them.
>>>>>>
>>>>>> Are you using Fortran?
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry <[email protected]> wrote:
>>>>>>
>>>>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller
>>>>>> representative case with valgrind and didn't see anything alarming (apart
>>>>>> from a small leak in an older boost version I was using:
>>>>>> https://github.com/boostorg/serialization/issues/104 although I
>>>>>> don't think this was causing the issue).
>>>>>>
>>>>>> -malloc_debug dumps quite a lot, this is supposed to be empty right?
>>>>>> Output pasted below. It looks like the same sequence of calls is
>>>>>> repeated 8 times, which is how many nonlinear solves occurred in this
>>>>>> particular run.
>>>>>> Thoughts?
>>>>>> >>>>>> >>>>>> >>>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 
bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c
>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c
>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c
>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c
>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c
>>>>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c
>>>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c
>>>>>>
>>>>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> When valgrind is not feasible (like on many centrally controlled
>>>>>>> batch systems) you can run PETSc with an extra flag to do some
>>>>>>> memory error checks:
>>>>>>>
>>>>>>> -malloc_debug
>>>>>>>
>>>>>>> This
>>>>>>>
>>>>>>> 1) fills all malloced memory with NaN so if the code is using
>>>>>>> uninitialized memory it may be detected, and
>>>>>>> 2) checks the beginning and end of each allocated memory region for
>>>>>>> out-of-bounds writes at each malloc and free.
>>>>>>>
>>>>>>> It will slow the code down a little bit but generally not a huge
>>>>>>> amount.
>>>>>>>
>>>>>>> It is nowhere near as good as valgrind or other memory corruption
>>>>>>> tools but it has the advantage you can run it anywhere on any size job.
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley <[email protected]> wrote:
>>>>>>>
>>>>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm getting seemingly random failures of late:
>>>>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access
>>>>>>>>
>>>>>>>
>>>>>>> The first thing I would do is run valgrind on as wide an array of
>>>>>>> tests as you can. This will find problems on things that run
>>>>>>> completely fine.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>>> Symptoms:
>>>>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores
>>>>>>>> 2) It doesn't happen right away -- this was running happily for
>>>>>>>> several hours over several hundred time steps with no indication of
>>>>>>>> bad health in the numerics
>>>>>>>> 3) At least the total memory consumption seems to be within bounds,
>>>>>>>> though I'm not sure about individual processes. e.g. slurm here reported
>>>>>>>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node)
>>>>>>>> 4) running the same setup twice it fails at different points
>>>>>>>>
>>>>>>>> Any suggestions on what to look for? This is a bit painful to work
>>>>>>>> on as I can only reproduce it on large runs and then it's seemingly
>>>>>>>> random.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which
>>>>>>> their experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
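
For readers following the thread, the two checks Barry describes for -malloc_debug (NaN fill of fresh memory, plus guard checks at the beginning and end of each allocated region) can be illustrated with a toy model. This is only a sketch of the general technique and not PETSc's implementation: the class name GuardedBuffer, the sentinel bytes, and the buffer layout are all assumptions made up for illustration, and Python is used only to keep it short.

```python
import math
import struct

# Hypothetical 8-byte sentinel placed before and after the payload.
GUARD = b"\xde\xad\xbe\xef\xde\xad\xbe\xef"
NAN_BYTES = struct.pack("<d", math.nan)


class GuardedBuffer:
    """Toy model of a debug allocation laid out as [guard | payload | guard]."""

    def __init__(self, n_doubles):
        self.n = n_doubles
        # Pre-fill the payload with NaN so that reading an entry that was
        # never written poisons the numerics instead of looking plausible.
        self.raw = bytearray(GUARD + NAN_BYTES * n_doubles + GUARD)

    def write(self, index, value):
        # Deliberately not bounds-checked against n, like a raw pointer
        # store in C: index == n lands on the trailing guard.
        struct.pack_into("<d", self.raw, len(GUARD) + 8 * index, value)

    def read(self, index):
        return struct.unpack_from("<d", self.raw, len(GUARD) + 8 * index)[0]

    def check(self):
        """Return the list of damaged guards (empty if the buffer is intact)."""
        problems = []
        if self.raw[:len(GUARD)] != GUARD:
            problems.append("underrun")
        if self.raw[-len(GUARD):] != GUARD:
            problems.append("overrun")
        return problems


buf = GuardedBuffer(4)
uninitialized_is_nan = math.isnan(buf.read(2))  # never written, so NaN
buf.write(0, 1.5)
clean = buf.check()                             # in-bounds write, guards intact
buf.write(4, 2.0)                               # one element past the end
after_overrun = buf.check()                     # trailing guard was overwritten
print(uninitialized_is_nan, clean, after_overrun)
```

The point of the sketch is that the overrun is only noticed when check() is next called, which mirrors why a guard-based check reports corruption at the following malloc or free rather than at the faulty write itself.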
