Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue).
-malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c On Wed, Aug 12, 2020 at 1:46 PM Barry Smith <[email protected]> wrote: > > Mark. > > When valgrind is not feasible (like on many centrally controlled batch > systems) you can run PETSc with an extra flag to do some memory error checks > -malloc_debug > > this > > 1) fills all malloced memory with Nan so if the code is using > uninitialized memory it may be detected and > 2) checks the beginning and end of each alloced memory region for > out-of-bounds writes at each malloc and free. > > it will slow the code down a little bit but generally not a huge amount. > > It is no where near as good as valgrind or other memory corruption tools > but it has the advantage you can run it anywhere on any size job. > > > Barry > > > > > > On Aug 12, 2020, at 7:46 AM, Matthew Knepley <[email protected]> wrote: > > On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry <[email protected]> wrote: > >> I'm getting seemingly random failures of late: >> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >> > > The first thing I would do is run valgrind on as wide an array of tests as > you can. This will find problems > on things that run completely fine. > > Thanks, > > Matt > > >> Symptoms: >> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >> 2) It doesn't happen right away -- this was running happily for several >> hours over several hundred time steps with no indication of bad health in >> the numerics >> 3) At least the total memory consumption seems to be within bounds, >> though I'm not sure about individual processes. e.g. slurm here reported >> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >> 4) running the same setup twice it fails at different points >> >> Any suggestions on what to look for? This is a bit painful to work on as >> I can only reproduce it on large runs and then it's seemingly random. >> >> >> Thanks, >> Mark >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > <http://www.cse.buffalo.edu/~knepley/> > > >
