> On Jul 22, 2015, at 11:33 AM, Florian Lindner <[email protected]> wrote:
>
> On Tuesday, July 21, 2015, 18:32:02, you wrote:
>>
>> Try putting a breakpoint in KSPSetUp_GMRES and check the values of all the
>> pointers immediately after the
>> ierr = PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr);
>>
>> then put your second breakpoint in KSPReset_GMRES and check all the
>> pointers again just before the
>>> ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
>>
>> Of course the pointers should be the same, are they?
>
> Num Type       Disp Enb Address            What
> 3   breakpoint keep y   0x00007ffff6ff6cb5 in KSPReset_GMRES at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> 4   breakpoint keep y   0x00007ffff6ff49a1 in KSPSetUp_GMRES at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:54
>
> The pointer gmres is the same. Just one function call later, at mal.c:72, it
> crashes. The pointer that is freed is gmres->hh_origin, which also hasn't
> changed.
>
> What confuses me is this:
>
> Breakpoint 3, KSPReset_GMRES (ksp=0xe904b0) at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> 258 ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> (gdb) print gmres->hh_origin
> $24 = (PetscScalar *) 0xf10250
>
> hh_origin is the first argument. I step into PetscFree5:
>
> (gdb) s
> PetscFreeAlign (ptr=0xf15aa0, line=258, func=0x7ffff753c4c8 <__func__.20306> "KSPReset_GMRES", file=0x7ffff753b8b0 "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c") at /home/florian/software/petsc/src/sys/memory/mal.c:54
> 54 if (!ptr) return 0;
> (gdb) print ptr
> $25 = (void *) 0xf15aa0
>
> Why has the value changed? I expect gmres->hh_origin == ptr.

  Definitely a problem here.

> Could this be a sign of stack corruption at some earlier stage?

  Could be, but valgrind usually finds such things.

  You can do the following: edit $PETSC_DIR/$PETSC_ARCH/include/petscconf.h and add the lines

#if !defined(PETSC_USE_MALLOC_COALESCED)
#define PETSC_USE_MALLOC_COALESCED
#endif

  then run

make gnumake

  in the $PETSC_DIR directory. Then relink your program and try running it.

  Barry

>
> I was also trying to build petsc with clang to use its memory sanitizer,
> but without success. Same for precice.
>
>
>> If so you can run in the debugger and check the values at some points
>> between the creation and destruction to see where they get changed to bad
>> values. Normally, of course, valgrind would be very helpful in finding
>> exactly when things go bad.
>
> What do you mean by changing to bad? They are the same after Calloc and
> before PetscFree5.
>
> Best Regards,
> Florian
>
>> I'm afraid I'm going to have to give up on building this stuff myself; too
>> painful.
>
> Sorry about that.
>
>>
>> Barry
>>
>>
>>> On Jul 21, 2015, at 8:54 AM, Florian Lindner <[email protected]> wrote:
>>>
>>> Hey Barry,
>>>
>>> were you able to reproduce the error?
>>>
>>> I tried to set a breakpoint at
>>>
>>> PetscErrorCode KSPReset_GMRES(KSP ksp)
>>> {
>>>   KSP_GMRES      *gmres = (KSP_GMRES*)ksp->data;
>>>   PetscErrorCode ierr;
>>>   PetscInt       i;
>>>
>>>   PetscFunctionBegin;
>>>   /* Free the Hessenberg matrices */
>>>   ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
>>>
>>> in gmres.c; the last line produces the error...
>>>
>>> Interestingly, this piece of code is traversed only once, so at least the
>>> code that frees the pointer is not called twice...
>>>
>>> Best Regards,
>>> Florian
>>>
>>>
>>> On Thursday, July 16, 2015, 17:59:15, you wrote:
>>>>
>>>> I am on a Mac; no idea what the 'lo' localhost loopback should be
>>>>
>>>> $ ./pmpi B
>>>> MPI rank 0 of 1
>>>> [PRECICE] Run in coupling mode
>>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
>>>> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
>>>> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
>>>> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
>>>> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
>>>> Setting up master communication to coupling partner/s
>>>> (0) [PRECICE] ERROR: Network "lo" not found for socket connection!
>>>> Run finished at Thu Jul 16 17:50:39 2015
>>>> Global runtime = 41ms / 0s
>>>>
>>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
>>>> --------------------------------------------------------------------------------
>>>> Properties from all Events, accumulated
>>>> ---------------------------------------
>>>>
>>>> Abort trap: 6
>>>> ~/Src/prempi (master *=) arch-debug
>>>> $ ./pmpi B
>>>> MPI rank 0 of 1
>>>> [PRECICE] Run in coupling mode
>>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
>>>> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
>>>> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
>>>> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
>>>> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
>>>> Setting up master communication to coupling partner/s
>>>> (0) [PRECICE] ERROR: Network "localhost" not found for socket connection!
>>>> Run finished at Thu Jul 16 17:50:52 2015
>>>> Global runtime = 40ms / 0s
>>>>
>>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
>>>> --------------------------------------------------------------------------------
>>>> Properties from all Events, accumulated
>>>> ---------------------------------------
>>>>
>>>> Abort trap: 6
>>>> ~/Src/prempi (master *=) arch-debug
>>>> $ hostname
>>>> Barrys-MacBook-Pro.local
>>>> ~/Src/prempi (master *=) arch-debug
>>>> $ ./pmpi B
>>>> MPI rank 0 of 1
>>>> [PRECICE] Run in coupling mode
>>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
>>>> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
>>>> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
>>>> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
>>>> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
>>>> Setting up master communication to coupling partner/s
>>>> (0) [PRECICE] ERROR: Network "Barrys-MacBook-Pro.local" not found for socket connection!
>>>> Run finished at Thu Jul 16 17:51:12 2015
>>>> Global runtime = 39ms / 0s
>>>>
>>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
>>>> --------------------------------------------------------------------------------
>>>> Properties from all Events, accumulated
>>>> ---------------------------------------
>>>>
>>>> Abort trap: 6
>>>> ~/Src/prempi (master *=) arch-debug
>>>> $ ./pmpi B
>>>> MPI rank 0 of 1
>>>> [PRECICE] Run in coupling mode
>>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00],
>>>> [3.20000000000000006661e-01, 0.00000000000000000000e+00],
>>>> [5.20000000000000017764e-01, 0.00000000000000000000e+00],
>>>> [7.20000000000000084377e-01, 0.00000000000000000000e+00],
>>>> [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
>>>> Setting up master communication to coupling partner/s
>>>> (0) [PRECICE] ERROR: Network "10.0.1.2" not found for socket connection!
>>>> Run finished at Thu Jul 16 17:53:02 2015
>>>> Global runtime = 42ms / 0s
>>>>
>>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
>>>> --------------------------------------------------------------------------------
>>>> Properties from all Events, accumulated
>>>> ---------------------------------------
>>>>
>>>> Abort trap: 6
>>>> ~/Src/prempi (master *=) arch-debug
>>>>
>>>>> On Jul 15, 2015, at 1:53 AM, Florian Lindner <[email protected]> wrote:
>>>>>
>>>>> Hey
>>>>>
>>>>> On Tuesday, July 14, 2015, 13:20:33, you wrote:
>>>>>>
>>>>>> How to install Eigen? I tried brew install eigen but it didn't help.
>>>>>
>>>>> You may need to set CPLUS_INCLUDE_PATH to something like
>>>>> "/usr/include/eigen3".
>>>>> The easiest way, however, is probably to download Eigen from
>>>>> http://bitbucket.org/eigen/eigen/get/3.2.5.tar.bz2 and move the Eigen
>>>>> folder from that archive to precice/src.
>>>>>
>>>>>> Also what about the PRECICE_MPI_ stuff. It sure doesn't point to
>>>>>> anything valid.
>>>>>
>>>>> You probably don't need to set it if you use an mpic++ or mpicxx compiler
>>>>> wrapper that takes care of that.
>>>>>
>>>>> Thx,
>>>>> Florian
>>>>>
>>>>>>
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ build=debug
>>>>>> scons: Reading SConscript files ...
>>>>>>
>>>>>> Build options ...
>>>>>> (default)  builddir   = build    Directory holding build files. ( /path/to/builddir )
>>>>>> (default)  build      = debug    Build type, either release or debug (release|debug)
>>>>>> (modified) compiler   = /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++    Compiler to use.
>>>>>> (modified) mpi        = True     Enables MPI-based communication and running coupling tests. (yes|no)
>>>>>> (default)  sockets    = True     Enables Socket-based communication. (yes|no)
>>>>>> (modified) boost_inst = True     Enable if Boost is available compiled and installed. (yes|no)
>>>>>> (default)  spirit2    = True     Used for parsing VRML file geometries and checkpointing. (yes|no)
>>>>>> (modified) petsc      = True     Enable use of the Petsc linear algebra library. (yes|no)
>>>>>> (modified) python     = False    Used for Python scripted solver actions. (yes|no)
>>>>>> (default)  gprof      = False    Used in detailed performance analysis. (yes|no)
>>>>>> ... done
>>>>>>
>>>>>> Environment variables used for this build ...
>>>>>> (have to be defined by the user to configure build)
>>>>>> (modified) PETSC_DIR = /Users/barrysmith/Src/PETSc
>>>>>> (modified) PETSC_ARCH = arch-debug
>>>>>> (default) PRECICE_BOOST_SYSTEM_LIB = boost_system
>>>>>> (default) PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
>>>>>> (default) PRECICE_MPI_LIB_PATH = /usr/lib/
>>>>>> (default) PRECICE_MPI_LIB = mpich
>>>>>> (default) PRECICE_MPI_INC_PATH = /usr/include/mpich2
>>>>>> (default) PRECICE_PTHREAD_LIB_PATH = /usr/lib
>>>>>> (default) PRECICE_PTHREAD_LIB = pthread
>>>>>> (default) PRECICE_PTHREAD_INC_PATH = /usr/include
>>>>>> ... done
>>>>>>
>>>>>> Configuring build variables ...
>>>>>> Checking whether the C++ compiler works... yes
>>>>>> Checking for C library petsc... yes
>>>>>> Checking for C++ header file Eigen/Dense... no
>>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does not compile!
>>>>>> $ brew install eigen
>>>>>> ==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/eigen-3.2.3.yosemite.bottle.tar.gz
>>>>>> ######################################################################## 100.0%
>>>>>> ==> Pouring eigen-3.2.3.yosemite.bottle.tar.gz
>>>>>> 🍺 /usr/local/Cellar/eigen/3.2.3: 361 files, 4.1M
>>>>>> ~/Src/precice (develop=) arch-debug
>>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ build=debug
>>>>>> scons: Reading SConscript files ...
>>>>>>
>>>>>> Build options ...
>>>>>> (default)  builddir   = build    Directory holding build files. ( /path/to/builddir )
>>>>>> (default)  build      = debug    Build type, either release or debug (release|debug)
>>>>>> (modified) compiler   = /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++    Compiler to use.
>>>>>> (modified) mpi        = True     Enables MPI-based communication and running coupling tests. (yes|no)
>>>>>> (default)  sockets    = True     Enables Socket-based communication. (yes|no)
>>>>>> (modified) boost_inst = True     Enable if Boost is available compiled and installed. (yes|no)
>>>>>> (default)  spirit2    = True     Used for parsing VRML file geometries and checkpointing. (yes|no)
>>>>>> (modified) petsc      = True     Enable use of the Petsc linear algebra library. (yes|no)
>>>>>> (modified) python     = False    Used for Python scripted solver actions. (yes|no)
>>>>>> (default)  gprof      = False    Used in detailed performance analysis. (yes|no)
>>>>>> ... done
>>>>>>
>>>>>> Environment variables used for this build ...
>>>>>> (have to be defined by the user to configure build)
>>>>>> (modified) PETSC_DIR = /Users/barrysmith/Src/PETSc
>>>>>> (modified) PETSC_ARCH = arch-debug
>>>>>> (default) PRECICE_BOOST_SYSTEM_LIB = boost_system
>>>>>> (default) PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
>>>>>> (default) PRECICE_MPI_LIB_PATH = /usr/lib/
>>>>>> (default) PRECICE_MPI_LIB = mpich
>>>>>> (default) PRECICE_MPI_INC_PATH = /usr/include/mpich2
>>>>>> (default) PRECICE_PTHREAD_LIB_PATH = /usr/lib
>>>>>> (default) PRECICE_PTHREAD_LIB = pthread
>>>>>> (default) PRECICE_PTHREAD_INC_PATH = /usr/include
>>>>>> ... done
>>>>>>
>>>>>> Configuring build variables ...
>>>>>> Checking whether the C++ compiler works... yes
>>>>>> Checking for C library petsc... yes
>>>>>> Checking for C++ header file Eigen/Dense... no
>>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does not compile!
>>>>>> ~/Src/precice (develop=) arch-debug
>>>>>>
>>>>>>
>>>>>>> On Jul 14, 2015, at 2:14 AM, Florian Lindner <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Monday, July 13, 2015, 12:26:21, Barry Smith wrote:
>>>>>>>>
>>>>>>>> Run under valgrind first, see if it gives any more details about the
>>>>>>>> memory issue
>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>
>>>>>>> I tried running it like this:
>>>>>>>
>>>>>>> valgrind --tool=memcheck ./pmpi A -malloc off
>>>>>>>
>>>>>>> (pmpi is my application, no mpirun)
>>>>>>>
>>>>>>> but it reported no errors at all.
>>>>>>>
>>>>>>>> Can you send the code that produces this problem?
>>>>>>>
>>>>>>> I was not able to isolate the problem, but you can of course have a look
>>>>>>> at our application:
>>>>>>>
>>>>>>> git clone [email protected]:precice/precice.git
>>>>>>> MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=mpic++ build=debug
>>>>>>>
>>>>>>> The test client:
>>>>>>> git clone [email protected]:floli/prempi.git
>>>>>>> you need to adapt line 5 in SConstruct: preciceRoot
>>>>>>> scons
>>>>>>>
>>>>>>> Use one terminal to run ./pmpi A and another to run ./pmpi B
>>>>>>>
>>>>>>> Thanks for taking a look! Mail me if any problem with the build occurs.
>>>>>>>
>>>>>>> Florian
>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 13, 2015, at 10:56 AM, Florian Lindner <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> our petsc application suffers from a memory error (double free or
>>>>>>>>> corruption).
>>>>>>>>>
>>>>>>>>> The situation is like this:
>>>>>>>>>
>>>>>>>>> A KSP is a private member of a C++ class. In its constructor I call
>>>>>>>>> KSPCreate. In between it may happen that I call KSPReset. In the class's
>>>>>>>>> destructor I call KSPDestroy.
>>>>>>>>> That's where the memory error appears:
>>>>>>>>>
>>>>>>>>> gdb backtrace:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> #4  0x00007ffff490b8db in _int_free () from /usr/lib/libc.so.6
>>>>>>>>> #5  0x00007ffff6188c9c in PetscFreeAlign (ptr=0xfcd990, line=258, func=0x7ffff753c4c8 <__func__.20304> "KSPReset_GMRES", file=0x7ffff753b8b0 "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c") at /home/florian/software/petsc/src/sys/memory/mal.c:72
>>>>>>>>> #6  0x00007ffff6ff6cdc in KSPReset_GMRES (ksp=0xf48470) at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
>>>>>>>>> #7  0x00007ffff70ad804 in KSPReset (ksp=0xf48470) at /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:885
>>>>>>>>> #8  0x00007ffff70ae2e8 in KSPDestroy (ksp=0xeb89d8) at /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:933
>>>>>>>>>
>>>>>>>>> #9  0x0000000000599b24 in precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:148
>>>>>>>>> #10 0x0000000000599bc9 in precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:146
>>>>>>>>>
>>>>>>>>> Complete backtrace at http://pastebin.com/ASjibeNF
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could it be a problem if objects set by KSPSetOperators are destroyed
>>>>>>>>> afterwards? I don't think so, since KSPReset is called before.
>>>>>>>>>
>>>>>>>>> I've wrapped a class (just a bunch of helper functions, no
>>>>>>>>> encapsulating wrapper) around Mat and Vec objects. Nothing fancy: the
>>>>>>>>> ctor calls MatCreate, the dtor MatDestroy. You can have a look at
>>>>>>>>> https://github.com/precice/precice/blob/develop/src/mapping/petnum.cpp / .hpp.
>>>>>>>>>
>>>>>>>>> These objects are also members of the same class as the KSP, so their
>>>>>>>>> dtor is called after KSPDestroy.
>>>>>>>>>
>>>>>>>>> What could cause the memory corruption here?
>>>>>>>>>
>>>>>>>>> Thanks a lot,
>>>>>>>>> Florian
