On Thu, Jul 23, 2015 at 10:07 AM, Florian Lindner wrote:

> On Wednesday, 22 July 2015, 13:05:57 you wrote:
> >
> > > On Jul 22, 2015, at 11:33 AM, Florian Lindner wrote:
> > >
> > > On Tuesday, 21 July 2015, 18:32:02 you wrote:
> > >>
> > >> Try putting a breakpoint in KSPSetUp_GMRES and check the values of all the pointers immediately after the
> > >> ierr = PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr);
> > >>
> > >> then put your second breakpoint in KSPReset_GMRES and check all the pointers again just before the
> > >> ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > >>
> > >> Of course the pointers should be the same, are they?
> > >
> > > Num Type Disp Enb Address What
> > > 3 breakpoint keep y 0x00007ffff6ff6cb5 in KSPReset_GMRES at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > > 4 breakpoint keep y 0x00007ffff6ff49a1 in KSPSetUp_GMRES at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:54
> > >
> > > The pointer gmres is the same. Just one function call later, at mal.c:72, it crashes. The pointer that is freed is gmres->hh_origin, which also hasn't changed.
> > >
> > > What confuses me is this:
> > >
> > > Breakpoint 3, KSPReset_GMRES (ksp=0xe904b0) at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > > 258 ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > > (gdb) print gmres->hh_origin
> > > $24 = (PetscScalar *) 0xf10250
> > >
> > > hh_origin is the first argument. I step into PetscFree5:
> > >
> > > (gdb) s
> > > PetscFreeAlign (ptr=0xf15aa0, line=258, func=0x7ffff753c4c8 <__func__.20306> "KSPReset_GMRES", file=0x7ffff753b8b0 "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c") at /home/florian/software/petsc/src/sys/memory/mal.c:54
> > > 54 if (!ptr) return 0;
> > > (gdb) print ptr
> > > $25 = (void *) 0xf15aa0
> > >
> > > Why has the value changed? I expect gmres->hh_origin == ptr.
> >
> > Definitely a problem here.
> >
> > > Could this be a sign of stack corruption at some earlier stage?
> >
> > Could be, but valgrind usually finds such things.
> >
> > You can do the following: edit $PETSC_DIR/$PETSC_ARCH/include/petscconf.h and add the lines
> >
> > #if !defined(PETSC_USE_MALLOC_COALESCED)
> > #define PETSC_USE_MALLOC_COALESCED
> > #endif
> >
> > then run
> >
> > make gnumake in the $PETSC_DIR directory. Then relink your program and try running it.
>
> Sorry, no success. Same story. :-(
>
> I have removed my small petsc wrapper lib from the code and it's pure petsc now. Everything petsc related (besides Init and Finalize) is done in that piece of code. Everything petsc is private, so nothing outside should mess with it.
>
> https://github.com/floli/precice/blob/petsc_debugging/src/mapping/PetRadialBasisFctMapping.hpp
>
> if you wanna have a look... Maybe you see something evil I'm doing.
>

I can't see anything by eye. Can you tell me how to run a small problem which shows the error? I will go through it until we find out what is going on.

  Thanks,

     Matt

> Best Thanks,
> Florian
> >
> > Barry
> >
> > >
> > > I was also trying to build petsc with clang to use its memory sanitizer, but without success. Same for precice.
> > >
> > >> If so you can run in the debugger and check the values at some points between the creation and destruction to see where they get changed to bad values. Normally, of course, valgrind would be very helpful in finding exactly when things go bad.
> > >
> > > What do you mean by changing to bad? They are the same after Calloc and before PetscFree5.
> > >
> > > Best Regards,
> > > Florian
> > >
> > >> I'm afraid I'm going to have to give up on building this stuff myself; too painful.
> > >
> > > Sorry about that.
> > >
> > >>
> > >> Barry
> > >>
> > >>> On Jul 21, 2015, at 8:54 AM, Florian Lindner wrote:
> > >>>
> > >>> Hey Barry,
> > >>>
> > >>> were you able to reproduce the error?
> > >>>
> > >>> I tried to set a breakpoint at
> > >>>
> > >>> PetscErrorCode KSPReset_GMRES(KSP ksp)
> > >>> {
> > >>>   KSP_GMRES *gmres = (KSP_GMRES*)ksp->data;
> > >>>   PetscErrorCode ierr;
> > >>>   PetscInt i;
> > >>>
> > >>>   PetscFunctionBegin;
> > >>>   /* Free the Hessenberg matrices */
> > >>>   ierr = PetscFree5(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin);CHKERRQ(ierr);
> > >>>
> > >>> in gmres.c; the last line produces the error...
> > >>>
> > >>> Interestingly, this piece of code is traversed only once, so at least the same code that frees the pointer is not called twice...
> > >>>
> > >>> Best Regards,
> > >>> Florian
> > >>>
> > >>>
> > >>> On Thursday, 16 July 2015, 17:59:15 you wrote:
> > >>>>
> > >>>> I am on a mac, no idea what the 'lo' local host loop back should be
> > >>>>
> > >>>> $ ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0) [PRECICE] ERROR: Network "lo" not found for socket connection!
> > >>>> Run finished at Thu Jul 16 17:50:39 2015
> > >>>> Global runtime = 41ms / 0s
> > >>>>
> > >>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
> > >>>> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $ ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0) [PRECICE] ERROR: Network "localhost" not found for socket connection!
> > >>>> Run finished at Thu Jul 16 17:50:52 2015
> > >>>> Global runtime = 40ms / 0s
> > >>>>
> > >>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
> > >>>> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $ hostname
> > >>>> Barrys-MacBook-Pro.local
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $ ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0) [PRECICE] ERROR: Network "Barrys-MacBook-Pro.local" not found for socket connection!
> > >>>> Run finished at Thu Jul 16 17:51:12 2015
> > >>>> Global runtime = 39ms / 0s
> > >>>>
> > >>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
> > >>>> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>> $ ./pmpi B
> > >>>> MPI rank 0 of 1
> > >>>> [PRECICE] Run in coupling mode
> > >>>> Mesh = [[1.19999999999999995559e-01, 0.00000000000000000000e+00], [3.20000000000000006661e-01, 0.00000000000000000000e+00], [5.20000000000000017764e-01, 0.00000000000000000000e+00], [7.20000000000000084377e-01, 0.00000000000000000000e+00], [9.20000000000000039968e-01, 0.00000000000000000000e+00]]
> > >>>> Setting up master communication to coupling partner/s
> > >>>> (0) [PRECICE] ERROR: Network "10.0.1.2" not found for socket connection!
> > >>>> Run finished at Thu Jul 16 17:53:02 2015
> > >>>> Global runtime = 42ms / 0s
> > >>>>
> > >>>> Event Count Total[ms] Max[ms] Min[ms] Avg[ms] T%
> > >>>> --------------------------------------------------------------------------------
> > >>>> Properties from all Events, accumulated
> > >>>> ---------------------------------------
> > >>>>
> > >>>> Abort trap: 6
> > >>>> ~/Src/prempi (master *=) arch-debug
> > >>>>
> > >>>>> On Jul 15, 2015, at 1:53 AM, Florian Lindner wrote:
> > >>>>>
> > >>>>> Hey
> > >>>>>
> > >>>>> On Tuesday, 14 July 2015, 13:20:33 you wrote:
> > >>>>>>
> > >>>>>> How to install Eigen? I tried brew install eigen but it didn't help.
> > >>>>>
> > >>>>> You may need to set CPLUS_INCLUDE_PATH to something like "/usr/include/eigen3".
> > >>>>> The easiest way, however, is probably to download eigen from http://bitbucket.org/eigen/eigen/get/3.2.5.tar.bz2 and move the Eigen folder from that archive to precice/src.
> > >>>>>
> > >>>>>> Also what about the PRECICE_MPI_ stuff. It sure doesn't point to anything valid.
> > >>>>>
> > >>>>> You probably don't need to set it if you use an mpic++ or mpicxx compiler wrapper that takes care of that.
> > >>>>>
> > >>>>> Thx,
> > >>>>> Florian
> > >>>>>
> > >>>>>>
> > >>>>>> Barry
> > >>>>>>
> > >>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ build=debug
> > >>>>>> scons: Reading SConscript files ...
> > >>>>>>
> > >>>>>> Build options ...
> > >>>>>> (default) builddir = build Directory holding build files. ( /path/to/builddir )
> > >>>>>> (default) build = debug Build type, either release or debug (release|debug)
> > >>>>>> (modified) compiler = /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ Compiler to use.
> > >>>>>> (modified) mpi = True Enables MPI-based communication and running coupling tests. (yes|no)
> > >>>>>> (default) sockets = True Enables Socket-based communication. (yes|no)
> > >>>>>> (modified) boost_inst = True Enable if Boost is available compiled and installed. (yes|no)
> > >>>>>> (default) spirit2 = True Used for parsing VRML file geometries and checkpointing. (yes|no)
> > >>>>>> (modified) petsc = True Enable use of the Petsc linear algebra library. (yes|no)
> > >>>>>> (modified) python = False Used for Python scripted solver actions. (yes|no)
> > >>>>>> (default) gprof = False Used in detailed performance analysis. (yes|no)
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Environment variables used for this build ...
> > >>>>>> (have to be defined by the user to configure build)
> > >>>>>> (modified) PETSC_DIR = /Users/barrysmith/Src/PETSc
> > >>>>>> (modified) PETSC_ARCH = arch-debug
> > >>>>>> (default) PRECICE_BOOST_SYSTEM_LIB = boost_system
> > >>>>>> (default) PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
> > >>>>>> (default) PRECICE_MPI_LIB_PATH = /usr/lib/
> > >>>>>> (default) PRECICE_MPI_LIB = mpich
> > >>>>>> (default) PRECICE_MPI_INC_PATH = /usr/include/mpich2
> > >>>>>> (default) PRECICE_PTHREAD_LIB_PATH = /usr/lib
> > >>>>>> (default) PRECICE_PTHREAD_LIB = pthread
> > >>>>>> (default) PRECICE_PTHREAD_INC_PATH = /usr/include
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Configuring build variables ...
> > >>>>>> Checking whether the C++ compiler works... yes
> > >>>>>> Checking for C library petsc... yes
> > >>>>>> Checking for C++ header file Eigen/Dense... no
> > >>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does not compile!
> > >>>>>> $ brew install eigen
> > >>>>>> ==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/eigen-3.2.3.yosemite.bottle.tar.gz
> > >>>>>> ######################################################################## 100.0%
> > >>>>>> ==> Pouring eigen-3.2.3.yosemite.bottle.tar.gz
> > >>>>>> 🍺 /usr/local/Cellar/eigen/3.2.3: 361 files, 4.1M
> > >>>>>> ~/Src/precice (develop=) arch-debug
> > >>>>>> $ MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=/Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ build=debug
> > >>>>>> scons: Reading SConscript files ...
> > >>>>>>
> > >>>>>> Build options ...
> > >>>>>> (default) builddir = build Directory holding build files. ( /path/to/builddir )
> > >>>>>> (default) build = debug Build type, either release or debug (release|debug)
> > >>>>>> (modified) compiler = /Users/barrysmith/Src/petsc/arch-debug/bin/mpic++ Compiler to use.
> > >>>>>> (modified) mpi = True Enables MPI-based communication and running coupling tests. (yes|no)
> > >>>>>> (default) sockets = True Enables Socket-based communication. (yes|no)
> > >>>>>> (modified) boost_inst = True Enable if Boost is available compiled and installed. (yes|no)
> > >>>>>> (default) spirit2 = True Used for parsing VRML file geometries and checkpointing. (yes|no)
> > >>>>>> (modified) petsc = True Enable use of the Petsc linear algebra library. (yes|no)
> > >>>>>> (modified) python = False Used for Python scripted solver actions. (yes|no)
> > >>>>>> (default) gprof = False Used in detailed performance analysis. (yes|no)
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Environment variables used for this build ...
> > >>>>>> (have to be defined by the user to configure build)
> > >>>>>> (modified) PETSC_DIR = /Users/barrysmith/Src/PETSc
> > >>>>>> (modified) PETSC_ARCH = arch-debug
> > >>>>>> (default) PRECICE_BOOST_SYSTEM_LIB = boost_system
> > >>>>>> (default) PRECICE_BOOST_FILESYSTEM_LIB = boost_filesystem
> > >>>>>> (default) PRECICE_MPI_LIB_PATH = /usr/lib/
> > >>>>>> (default) PRECICE_MPI_LIB = mpich
> > >>>>>> (default) PRECICE_MPI_INC_PATH = /usr/include/mpich2
> > >>>>>> (default) PRECICE_PTHREAD_LIB_PATH = /usr/lib
> > >>>>>> (default) PRECICE_PTHREAD_LIB = pthread
> > >>>>>> (default) PRECICE_PTHREAD_INC_PATH = /usr/include
> > >>>>>> ... done
> > >>>>>>
> > >>>>>> Configuring build variables ...
> > >>>>>> Checking whether the C++ compiler works... yes
> > >>>>>> Checking for C library petsc... yes
> > >>>>>> Checking for C++ header file Eigen/Dense... no
> > >>>>>> ERROR: Header 'Eigen/Dense' (needed for Eigen) not found or does not compile!
> > >>>>>> ~/Src/precice (develop=) arch-debug
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jul 14, 2015, at 2:14 AM, Florian Lindner wrote:
> > >>>>>>>
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> On Monday, 13 July 2015, 12:26:21 Barry Smith wrote:
> > >>>>>>>>
> > >>>>>>>> Run under valgrind first, see if it gives any more details about the memory issue http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > >>>>>>>
> > >>>>>>> I tried running it like this:
> > >>>>>>>
> > >>>>>>> valgrind --tool=memcheck ./pmpi A -malloc off
> > >>>>>>>
> > >>>>>>> (pmpi is my application, no mpirun)
> > >>>>>>>
> > >>>>>>> but it reported no errors at all.
> > >>>>>>>
> > >>>>>>>> Can you send the code that produces this problem?
> > >>>>>>>
> > >>>>>>> I was not able to isolate the problem, but you can of course have a look at our application:
> > >>>>>>>
> > >>>>>>> git clone [email protected]:precice/precice.git
> > >>>>>>> MPI_CXX="clang++" scons -j 4 boost_inst=on python=off petsc=on mpi=on compiler=mpic++ build=debug
> > >>>>>>>
> > >>>>>>> The test client:
> > >>>>>>> git clone [email protected]:floli/prempi.git
> > >>>>>>> you need to adapt line 5 in SConstruct: preciceRoot
> > >>>>>>> scons
> > >>>>>>>
> > >>>>>>> Use one terminal to run ./pmpi A and another to run ./pmpi B
> > >>>>>>>
> > >>>>>>> Thanks for taking a look! Mail me if any problem with the build occurs.
> > >>>>>>>
> > >>>>>>> Florian
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Jul 13, 2015, at 10:56 AM, Florian Lindner wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hello,
> > >>>>>>>>>
> > >>>>>>>>> our petsc application suffers from a memory error (double free or corruption).
> > >>>>>>>>>
> > >>>>>>>>> The situation is like this:
> > >>>>>>>>>
> > >>>>>>>>> A KSP is a private member of a C++ class. In its constructor I call KSPCreate. In between it may happen that I call KSPReset. In the class's destructor I call KSPDestroy.
> > >>>>>>>>> That's where the memory error appears:
> > >>>>>>>>>
> > >>>>>>>>> gdb backtrace:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> #4 0x00007ffff490b8db in _int_free () from /usr/lib/libc.so.6
> > >>>>>>>>> #5 0x00007ffff6188c9c in PetscFreeAlign (ptr=0xfcd990, line=258, func=0x7ffff753c4c8 <__func__.20304> "KSPReset_GMRES", file=0x7ffff753b8b0 "/home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c") at /home/florian/software/petsc/src/sys/memory/mal.c:72
> > >>>>>>>>> #6 0x00007ffff6ff6cdc in KSPReset_GMRES (ksp=0xf48470) at /home/florian/software/petsc/src/ksp/ksp/impls/gmres/gmres.c:258
> > >>>>>>>>> #7 0x00007ffff70ad804 in KSPReset (ksp=0xf48470) at /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:885
> > >>>>>>>>> #8 0x00007ffff70ae2e8 in KSPDestroy (ksp=0xeb89d8) at /home/florian/software/petsc/src/ksp/ksp/interface/itfunc.c:933
> > >>>>>>>>>
> > >>>>>>>>> #9 0x0000000000599b24 in precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:148
> > >>>>>>>>> #10 0x0000000000599bc9 in precice::mapping::PetRadialBasisFctMapping<precice::mapping::Gaussian>::~PetRadialBasisFctMapping (this=0xeb8960) at src/mapping/PetRadialBasisFctMapping.hpp:146
> > >>>>>>>>>
> > >>>>>>>>> Complete backtrace at http://pastebin.com/ASjibeNF
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Could it be a problem if objects set by KSPSetOperators are destroyed afterwards? I don't think so, since KSPReset is called before.
> > >>>>>>>>>
> > >>>>>>>>> I've wrapped a class (just a bunch of helper functions, no encapsulating wrapper) around Mat and Vec objects. Nothing fancy, the ctor calls MatCreate, the dtor MatDestroy, you can have a look at https://github.com/precice/precice/blob/develop/src/mapping/petnum.cpp / .hpp.
> > >>>>>>>>>
> > >>>>>>>>> These objects are also members of the same class as the KSP, so their dtor is called after KSPDestroy.
> > >>>>>>>>>
> > >>>>>>>>> What could cause the memory corruption here?
> > >>>>>>>>>
> > >>>>>>>>> Thanks a lot,
> > >>>>>>>>> Florian

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
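
A note on the destruction order described in the original report (KSPDestroy in the destructor body, Mat and Vec wrappers as members of the same class): in C++ the destructor body runs first, and the members are then destroyed in reverse order of declaration. The following is a minimal sketch of that rule only. It uses no PETSc at all; MockMatrix, MockMapping, matrixC, matrixA and release_order are hypothetical names invented for the illustration and do not come from preCICE.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which the hypothetical resources are released.
static std::vector<std::string> g_log;

// Hypothetical stand-in for a wrapper around a Mat or Vec:
// in the real code the ctor would call MatCreate and the dtor MatDestroy.
struct MockMatrix {
  std::string name;
  explicit MockMatrix(std::string n) : name(std::move(n)) {}
  ~MockMatrix() { g_log.push_back("destroy " + name); }
};

// Hypothetical stand-in for the mapping class: the solver is released in
// the destructor body, the matrices are ordinary data members.
struct MockMapping {
  MockMatrix matrixC{"matrixC"};
  MockMatrix matrixA{"matrixA"};
  ~MockMapping() { g_log.push_back("destroy solver"); }  // where KSPDestroy would be called
};

// Creates and destroys one mapping and returns the recorded release order.
std::vector<std::string> release_order() {
  g_log.clear();
  { MockMapping mapping; }  // leaving the scope runs ~MockMapping, then the member dtors
  return g_log;
}
```

Calling release_order() yields "destroy solver" first, followed by the two matrices in reverse declaration order, which matches the statement in the report that the wrapper destructors run after KSPDestroy.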

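
A note on the PETSC_USE_MALLOC_COALESCED suggestion above: the thread does not explain what the define changes. Assuming the name refers to the general technique of coalesced allocation, the idea can be sketched as follows: several arrays are carved out of one allocation, so only the first pointer is ever handed back to the allocator. This is not PETSc's implementation; FiveArrays, calloc5_coalesced and free5_coalesced are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper type: five double arrays that share one allocation.
struct FiveArrays {
  double *a1, *a2, *a3, *a4, *a5;
};

// Allocates n1+...+n5 zero-initialised doubles as ONE block and hands out
// pointers into it. Returns false if the allocation fails.
bool calloc5_coalesced(std::size_t n1, std::size_t n2, std::size_t n3,
                       std::size_t n4, std::size_t n5, FiveArrays *out) {
  double *base = static_cast<double *>(
      std::calloc(n1 + n2 + n3 + n4 + n5, sizeof(double)));
  if (!base) return false;
  out->a1 = base;
  out->a2 = out->a1 + n1;
  out->a3 = out->a2 + n2;
  out->a4 = out->a3 + n3;
  out->a5 = out->a4 + n4;
  return true;
}

// Only a1 owns memory; the other four pointers point into the same block
// and must never be passed to free() themselves.
void free5_coalesced(FiveArrays *arrays) {
  std::free(arrays->a1);
  arrays->a1 = arrays->a2 = arrays->a3 = arrays->a4 = arrays->a5 = nullptr;
}
```

With this layout a single free releases all five arrays, whereas separately allocated arrays need five frees. When comparing pointer values in the debugger it therefore helps to know which of the two schemes the build uses.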