On Sat, Apr 28, 2012 at 8:59 PM, Andrew Spott <andrew.spott at gmail.com> wrote:
> What makes it easier if autotools makes it hard?
>
> This is a joke, but I do think autotools makes everything hard.
>
> On Apr 28, 2012 6:43 PM, "Matthew Knepley" <knepley at gmail.com> wrote:
>
>> On Sat, Apr 28, 2012 at 8:36 PM, Andrew Spott <andrew.spott at gmail.com> wrote:
>>
>>> When I attach the debugger on error on the local machine, I get a
>>> bunch of lines like this one:
>>>
>>> warning: Could not find object file
>>> "/private/tmp/homebrew-gcc-4.6.2-HNPr/gcc-4.6.2/build/x86_64-apple-darwin11.3.0/libstdc++-v3/src/../libsupc++/.libs/libsupc++convenience.a(cp-demangle.o)"
>>> - no debug information available for "cp-demangle.c".
>>
>> It looks like you built with autotools. That just makes things hard :)
>>
>>> then ". done", and then nothing. It looks like the program exits
>>> before the debugger can attach. After a while I get this:
>>
>> You can use -debugger_pause 10 to make it wait 10s before continuing
>> after spawning the debugger. Make it long enough to attach.

Did this work?

   Matt

>>> /Users/spott/Documents/Code/EnergyBasisSchrodingerSolver/data/ebss-input/basis_rmax1.00e+02_rmin1.00e-06_dr1.00e-01/76151:
>>> No such file or directory
>>> Unable to access task for process-id 76151: (os/kern) failure.
>>>
>>> in the gdb window. In the terminal window, I get
>>>
>>> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
>>> [cli_3]: aborting job:
>>>
>>> If I just "start_in_debugger", I just don't get the "MPI_Abort" thing,
>>> but everything else is the same.
>>>
>>> Any ideas?
>>>
>>> -Andrew
>>>
>>> On Apr 28, 2012, at 6:11 PM, Matthew Knepley wrote:
>>>
>>> On Sat, Apr 28, 2012 at 8:07 PM, Andrew Spott <andrew.spott at gmail.com> wrote:
>>>
>>>> Are there any tricks to doing this across ssh?
>>>>
>>>> I've attempted it using the method given, but I can't get it to start
>>>> in the debugger or to attach the debugger; the program just exits or
>>>> hangs after telling me the error.
>>>
>>> Is there a reason you cannot run this problem on your local machine
>>> with 4 processes?
>>>
>>>    Matt
>>>
>>>> -Andrew
>>>>
>>>> On Apr 28, 2012, at 4:45 PM, Matthew Knepley wrote:
>>>>
>>>> On Sat, Apr 28, 2012 at 6:39 PM, Andrew Spott <andrew.spott at gmail.com> wrote:
>>>>
>>>>> > -start_in_debugger noxterm -debugger_nodes 14
>>>>>
>>>>> All my cores are on the same machine. Is this supposed to start a
>>>>> debugger on processor 14, or on computer 14?
>>>>
>>>> Neither. This spawns a gdb process on the same node as the process
>>>> with MPI rank 14, then attaches gdb to process 14.
>>>>
>>>>    Matt
>>>>
>>>>> I don't think I have X11 set up properly for the compute nodes, so
>>>>> X11 isn't really an option.
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> -Andrew
>>>>>
>>>>> On Apr 27, 2012, at 7:26 PM, Satish Balay wrote:
>>>>>
>>>>> > On Fri, 27 Apr 2012, Andrew Spott wrote:
>>>>> >
>>>>> >> I'm honestly stumped.
>>>>> >>
>>>>> >> I have some PETSc code that essentially just populates a matrix
>>>>> >> in parallel, then puts it in a file. All my code that uses
>>>>> >> floating point computations is checked for NaNs and infinities,
>>>>> >> and they don't seem to show up. However, when I run it on more
>>>>> >> than 4 cores, I get floating point exceptions that kill the
>>>>> >> program. I tried turning off the exceptions from PETSc, but the
>>>>> >> program still dies from them, just without the PETSc error
>>>>> >> message.
>>>>> >>
>>>>> >> I honestly don't know where to go. I suppose I should attach a
>>>>> >> debugger, but I'm not sure how to do that for multi-processor
>>>>> >> code.
>>>>> >
>>>>> > Assuming you have X11 set up properly from the compute nodes, you
>>>>> > can run with the extra option '-start_in_debugger'.
>>>>> >
>>>>> > If X11 is not properly set up - and you'd like to run gdb on one
>>>>> > of the nodes [say node 14, where you see the SEGV] - you can do:
>>>>> >
>>>>> > -start_in_debugger noxterm -debugger_nodes 14
>>>>> >
>>>>> > Or try valgrind:
>>>>> >
>>>>> > mpiexec -n 16 valgrind --tool=memcheck -q ./executable
>>>>> >
>>>>> > For debugging - it's best to install with --download-mpich [so
>>>>> > that it's valgrind clean] - and run all the MPI stuff on a single
>>>>> > machine [usually X11 works well from a single machine].
>>>>> >
>>>>> > Satish
>>>>> >
>>>>> >> Any ideas? (long error message below):
>>>>> >>
>>>>> >> -Andrew
>>>>> >>
>>>>> >> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [14]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
>>>>> >> [14]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> >> [14]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind
>>>>> >> [14]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> >> [14]PETSC ERROR: likely location of problem given in stack below
>>>>> >> [14]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>>> >> [14]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>>> >> [14]PETSC ERROR: INSTEAD the line number of the start of the function
>>>>> >> [14]PETSC ERROR: is given.
>>>>> >> [14]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>>> >> [14]PETSC ERROR: Signal received!
>>>>> >> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [14]PE[15]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [15]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
>>>>> >> [15]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> >> [15]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind
>>>>> >> [15]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> >> [15]PETSC ERROR: likely location of problem given in stack below
>>>>> >> [15]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>>> >> [15]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>>> >> [15]PETSC ERROR: INSTEAD the line number of the start of the function
>>>>> >> [15]PETSC ERROR: is given.
>>>>> >> [15]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>>> >> [15]PETSC ERROR: Signal received!
>>>>> >> [15]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [15]PETSC ERROR: Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012
>>>>> >> [14]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>> >> [14]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>>>> >> [14]PETSC ERROR: See docs/index.html for manual pages.
>>>>> >> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [14]PETSC ERROR: /home/becker/ansp6066/local/bin/finddme on a linux-gnu named photon9.colorado.edu by ansp6066 Fri Apr 27 18:01:55 2012
>>>>> >> [14]PETSC ERROR: Libraries linked from /home/becker/ansp6066/local/petsc-3.2-p6/lib
>>>>> >> [14]PETSC ERROR: Configure run at Mon Feb 27 11:17:14 2012
>>>>> >> [14]PETSC ERROR: Configure options --prefix=/home/becker/ansp6066/local/petsc-3.2-p6 --with-c++-support --with-fortran --with-mpi-dir=/usr/local/mpich2 --with-shared-libraries=0 --with-scalar-type=complex --with-blas-lapack-libs=/central/intel/mkl/lib/em64t/libmkl_core.a --with-clanguage=cxx
>>>>> >> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [14]TSC ERROR: Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012
>>>>> >> [15]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>> >> [15]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>>>> >> [15]PETSC ERROR: See docs/index.html for manual pages.
>>>>> >> [15]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [15]PETSC ERROR: /home/becker/ansp6066/local/bin/finddme on a linux-gnu named photon9.colorado.edu by ansp6066 Fri Apr 27 18:01:55 2012
>>>>> >> [15]PETSC ERROR: Libraries linked from /home/becker/ansp6066/local/petsc-3.2-p6/lib
>>>>> >> [15]PETSC ERROR: Configure run at Mon Feb 27 11:17:14 2012
>>>>> >> [15]PETSC ERROR: Configure options --prefix=/home/becker/ansp6066/local/petsc-3.2-p6 --with-c++-support --with-fortran --with-mpi-dir=/usr/local/mpich2 --with-shared-libraries=0 --with-scalar-type=complex --with-blas-lapack-libs=/central/intel/mkl/lib/em64t/libmkl_core.a --with-clanguage=cxx
>>>>> >> [15]PETSC ERROR: ------------------------------------------------------------------------
>>>>> >> [15]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>>>>> >> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 14PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>>>>> >> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 15[0]0:Return code = 0, signaled with Interrupt
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.

-- Norbert Wiener
