Many thanks for the hint. Hitting "c" never returns the prompt; hitting "ctrl+c" and then "where" reveals a deadlock. Thanks again!
Dominik

On Fri, Aug 19, 2011 at 7:28 PM, Aron Ahmadia <aron.ahmadia at kaust.edu.sa> wrote:
> The debugger stops when you start up, that's this code [1]. Then you want
> to hit 'continue' so your job runs normally to where it fails. You can also
> set a break point on PetscError since PETSc is catching the error from MPI.
> When you stop at the 'second breakpoint', you'll be at the part where your
> code has detected an error condition in MPI. Type a 'where' there to get
> the stack when the error was detected.
> [1]
> (gdb) where
> #0  0x00007fae5b941590 in __nanosleep_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> #1  0x00007fae5b94143c in __sleep (seconds=0) at
> ../sysdeps/unix/sysv/linux/sleep.c:138
> #2  0x000000000056cc48 in PetscSleep (s=10) at psleep.c:56
> #3  0x0000000000838887 in PetscAttachDebugger () at adebug.c:410
> #4  0x00000000005590a7 in PetscOptionsCheckInitial_Private () at init.c:392
> #5  0x000000000055e40e in PetscInitialize (argc=0x7ffff403debc,
> args=0x7ffff403deb0, file=0x0,
>     help=0x0) at pinit.c:639
> #6  0x0000000000524a16 in PetscSolver::InitializePetsc
> (argc=0x7ffff403debc, argv=0x7ffff403deb0)
>     at /home/dsz/src/framework/trunk/solve/PetscSolver.cxx:124
> #7  0x00000000004c404f in main (argc=4, argv=0x7ffff403e4c8)
>     at /home/dsz/src/framework/trunk/solve/cd3t10mpi_main.cxx:526
> (gdb)
>
> On Fri, Aug 19, 2011 at 8:22 PM, Dominik Szczerba <dominik at itis.ethz.ch>
> wrote:
>>
>> What do you mean by "the second break"?
>>
>> Dominik
>>
>> On Fri, Aug 19, 2011 at 6:47 PM, Aron Ahmadia <aron.ahmadia at kaust.edu.sa>
>> wrote:
>> > You want to do a 'where' on the second break, when your program is
>> > raising
>> > an abort signal...
>> > A
>> >
>> > On Fri, Aug 19, 2011 at 6:57 PM, Dominik Szczerba <dominik at itis.ethz.ch>
>> > wrote:
>> >>
>> >> (gdb) where
>> >> #0  0x00007fae5b941590 in __nanosleep_nocancel () at
>> >> ../sysdeps/unix/syscall-template.S:82
>> >> #1  0x00007fae5b94143c in __sleep (seconds=0) at
>> >> ../sysdeps/unix/sysv/linux/sleep.c:138
>> >> #2  0x000000000056cc48 in PetscSleep (s=10) at psleep.c:56
>> >> #3  0x0000000000838887 in PetscAttachDebugger () at adebug.c:410
>> >> #4  0x00000000005590a7 in PetscOptionsCheckInitial_Private () at
>> >> init.c:392
>> >> #5  0x000000000055e40e in PetscInitialize (argc=0x7ffff403debc,
>> >> args=0x7ffff403deb0, file=0x0,
>> >>     help=0x0) at pinit.c:639
>> >> #6  0x0000000000524a16 in PetscSolver::InitializePetsc
>> >> (argc=0x7ffff403debc, argv=0x7ffff403deb0)
>> >>     at /home/dsz/src/framework/trunk/solve/PetscSolver.cxx:124
>> >> #7  0x00000000004c404f in main (argc=4, argv=0x7ffff403e4c8)
>> >>     at /home/dsz/src/framework/trunk/solve/cd3t10mpi_main.cxx:526
>> >> (gdb)
>> >>
>> >> PetscSolver.cxx:124:
>> >>
>> >>         ierr = PetscInitialize(argc, argv, (char *)0, (char *)0);
>> >> CHKERRQ(ierr);
>> >>
>> >> Hmmm, not very helpful.....
>> >>
>> >> The app runs on one cpu, but silently crashes on two.
>> >>
>> >> Any hints are very appreciated.
>> >>
>> >> Dominik
>> >>
>> >> On Fri, Aug 19, 2011 at 5:49 PM, Satish Balay <balay at mcs.anl.gov>
>> >> wrote:
>> >> > On Fri, 19 Aug 2011, Dominik Szczerba wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> I am starting my app in the debugger as:
>> >> >>
>> >> >> mpiexec -np 2 sm3t4mpi run.xml -start_in_debugger -display :0.0
>> >> >>
>> >> >> In the console I get:
>> >> >>
>> >> >> [1]PETSC ERROR: MPI error 14
>> >> >>
>> >> >> in the two open terminals with gdb I get:
>> >> >>
>> >> >> 0x00007f2ecdd15590 in __nanosleep_nocancel () at
>> >> >> ../sysdeps/unix/syscall-template.S:82
>> >> >> 82      ../sysdeps/unix/syscall-template.S: No such file or
>> >> >> directory.
>> >> >>         in ../sysdeps/unix/syscall-template.S
>> >> >> (gdb)
>> >> >>
>> >> >> I type 'c' nonetheless and see:
>> >> >>
>> >> >> (gdb) c
>> >> >> Continuing.
>> >> >> [New Thread 0x7f268e975700 (LWP 22388)]
>> >> >>
>> >> >> Program received signal SIGABRT, Aborted.
>> >> >> 0x00007f268f421d05 in raise (sig=6) at
>> >> >> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> >> >> 64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or
>> >> >> directory.
>> >> >>         in ../nptl/sysdeps/unix/sysv/linux/raise.c
>> >> >>
>> >> >> How do I go on debugging?
>> >> >
>> >> > what do you get for:
>> >> >
>> >> > (gdb) where
>> >> >
>> >> > Satish
>> >> >
>> >> >>
>> >> >> Many thanks for any hints,
>> >> >>
>> >> >> Dominik
