Satish,

Things work fine on my Linux machine (and other Linux clusters) and valgrind shows no errors. Unfortunately TotalView (the GUI starts fine on the node) gives me a licensing error on the Cray.
I will continue to explore.

Thanks

Tabrez

On 04/02/2012 09:05 PM, Satish Balay wrote:
> Sounds like a Cray machine.
>
> start_in_debugger is useful for debugging on workstations [or
> clusters] etc where there is some control on X11 tunnels. Also
> 'xterm','gdb' or similar debugger should be available on the compute
> nodes [along with a x/ssh tunnel].
>
> On a cray - you are better off looking for a parallel debugger. Don't
> know if cray has one available.
>
> Wrt debugging - you might want to run your code with valgrind on a
> linux box..
>
> Satish
>
> On Mon, 2 Apr 2012, Tabrez Ali wrote:
>
>> Hello
>>
>> I am trying to debug a program using the switch '-on_error_attach_debugger'
>> but the vendor/sysadmin built PETSc 3.2.00 is unable to start the debugger in
>> xterm (see text below). But xterm is installed. What am I doing wrong?
>>
>> Btw the segfault happens during a call to MatMult but only with
>> vendor/sysadmin supplied PETSc 3.2 with PGI and Intel compilers only and _not_
>> with CRAY or GNU compilers.
>>
>> I also dont get the segfault if I build PETSc 3.2-p7 myself with PGI/Intel
>> compilers.
>>
>> Any ideas on how to diagnose the problem? Unfortunately I cannot seem to run
>> valgrind on this particular machine.
>>
>> Thanks in advance.
>>
>> Tabrez
>>
>> ---
>>
>> stali@krakenpf1:~/meshes> which xterm
>> /usr/bin/xterm
>> stali@krakenpf1:~/meshes> aprun -n 1 ./defmod -f 2d_point_load_dyn_abc.inp -on_error_attach_debugger
>> ...
>> ...
>> ...
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[0]PETSC
>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find
>> memory corruption errors
>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> [0]PETSC ERROR: to get more information on the crash.
>> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>> [0]PETSC ERROR: PETSC: Attaching gdb to ./defmod of pid 32384 on display localhost:20.0 on machine nid10649
>> Unable to start debugger in xterm: No such file or directory
>> aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
>> _pmii_daemon(SIGCHLD): [NID 10649] [c23-3c0s6n1] [Mon Apr 2 13:06:48 2012] PE 0 exit signal Aborted
>> Application 133198 exit codes: 134
>> Application 133198 resources: utime ~1s, stime ~0s
>>
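The `which xterm` in the quoted session ran on the login node, while the reply points out that 'xterm' and 'gdb' need to be present on the compute nodes. A minimal sketch of one way to check that is given here. The function name `check_tools` is hypothetical and not part of PETSc or the Cray tools, and whether a shell script can be launched on the compute nodes of this particular machine is an assumption.

```shell
#!/bin/sh
# check_tools: hypothetical helper, not part of PETSc or the Cray environment.
# For each tool name given, report whether it is on PATH of the node that
# actually runs this code. Prints "found <tool> <path>" or "missing <tool>".
# Returns non-zero if at least one tool is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if tool_path=$(command -v "$tool" 2>/dev/null); then
            printf 'found %s %s\n' "$tool" "$tool_path"
        else
            printf 'missing %s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

To use it, one could append a line such as `check_tools xterm gdb` to the script and launch it the same way as the application (for example through `aprun -n 1`), so that the lookup happens on a compute node and not on the login node. A "missing" line for either tool would be consistent with the "Unable to start debugger in xterm: No such file or directory" message in the log.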
