Thanks, Barry. I was also wondering why this happens only randomly. Is there any explanation? If this were something in PETSc, shouldn't it happen every time?

Thanks,

Fande Kong,

On Fri, Nov 27, 2015 at 1:20 PM, Barry Smith <[email protected]> wrote:

>
>   Edit PETSC_ARCH/include/petscconf.h and add
>
> #if !defined(PETSC_MISSING_SIGTRAP)
> #define PETSC_MISSING_SIGTRAP
> #endif
>
> then do
>
> make gnumake
>
> It is possible that the system you are using uses SIGTRAP in managing the
> IO; by making the change above you are telling PETSc to ignore SIGTRAPS.
> Let us know how this works out.
>
>    Barry
>
>
> > On Nov 27, 2015, at 1:05 PM, Fande Kong <[email protected]> wrote:
> >
> > Hi all,
> >
> > I implemented a parallel IO based on the Vec and IS which uses HDF5. I
> > am testing this loader on a supercomputer. I occasionally (not always)
> > encounter the following errors (using 8192 cores):
> >
> > [7689]PETSC ERROR: ------------------------------------------------------------------------
> > [7689]PETSC ERROR: Caught signal number 5 TRAP
> > [7689]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > [7689]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> > [7689]PETSC ERROR: to get more information on the crash.
> > [7689]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [7689]PETSC ERROR: Signal received
> > [7689]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown
> > [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek Fri Nov 27 11:26:30 2015
> > [7689]PETSC ERROR: Configure options --with-clanguage=cxx --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-netcdf=1 --download-exodusii=1 --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5 --with-debugging=no --with-c2html=0 --with-64-bit-indices=1
> > [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file
> > Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 7689
> > ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in task 7689
> >
> > Make and configure logs are attached.
> >
> > Thanks,
> >
> > Fande Kong,
> >
> > <configure_log><make_log>
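
For reference, a minimal sketch of the mechanism the suggested petscconf.h change appears to rely on. It assumes the library only registers its own SIGTRAP handler when a "missing" macro is not defined. That is an assumption, not a quote of PETSc source, and every name below (DEMO_MISSING_SIGTRAP, demo_handler, demo_install_handlers, demo_sigtrap_is_ours) is hypothetical.

```c
#include <signal.h>

/* Hypothetical stand-in for PETSC_MISSING_SIGTRAP. Defining it tells the
   demo "library" to leave SIGTRAP alone. This is not PETSc source. */
#define DEMO_MISSING_SIGTRAP

/* Last signal number seen by the demo handler (0 = none yet). */
static volatile sig_atomic_t demo_caught = 0;

/* Hypothetical handler: only records which signal arrived. */
static void demo_handler(int sig) { demo_caught = sig; }

/* Register the demo handler for the signals the "library" traps.
   SIGTRAP is skipped when DEMO_MISSING_SIGTRAP is defined, so whatever
   disposition the system or runtime set up for it stays in place.
   Returns 1 if a SIGTRAP handler was installed, 0 if it was skipped. */
int demo_install_handlers(void)
{
  signal(SIGTERM, demo_handler);   /* a signal the demo always traps */
#if !defined(DEMO_MISSING_SIGTRAP)
  signal(SIGTRAP, demo_handler);
  return 1;
#else
  return 0;
#endif
}

/* Report whether SIGTRAP currently routes to demo_handler:
   1 = yes, 0 = no, -1 = the query failed. */
int demo_sigtrap_is_ours(void)
{
  void (*prev)(int) = signal(SIGTRAP, SIG_DFL); /* read current disposition */
  if (prev == SIG_ERR) return -1;
  signal(SIGTRAP, prev);                        /* put it back unchanged */
  return prev == demo_handler;
}
```

With the macro defined, demo_install_handlers() returns 0 and demo_sigtrap_is_ours() reports 0, so a SIGTRAP raised by the platform never reaches the library's error path. If the system only occasionally uses SIGTRAP while managing the IO, a library-installed handler would abort only on those runs, which would be consistent with the failures being intermittent rather than constant.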
