Hi Matt,

Thanks for your reply. I put my application data into PETSc Vec and IS objects so that I can take advantage of the HDF5 viewer (which you implemented). In fact, I did not add any new output or input functions.
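Roughly, the write/read pattern I am using looks like the sketch below (a simplified illustration only, not my exact code; the file name "appdata.h5", the dataset name "appdata", and the vector size are made up, and error checking with CHKERRQ is omitted for brevity):

    #include <petscvec.h>
    #include <petscviewerhdf5.h>

    int main(int argc, char **argv)
    {
      Vec         v;
      PetscViewer viewer;
      PetscInt    N = 100;   /* made-up global size */

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* Parallel vector holding the application data */
      VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &v);
      PetscObjectSetName((PetscObject)v, "appdata"); /* dataset name inside the HDF5 file */
      VecSet(v, 1.0);

      /* Write: open an HDF5 viewer and dump the Vec */
      PetscViewerHDF5Open(PETSC_COMM_WORLD, "appdata.h5", FILE_MODE_WRITE, &viewer);
      VecView(v, viewer);
      PetscViewerDestroy(&viewer);

      /* Read back: reopen the file and load into a Vec with the same object name.
         IS objects go through ISView/ISLoad with the same viewer in the same way. */
      PetscViewerHDF5Open(PETSC_COMM_WORLD, "appdata.h5", FILE_MODE_READ, &viewer);
      VecLoad(v, viewer);
      PetscViewerDestroy(&viewer);

      VecDestroy(&v);
      PetscFinalize();
      return 0;
    }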
Thanks,

Fande,

On Fri, Nov 27, 2015 at 12:08 PM, Matthew Knepley <[email protected]> wrote:

> On Fri, Nov 27, 2015 at 1:05 PM, Fande Kong <[email protected]> wrote:
>
>> Hi all,
>>
>> I implemented a parallel IO based on the Vec and IS which uses HDF5. I am
>> testing this loader on a supercomputer. I occasionally (not always)
>> encounter the following errors (using 8192 cores):
>>
>
> What is different from the current HDF5 output routines?
>
>   Thanks,
>
>      Matt
>
>
>> [7689]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [7689]PETSC ERROR: Caught signal number 5 TRAP
>> [7689]PETSC ERROR: Try option -start_in_debugger or
>> -on_error_attach_debugger
>> [7689]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [7689]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
>> OS X to find memory corruption errors
>> [7689]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
>> and run
>> [7689]PETSC ERROR: to get more information on the crash.
>> [7689]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [7689]PETSC ERROR: Signal received
>> [7689]PETSC ERROR: See
>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [7689]PETSC ERROR: Petsc Release Version 3.6.2, unknown
>> [7689]PETSC ERROR: ./fsi on a arch-linux2-cxx-opt named ys6103 by fandek
>> Fri Nov 27 11:26:30 2015
>> [7689]PETSC ERROR: Configure options --with-clanguage=cxx
>> --with-shared-libraries=1 --download-fblaslapack=1 --with-mpi=1
>> --download-parmetis=1 --download-metis=1 --with-netcdf=1
>> --download-exodusii=1
>> --with-hdf5-dir=/glade/apps/opt/hdf5-mpi/1.8.12/intel/12.1.5
>> --with-debugging=no --with-c2html=0 --with-64-bit-indices=1
>> [7689]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> Abort(59) on node 7689 (rank 7689 in comm 1140850688): application called
>> MPI_Abort(MPI_COMM_WORLD, 59) - process 7689
>> ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in
>> task 7689
>>
>> Make and configure logs are attached.
>>
>> Thanks,
>>
>> Fande Kong,
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
>    -- Norbert Wiener
>
