Hi Jed,

I tried the MPICH version of PETSc on Janus (configured with the option --download-mpich) and my code stopped at another place. The error message is below. Do you have any suggestions?
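The stack below points at PetscBinaryRead() called from GetPieceData() in my readbinary3d.c. The read there is essentially of the following form (a simplified sketch with approximate variable names, not the actual code; PETSc 3.4 calling sequence assumed):

#include <petscsys.h>

/* Simplified sketch of the failing read. PETSc 3.4 calling sequence:
   PetscBinaryRead(int fd, void *p, PetscInt n, PetscDataType type). */
PetscErrorCode GetPieceDataSketch(int fd, PetscInt nvert, PetscScalar *coords)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* read 3*nvert coordinates for one piece of the partitioned mesh
     from the already-open binary file descriptor fd */
  ierr = PetscBinaryRead(fd, coords, 3*nvert, PETSC_SCALAR);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}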

For the core dump, I emailed the Janus administrators for help about a week ago but have not received any reply yet.

Best,
Rongliang

----------------------------
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: [0] PetscBinaryRead line 234 /projects/ronglian/soft/petsc-3.4.3/src/sys/fileio/sysio.c
[0]PETSC ERROR: [0] GetPieceData line 1096 readbinary3d.c
[0]PETSC ERROR: [0] DataReadAndSplitGeneric line 962 readbinary3d.c
[0]PETSC ERROR: [0] DataRead line 621 readbinary3d.c
[0]PETSC ERROR: [0] ReadBinary line 184 readbinary3d.c
[0]PETSC ERROR: [0] LoadGrid line 720 loadgrid3d.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./fsi3d on a Janus-debug-64bit-mpich named node0718 by ronglian Mon Nov 11 20:54:09 2013
[0]PETSC ERROR: Libraries linked from /projects/ronglian/soft/petsc-3.4.3/Janus-debug-64bit-mpich/lib
[0]PETSC ERROR: Configure run at Mon Nov 11 20:49:25 2013
[0]PETSC ERROR: Configure options --known-level1-dcache-size=32768 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-c-double-complex=1 --download-mpich=1 --download-blacs=1 --download-f-blas-lapack=1 --download-metis=1 --download-parmetis=1 --download-scalapack=1 --download-superlu_dist=1 --known-mpi-shared-libraries=0 --with-64-bit-indices --with-batch=1 --download-exodusii=1 --download-hdf5=1 --download-netcdf=1 --known-64-bit-blas-indices --with-debugging=1
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below


On 11/07/2013 10:38 AM, Jed Brown wrote:
Rongliang Chen <[email protected]> writes:

Hi Jed,

I have not found a way to "dump core on selected ranks" yet, but I will
keep trying.
Ask the administrators at your facility.  There are a few common ways,
but I'm not going to play a guessing game on the mailing list.

I run my code with the option "-on_error_attach_debugger" and got the
following message:

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

    Local host:          node1529 (PID 3701)
    MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[node1529:03700] 13 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[node1529:03700] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
--------------------------------------------------------------------------

Is this message useful for debugging?
This is just a possible technical problem with attaching a debugger in your
environment, but you still have to actually attach the debugger and poke
around (stack trace, etc.).

Can you create an interactive session and run your job from there?
