On 14/4/2014 10:44 PM, Matthew Knepley wrote:
On Mon, Apr 14, 2014 at 9:40 AM, TAY wee-beng <[email protected]> wrote:

    Hi Barry,

    I'm not too sure how to do it. I'm running mpi. So I run:

     mpirun -n 4 ./a.out -start_in_debugger


  add -debugger_pause 10

It seems that I need to use a value of 60. It gives a segmentation fault after the location I debugged earlier:

#0  0x00002b672cdee78a in f90array3daccessscalar_ ()
   from /home/wtay/Lib/petsc-3.4.4_shared_rel/lib/libpetsc.so
#1  0x00002b672cdedcae in F90Array3dAccess ()
   from /home/wtay/Lib/petsc-3.4.4_shared_rel/lib/libpetsc.so
#2  0x00002b672d2ad044 in dmdavecrestorearrayf903_ ()
   from /home/wtay/Lib/petsc-3.4.4_shared_rel/lib/libpetsc.so
#3  0x00000000008a1d8d in fractional_initial_mp_initial_ ()
#4  0x0000000000539289 in MAIN__ ()
#5  0x000000000043c04c in main ()

What's that supposed to mean?
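
(For reference, the combined invocation, with the pause value taken from the discussion above, would be along the lines of:

    mpirun -n 4 ./a.out -start_in_debugger -debugger_pause 60

so that the gdb windows have time to attach before the run continues.)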

   Matt

    I got the msg below. Before the gdb windows appear (thru x11), the
    program aborts.

    Also I tried running in another cluster and it worked. Also tried
    in the current cluster in debug mode and it worked too.

    mpirun -n 4 ./a.out -start_in_debugger
    --------------------------------------------------------------------------
    An MPI process has executed an operation involving a call to the
    "fork()" system call to create a child process.  Open MPI is currently
    operating in a condition that could result in memory corruption or
    other system errors; your MPI job may hang, crash, or produce silent
    data corruption.  The use of fork() (or system() or other calls that
    create child processes) is strongly discouraged.

    The process that invoked fork was:

      Local host:          n12-76 (PID 20235)
      MPI_COMM_WORLD rank: 2

    If you are *absolutely sure* that your application will successfully
    and correctly survive a call to fork(), you may disable this warning
    by setting the mpi_warn_on_fork MCA parameter to 0.
    --------------------------------------------------------------------------
    [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
    [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
    [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
    [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
    [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
    [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

    ....

     1
    [1]PETSC ERROR: ------------------------------------------------------------------------
    [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
    [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
    [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
    [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
    [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
    [1]PETSC ERROR: to get more information on the crash.
    [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
    [3]PETSC ERROR: ------------------------------------------------------------------------
    [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
    [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
    [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
    [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
    [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
    [3]PETSC ERROR: to get more information on the crash.
    [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)

    ...

    Thank you.

    Yours sincerely,

    TAY wee-beng

    On 14/4/2014 9:05 PM, Barry Smith wrote:
       Because IO doesn't always get flushed immediately, it may not be hanging at this point. It is better to use the option -start_in_debugger, then type cont in each debugger window; when you think it is "hanging", do a control-C in each debugger window and type where to see where each process is. You can also look around in the debugger at variables to see why it is "hanging" at that point.

        Barry

       These routines don't have any parallel communication in them, so they are unlikely to hang.
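
       (To make that sequence concrete, in each debugger window that -start_in_debugger opens, it would look roughly like:

            (gdb) cont        <-- let the process run
            ^C                <-- interrupt once the run appears to hang
            (gdb) where       <-- or bt, to print this process's stack

        the exact prompt and commands depend on the debugger being attached.)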

    On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:

    Hi,

    My code hangs, and I added mpi_barrier and print statements to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u, v, w arrays, so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.

             call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
             call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
             call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
             call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
             call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
             call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
             call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
             call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  ! must be in reverse order
             call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
             call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
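
    (For context, DMDAVecGetArrayF90 / DMDAVecRestoreArrayF90 expect the array arguments to be F90 pointers. A minimal sketch of the declarations the snippet above relies on, reusing the same names but otherwise assumed rather than copied from the original code, would be:

             ! assumed declarations (illustrative only, not from the original code)
             DM             da_u, da_v, da_w              ! 3-D DMDAs for u, v, w
             Vec            u_local, v_local, w_local     ! local (ghosted) vectors
             PetscScalar, pointer :: u_array(:,:,:), v_array(:,:,:), w_array(:,:,:)
             PetscErrorCode ierr

     with the pointer rank matching the DMDA dimension and dof, and the usual PETSc Fortran include files in scope.)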
    --
    Thank you.

    Yours sincerely,

    TAY wee-beng





--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
