On Mon, Apr 14, 2014 at 9:40 AM, TAY wee-beng <[email protected]> wrote:

>  Hi Barry,
>
> I'm not too sure how to do it. I'm running under MPI, so I run:
>
>  mpirun -n 4 ./a.out -start_in_debugger
>

  add -debugger_pause 10
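
  That option makes each process sleep for the given number of seconds
  (here 10) before continuing, which gives the gdb windows time to attach
  before the code reaches the crash. For example, with the same run as
  above:

    mpirun -n 4 ./a.out -start_in_debugger -debugger_pause 10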

   Matt


> I got the message below. Before the gdb windows appear (through X11), the
> program aborts.
>
> I also tried running on another cluster, and it worked. I also tried on
> the current cluster in debug mode, and it worked too.
>
> mpirun -n 4 ./a.out -start_in_debugger
>
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.  Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption.  The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:          n12-76 (PID 20235)
>   MPI_COMM_WORLD rank: 2
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
>
> --------------------------------------------------------------------------
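>
> For reference, with Open MPI that MCA parameter can also be set on the
> mpirun command line, e.g. (an illustration only; the fork here most
> likely just comes from -start_in_debugger launching the debugger):
>
>   mpirun --mca mpi_warn_on_fork 0 -n 4 ./a.out -start_in_debugger
>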
> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> ....
>
>  1
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
> [3]PETSC ERROR: ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [3]PETSC ERROR: to get more information on the crash.
> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>
> ...
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 14/4/2014 9:05 PM, Barry Smith wrote:
>
>   Because I/O doesn’t always get flushed immediately, it may not actually
> be hanging at this point. It is better to use the option
> -start_in_debugger, then type cont in each debugger window; when you think
> it is “hanging”, do a control-C in each debugger window and type where to
> see where each process is. You can also look around in the debugger at
> variables to see why it is “hanging” at that point.
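>
> In each gdb window that looks roughly like the following sketch (myid is
> just an example variable from the code below):
>
>   (gdb) cont
>   ... when the run seems to hang, hit Control-C in each window ...
>   (gdb) where
>   (gdb) print myid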
>
>    Barry
>
>   These routines don’t have any parallel communication in them, so they
> are unlikely to hang.
>
> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>
>
>  Hi,
>
> My code hangs, and I added MPI_Barrier and print statements to catch the
> bug. I found that it hangs after printing "7". Is it because I'm doing
> something wrong? I need to access the u, v, w arrays, so I use
> DMDAVecGetArrayF90. After the access, I use DMDAVecRestoreArrayF90.
>
>         call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
>         call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
>         call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
>         call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
>         call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  ! must be in reverse order
>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
>         call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
>         call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> --
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
