On Mon, Apr 14, 2014 at 9:40 AM, TAY wee-beng <[email protected]> wrote:
> Hi Barry,
>
> I'm not too sure how to do it. I'm running mpi. So I run:
>
> mpirun -n 4 ./a.out -start_in_debugger
>

   add -debugger_pause 10

      Matt

> I got the msg below. Before the gdb windows appear (thru x11), the
> program aborts.
>
> Also I tried running in another cluster and it worked. Also tried in the
> current cluster in debug mode and it worked too.
>
> mpirun -n 4 ./a.out -start_in_debugger
>
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:          n12-76 (PID 20235)
>   MPI_COMM_WORLD rank: 2
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display
> localhost:50.0 on machine n12-76
> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display
> localhost:50.0 on machine n12-76
> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display
> localhost:50.0 on machine n12-76
> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display
> localhost:50.0 on machine n12-76
> [n12-76:20232] 3 more processes have sent help message
> help-mpi-runtime.txt / mpi_init:warn-fork
> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
> ....
>
>   1
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
> to find memory corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file (null)
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
> to find memory corruption errors
> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [3]PETSC ERROR: to get more information on the crash.
> [3]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file (null)
>
> ...
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 14/4/2014 9:05 PM, Barry Smith wrote:
>
>   Because IO doesn’t always get flushed immediately, it may not be hanging
> at this point. It is better to use the option -start_in_debugger, then type
> cont in each debugger window, and then, when you think it is “hanging”, do
> a control C in each debugger window and type where to see where each
> process is; you can also look around in the debugger at variables to see
> why it is “hanging” at that point.
>
>   Barry
>
>   These routines don’t have any parallel communication in them, so they
> are unlikely to hang.
>
> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>
> Hi,
>
> My code hangs, and I added in mpi_barrier and print to catch the bug. I
> found that it hangs after printing "7". Is it because I'm doing something
> wrong? I need to access the u, v, w arrays, so I use DMDAVecGetArrayF90.
> After access, I use DMDAVecRestoreArrayF90.
>
>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>   call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>   call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>   call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   ! must be in reverse order
>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> --
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
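
For reference, putting the two suggestions in the thread together gives a
command along the lines of

    mpirun -n 4 ./a.out -start_in_debugger -debugger_pause 10

where -debugger_pause <seconds> makes each process sleep for the given time
after launching its debugger, so the gdb/X11 windows have a chance to attach
before execution continues. Following Barry's recipe, one then types cont in
each gdb window, interrupts with Ctrl-C when the run appears to hang, and
uses where (or bt) to see where each process is stuck.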

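On the DMDAVecGetArrayF90 / DMDAVecRestoreArrayF90 pattern itself, below is a
minimal sketch of the usage being discussed. It is not the poster's code: the
subroutine name access_uvw is made up, the DMs are assumed to be 3-D DMDAs
with one degree of freedom each, and the include/module lines follow recent
PETSc conventions (older releases use the finclude/*.h90 headers, and the
newest ones prefer PetscCall(...) over CHKERRQ(ierr)). The points it
illustrates are checking ierr after every call and indexing the arrays only
inside the range reported by DMDAGetCorners; writing outside that range in a
routine such as I_IIB_uv_initial_1st_dm is one typical source of the kind of
SEGV shown above.

      subroutine access_uvw(da_u, da_v, da_w, u_local, v_local, w_local)
        ! Assumes a .F90 file so the preprocessor runs; include path and
        ! error macro follow recent PETSc conventions (see note above).
#include <petsc/finclude/petscdmda.h>
        use petscdmda
        implicit none
        DM  da_u, da_v, da_w                ! three 3-D DMDAs, dof = 1 each
        Vec u_local, v_local, w_local       ! ghosted local vectors
        PetscErrorCode ierr
        PetscInt xs, ys, zs, xm, ym, zm
        PetscScalar, pointer :: u_array(:,:,:), v_array(:,:,:), w_array(:,:,:)

        ! Get all three arrays, checking the error code after every call.
        call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr); CHKERRQ(ierr)
        call DMDAVecGetArrayF90(da_v, v_local, v_array, ierr); CHKERRQ(ierr)
        call DMDAVecGetArrayF90(da_w, w_local, w_array, ierr); CHKERRQ(ierr)

        ! The locally owned grid points of da_u are xs..xs+xm-1, ys..ys+ym-1,
        ! zs..zs+zm-1; staying inside this range is always safe (the ghost
        ! points of a local vector are also addressable, but nothing beyond).
        call DMDAGetCorners(da_u, xs, ys, zs, xm, ym, zm, ierr); CHKERRQ(ierr)
        u_array(xs:xs+xm-1, ys:ys+ym-1, zs:zs+zm-1) = 0.0

        ! Restore in the reverse order of the gets, as in the code above.
        call DMDAVecRestoreArrayF90(da_w, w_local, w_array, ierr); CHKERRQ(ierr)
        call DMDAVecRestoreArrayF90(da_v, v_local, v_array, ierr); CHKERRQ(ierr)
        call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr); CHKERRQ(ierr)
      end subroutine access_uvw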