Try running under valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
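For example, something along the lines of what the linked FAQ suggests (the launcher depends on your MPI; --tool, -q, --num-callers and --log-file with %p are standard valgrind options, and -malloc off keeps PETSc's own allocator out of valgrind's way):

    mpirun -n 4 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./a.out -malloc off

Each rank then writes its own valgrind.log.<pid>, and the first invalid read/write reported there is usually far more useful than the SEGV backtrace from the optimized build.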
On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:

> Hi Barry,
>
> As I mentioned earlier, the code works fine in PETSc debug mode but fails in non-debug mode.
>
> I have attached my code.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 15/4/2014 2:26 AM, Barry Smith wrote:
>> Please send the code that creates da_w and the declarations of w_array.
>>
>> Barry
>>
>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng <[email protected]> wrote:
>>
>>> Hi Barry,
>>>
>>> I'm not too sure how to do it. I'm running MPI, so I run:
>>>
>>> mpirun -n 4 ./a.out -start_in_debugger
>>>
>>> I got the message below. Before the gdb windows appear (through X11), the program aborts.
>>>
>>> I also tried running on another cluster and it worked, and it also worked on the current cluster in debug mode.
>>>
>>> mpirun -n 4 ./a.out -start_in_debugger
>>> --------------------------------------------------------------------------
>>> An MPI process has executed an operation involving a call to the
>>> "fork()" system call to create a child process. Open MPI is currently
>>> operating in a condition that could result in memory corruption or
>>> other system errors; your MPI job may hang, crash, or produce silent
>>> data corruption. The use of fork() (or system() or other calls that
>>> create child processes) is strongly discouraged.
>>>
>>> The process that invoked fork was:
>>>
>>>   Local host:          n12-76 (PID 20235)
>>>   MPI_COMM_WORLD rank: 2
>>>
>>> If you are *absolutely sure* that your application will successfully
>>> and correctly survive a call to fork(), you may disable this warning
>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>> --------------------------------------------------------------------------
>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>
>>> ....
>>>
>>> 1
>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> [1]PETSC ERROR: to get more information on the crash.
>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>> [3]PETSC ERROR: to get more information on the crash.
>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>
>>> ...
>>>
>>> Thank you.
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>
>>>> Because IO doesn't always get flushed immediately, it may not actually be hanging at that point. It is better to use the option -start_in_debugger, then type cont in each debugger window. When you think it is "hanging", do a control-C in each debugger window and type "where" to see where each process is; you can also look around in the debugger at variables to see why it is "hanging" at that point.
>>>>
>>>> Barry
>>>>
>>>> These routines don't have any parallel communication in them, so they are unlikely to hang.
>>>>
>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My code hangs, so I added MPI_Barrier and print calls to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u, v, w arrays, so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>>
>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   ! must be in reverse order
>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>> --
>>>>> Thank you.
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> TAY wee-beng
>
> <code.txt>
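For reference, a minimal sketch of the get/restore pattern for one of the fields, with illustrative declarations only (da_w, w_local and w_array are the names from the thread; everything else is assumed, not taken from the attached code). The reason for asking about the declaration of w_array is that the pointer rank has to match the DMDA: a 3-D DMDA with dof=1 needs a rank-3 PetscScalar pointer, dof>1 needs rank 4, and a mismatch can appear to work in a debug build yet corrupt memory in an optimized one.

      ! Sketch only -- assumed declarations, not the attached code.
      ! The F90 interfaces come from the PETSc Fortran includes/modules
      ! for your version (e.g. finclude/petscdmda.h90 in this era).
      DM                      da_w
      Vec                     w_local
      PetscScalar, pointer :: w_array(:,:,:)   ! rank 3 for a 3-D DMDA with dof=1
      PetscInt                gxs,gys,gzs,gxm,gym,gzm
      PetscErrorCode          ierr

      call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
      ! For a local (ghosted) vector the valid indices are the ghost corners:
      call DMDAGetGhostCorners(da_w,gxs,gys,gzs,gxm,gym,gzm,ierr)
      ! ... use w_array(i,j,k) only for gxs <= i <= gxs+gxm-1, etc. ...
      call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)

If the crash is an out-of-range index rather than a declaration mismatch, a valgrind run as suggested above should point at the exact line.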
