Please send the code that creates da_w and the declarations of w_array.

   Barry
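For reference, and only as a hedged sketch rather than the code from this thread, a 3-D DMDA with dof = 1 and the pointer declaration that matches it for DMDAVecGetArrayF90 usually look roughly like the following. The grid sizes are placeholders, and the include path and boundary-type constants (DM_BOUNDARY_NONE here, DMDA_BOUNDARY_NONE in older releases) depend on the PETSc version:

    #include <petsc/finclude/petscdmda.h>
    program da_w_sketch
      use petscdmda
      implicit none

      DM                   :: da_w
      Vec                  :: w_local
      PetscScalar, pointer :: w_array(:,:,:)   ! rank 3 because dof = 1;
                                               ! dof > 1 needs w_array(:,:,:,:)
      PetscErrorCode       :: ierr

      call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

      ! hypothetical 64^3 grid, 1 dof, stencil width 1
      call DMDACreate3d(PETSC_COMM_WORLD,                                &
           DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,         &
           DMDA_STENCIL_STAR, 64, 64, 64,                                &
           PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,               &
           PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER,   &
           da_w, ierr)
      call DMSetUp(da_w, ierr)            ! needed on newer PETSc versions
      call DMCreateLocalVector(da_w, w_local, ierr)

      call DMDAVecGetArrayF90(da_w, w_local, w_array, ierr)
      ! ... index w_array with the global (ghosted) ranges from DMDAGetGhostCorners ...
      call DMDAVecRestoreArrayF90(da_w, w_local, w_array, ierr)

      call VecDestroy(w_local, ierr)
      call DMDestroy(da_w, ierr)
      call PetscFinalize(ierr)
    end program da_w_sketch

If the pointer's rank does not match the DMDA's dof, DMDAVecGetArrayF90 can hand back something that later triggers exactly this kind of segmentation violation, so that is one of the first things worth checking once the actual declarations are available.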
On Apr 14, 2014, at 9:40 AM, TAY wee-beng <[email protected]> wrote:

> Hi Barry,
> 
> I'm not too sure how to do it. I'm running MPI, so I run:
> 
> mpirun -n 4 ./a.out -start_in_debugger
> 
> I got the message below. Before the gdb windows appear (through X11), the
> program aborts.
> 
> I also tried running on another cluster and it worked. It also worked on
> the current cluster in debug mode.
> 
> mpirun -n 4 ./a.out -start_in_debugger
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
> 
> The process that invoked fork was:
> 
>   Local host:          n12-76 (PID 20235)
>   MPI_COMM_WORLD rank: 2
> 
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> 
> ....
> 
> 1
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
> [3]PETSC ERROR: ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [3]PETSC ERROR: to get more information on the crash.
> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
> 
> ...
> Thank you.
> 
> Yours sincerely,
> 
> TAY wee-beng
> 
> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>    Because IO doesn’t always get flushed immediately, it may not be hanging
>> at this point. It is better to use the option -start_in_debugger, then type
>> cont in each debugger window; then, when you think it is “hanging”, do a
>> control C in each debugger window and type where to see where each process
>> is. You can also look around in the debugger at variables to see why it is
>> “hanging” at that point.
>> 
>>    Barry
>> 
>>    These routines don’t have any parallel communication in them, so they are
>> unlikely to hang.
>> 
>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> My code hangs, so I added mpi_barrier and print statements to catch the
>>> bug. I found that it hangs after printing "7". Is it because I'm doing
>>> something wrong? I need to access the u, v, w arrays, so I use
>>> DMDAVecGetArrayF90; after access, I use DMDAVecRestoreArrayF90.
>>> 
>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   ! must be in reverse order
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>> --
>>> Thank you.
>>> 
>>> Yours sincerely,
>>> 
>>> TAY wee-beng
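One common way this pattern goes wrong, offered only as a hedged illustration and not as a diagnosis of the quoted code: DMDAVecGetArrayF90 returns pointers whose lower bounds are the global (ghosted) corner indices, so if a routine such as I_IIB_uv_initial_1st_dm declares its dummy arrays with default lower bounds of 1 while its loops still run over the global i, j, k range, the accesses go out of range. A sketch of one safe convention, with the corner indices (the names gxs, gxe, etc. are placeholders) passed in explicitly:

    #include <petsc/finclude/petscsys.h>
    subroutine fill_w(w_array, gxs, gxe, gys, gye, gzs, gze)
      use petscsys
      implicit none
      PetscInt,    intent(in)    :: gxs, gxe, gys, gye, gzs, gze
      PetscScalar, intent(inout) :: w_array(gxs:gxe, gys:gye, gzs:gze)
      PetscInt :: i, j, k

      do k = gzs, gze
        do j = gys, gye
          do i = gxs, gxe
            w_array(i, j, k) = 0.0      ! global indices stay valid here
          end do
        end do
      end do
    end subroutine fill_w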
