Hi Barry,
As I mentioned earlier, the code works fine in PETSc debug mode but
fails in non-debug mode.
I have attached my code.
Thank you
Yours sincerely,
TAY wee-beng
On 15/4/2014 2:26 AM, Barry Smith wrote:
Please send the code that creates da_w and the declarations of w_array
Barry
On Apr 14, 2014, at 9:40 AM, TAY wee-beng <[email protected]> wrote:
Hi Barry,
I'm not too sure how to do it. I'm running under MPI, so I run:
mpirun -n 4 ./a.out -start_in_debugger
I got the message below. Before the gdb windows appear (through X11), the program
aborts.
I also tried running on another cluster and it worked; it also worked on the
current cluster in debug mode.
mpirun -n 4 ./a.out -start_in_debugger
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: n12-76 (PID 20235)
MPI_COMM_WORLD rank: 2
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display
localhost:50.0 on machine n12-76
[0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display
localhost:50.0 on machine n12-76
[1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display
localhost:50.0 on machine n12-76
[3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display
localhost:50.0 on machine n12-76
[n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt /
mpi_init:warn-fork
[n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages
....
1
[1]PETSC ERROR:
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably
memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
corruption errors
[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[1]PETSC ERROR: to get more information on the crash.
[1]PETSC ERROR: User provided function() line 0 in unknown directory unknown
file (null)
[3]PETSC ERROR:
------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably
memory access out of range
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
corruption errors
[3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[3]PETSC ERROR: to get more information on the crash.
[3]PETSC ERROR: User provided function() line 0 in unknown directory unknown
file (null)
...
Thank you.
Yours sincerely,
TAY wee-beng
On 14/4/2014 9:05 PM, Barry Smith wrote:
Because IO doesn’t always get flushed immediately, it may not be hanging at
this point. It is better to use the option -start_in_debugger, then type cont
in each debugger window; when you think it is “hanging”, do a control-C in
each debugger window and type where to see where each process is. You can
also look around in the debugger at variables to see why it is “hanging” at
that point.
Barry
These routines don’t have any parallel communication in them, so they are unlikely
to hang.
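(A side note on the flushing point: if progress prints are used anyway, an
explicit flush makes the last printed marker more trustworthy. A minimal
sketch, assuming unit 6 is the stdout unit; flush is standard Fortran 2003.)
call MPI_Barrier(MPI_COMM_WORLD,ierr)
if (myid==0) then
   print *,"7"
   flush(6)   ! push the marker out before the next, possibly failing, call
end if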
On Apr 14, 2014, at 6:52 AM, TAY wee-beng
<[email protected]>
wrote:
Hi,
My code hangs, so I added MPI_Barrier calls and prints to locate where. I found
that it hangs after printing "7". Is it because I'm doing something wrong? I need to
access the u, v, w arrays, so I use DMDAVecGetArrayF90; after access, I use
DMDAVecRestoreArrayF90. The instrumented calls are below (a sketch with
explicit error checks follows after this code):
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   ! must be restored in reverse order
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
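As referenced above, a minimal sketch of checking ierr after each PETSc call
(the error-handling style here is illustrative, not the code as submitted); in
an optimized build an error return that goes unchecked can surface later as a
crash or an apparent hang:
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
if (ierr /= 0) then
   ! stop at the first failing call instead of hanging later
   print *,"DMDAVecGetArrayF90(da_u) returned ierr =",ierr
   call MPI_Abort(MPI_COMM_WORLD,1,ierr)
end if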
--
Thank you.
Yours sincerely,
TAY wee-beng
DM da_u,da_v,da_w
Vec u_local,u_global,v_local,v_global,w_local,w_global,p_local,p_global
call DMDACreate3d(MPI_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,size_x,size_y,&
     size_z,1,1,num_procs,1,stencil_width,lx,ly,lz,da_u,ierr)
call DMDACreate3d(MPI_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,size_x,size_y,&
     size_z,1,1,num_procs,1,stencil_width,lx,ly,lz,da_v,ierr)
call DMDACreate3d(MPI_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,size_x,size_y,&
     size_z,1,1,num_procs,1,stencil_width,lx,ly,lz,da_w,ierr)
call DMCreateGlobalVector(da_u,u_global,ierr)
call DMCreateLocalVector(da_u,u_local,ierr)
call DMCreateGlobalVector(da_v,v_global,ierr)
call DMCreateLocalVector(da_v,v_local,ierr)
call DMCreateGlobalVector(da_w,w_global,ierr)
call DMCreateLocalVector(da_w,w_local,ierr)
...
To access the arrays, I did the following:
call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
call I_IIB_uv_initial_1st_dm0(...,u_array,v_array,w_array)
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)   ! <- hangs at this point, or
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)   ! <- hangs at this point
subroutine I_IIB_uv_initial_1st_dm0(...,I_cell_w,u,v,w)
integer :: i,j,k,ijk
...
real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
...
u(i-1,j-1,k-1)=0.
...
v(i-1,j-1,k-1)=0.
...
w(i-1,j-1,k-1)=0.
...
end subroutine I_IIB_uv_initial_1st_dm0
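A note on the dummy array declarations above: the arrays returned by
DMDAVecGetArrayF90 are indexed by the DMDA's (ghosted) global indices, while
assumed-shape dummies declared as u(:,:,:) start at lower bound 1 inside the
subroutine, so indexing them with i-1, j-1, k-1 can run out of range if i, j, k
are global indices. A minimal sketch of one way to keep the bounds consistent;
the names gxs, gys, gzs and the sketch subroutine are illustrative, not the
code as submitted:
! Hedged sketch: pass the ghosted corner of the DMDA so the dummy arrays
! keep the same lower bounds as the arrays from DMDAVecGetArrayF90.
! In the caller (da_u, da_v, da_w are created identically above):
!   integer :: gxs,gys,gzs,gxm,gym,gzm
!   call DMDAGetGhostCorners(da_w,gxs,gys,gzs,gxm,gym,gzm,ierr)
!   call I_IIB_uv_initial_1st_dm0_sketch(gxs,gys,gzs,u_array,v_array,w_array)
subroutine I_IIB_uv_initial_1st_dm0_sketch(gxs,gys,gzs,u,v,w)
implicit none
integer, intent(in) :: gxs,gys,gzs
! explicit lower bounds: element (gxs,gys,gzs) of each dummy is element
! (gxs,gys,gzs) of the actual argument returned by DMDAVecGetArrayF90
real(8), intent(inout) :: u(gxs:,gys:,gzs:),v(gxs:,gys:,gzs:),w(gxs:,gys:,gzs:)
integer :: i,j,k
! loop only over indices that are known to be in range (the ghosted block)
do k = gzs, ubound(w,3)
   do j = gys, ubound(w,2)
      do i = gxs, ubound(w,1)
         u(i,j,k) = 0.d0
         v(i,j,k) = 0.d0
         w(i,j,k) = 0.d0
      end do
   end do
end do
end subroutine I_IIB_uv_initial_1st_dm0_sketch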