Hmm,
  Interface DMDAVecGetArrayF90
    Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
      USE_DM_HIDE
      DM_HIDE da1
      VEC_HIDE v
      PetscScalar,pointer :: d1(:,:,:)
      PetscErrorCode ierr
    End Subroutine
So d1 is an F90 POINTER, but your subroutine seems to be treating it as a
“plain old Fortran array”?
real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
Note also that the beginning and end indices of u, v, w are different for
each process (and they do not start at 1); see, for example,
http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
for how to get the loop bounds.
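For example, something along these lines (an untested sketch using your
da_u/u_local/u_array names; it assumes the usual PETSc Fortran includes are in
scope, as in ex11f90.F):

   PetscScalar,pointer :: u_array(:,:,:)     ! an F90 pointer, not a plain array
   PetscInt xs,ys,zs,xm,ym,zm,i,j,k
   PetscErrorCode ierr

   call DMDAGetCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)   ! start and width of the part owned by this process
   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
   do k=zs,zs+zm-1
      do j=ys,ys+ym-1
         do i=xs,xs+xm-1
            u_array(i,j,k) = 0.0d0           ! global 0-based indices; different on each process
         end do
      end do
   end do
   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)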
Barry
On Apr 18, 2014, at 10:58 AM, TAY wee-beng <[email protected]> wrote:
> Hi,
>
> I tried to pinpoint the problem. I reduced my job size so that I can run on
> 1 processor. I tried using valgrind, but perhaps because I'm using the
> optimized version, it didn't catch the error beyond saying "Segmentation
> fault (core dumped)".
>
> However, by re-writing my code, I found out a few things:
>
> 1. if I write my code this way:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> u_array = ....
>
> v_array = ....
>
> w_array = ....
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code runs fine.
>
> 2. if I write my code this way:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> call uvw_array_change(u_array,v_array,w_array) -> this subroutine does the
> same modification as the above.
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
>
> where the subroutine is:
>
> subroutine uvw_array_change(u,v,w)
>
> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>
> u ...
> v...
> w ...
>
> end subroutine uvw_array_change.
>
> The above will give an error at:
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> 3. Same as above, except I change the order of the last 3 lines to:
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> So they are now in reversed order. Now it works.
>
> 4. Same as 2 or 3, except the subroutine is changed to :
>
> subroutine uvw_array_change(u,v,w)
>
> real(8), intent(inout) :: &
>    u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> real(8), intent(inout) :: &
>    v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> real(8), intent(inout) :: &
>    w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> u ...
> v...
> w ...
>
> end subroutine uvw_array_change.
>
> The start_indices and end_indices are simply there to shift the 0-based
> indices of the C convention to the 1-based indices of the Fortran
> convention. This is necessary in my case because most of my codes start
> array counting at 1, hence the "trick".
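>
> For example (illustration only; the values differ on each process): if this
> process owns C-convention (0-based) indices is .. ie in the first direction,
> then start_indices(1) = is+1 and end_indices(1) = ie+1, and the loops inside
> the subroutine look like:
>
> do k = start_indices(3), end_indices(3)
>    do j = start_indices(2), end_indices(2)
>       do i = start_indices(1), end_indices(1)
>          u(i,j,k) = 0.0d0    ! placeholder for the actual modification
>       end do
>    end do
> end do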
>
> However, now, no matter which order the DMDAVecRestoreArrayF90 calls are in
> (as in 2 or 3), an error occurs at "call
> DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>
> So did I violate something and cause memory corruption with the trick above?
> But I can't think of any way other than the "trick" to continue using the
> 1-based index convention.
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 15/4/2014 8:00 PM, Barry Smith wrote:
>> Try running under valgrind
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>
>>
>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:
>>
>>> Hi Barry,
>>>
>>> As I mentioned earlier, the code works fine in PETSc debug mode but fails
>>> in non-debug mode.
>>>
>>> I have attached my code.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>> Please send the code that creates da_w and the declarations of w_array
>>>>
>>>> Barry
>>>>
>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng
>>>> <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>>> Hi Barry,
>>>>>
>>>>> I'm not too sure how to do it. I'm running MPI, so I run:
>>>>>
>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>>
>>>>> I got the message below. Before the gdb windows appear (through X11), the
>>>>> program aborts.
>>>>>
>>>>> I also tried running on another cluster and it worked. I also tried the
>>>>> current cluster in debug mode and it worked too.
>>>>>
>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>> --------------------------------------------------------------------------
>>>>> An MPI process has executed an operation involving a call to the
>>>>> "fork()" system call to create a child process. Open MPI is currently
>>>>> operating in a condition that could result in memory corruption or
>>>>> other system errors; your MPI job may hang, crash, or produce silent
>>>>> data corruption. The use of fork() (or system() or other calls that
>>>>> create child processes) is strongly discouraged.
>>>>>
>>>>> The process that invoked fork was:
>>>>>
>>>>> Local host: n12-76 (PID 20235)
>>>>> MPI_COMM_WORLD rank: 2
>>>>>
>>>>> If you are *absolutely sure* that your application will successfully
>>>>> and correctly survive a call to fork(), you may disable this warning
>>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>>> --------------------------------------------------------------------------
>>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>>>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>
>>>>> ....
>>>>>
>>>>> 1
>>>>> [1]PETSC ERROR:
>>>>> ------------------------------------------------------------------------
>>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>>>> probably memory access out of range
>>>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>> [1]PETSC ERROR: to get more information on the crash.
>>>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>> [3]PETSC ERROR:
>>>>> ------------------------------------------------------------------------
>>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>>>> probably memory access out of range
>>>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>> [3]PETSC ERROR: to get more information on the crash.
>>>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>>
>>>>> ...
>>>>> Thank you.
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> TAY wee-beng
>>>>>
>>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>>
>>>>>> Because IO doesn’t always get flushed immediately, it may not be hanging
>>>>>> at this point. It is better to use the option -start_in_debugger, then type
>>>>>> cont in each debugger window, and then, when you think it is “hanging”, do a
>>>>>> control-C in each debugger window and type where to see where each process
>>>>>> is. You can also look around in the debugger at variables to see why it is
>>>>>> “hanging” at that point.
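>>>>>>
>>>>>>    Roughly, the sequence is (a sketch; shown for gdb):
>>>>>>
>>>>>>      mpirun -n 4 ./a.out -start_in_debugger
>>>>>>      (gdb) cont            <- type this in each debugger window
>>>>>>      ... when it appears to hang, hit control-C in each window ...
>>>>>>      (gdb) where           <- shows where that process is stuck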
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> These routines don’t have any parallel communication in them, so they are
>>>>>> unlikely to hang.
>>>>>>
>>>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng
>>>>>>
>>>>>> <[email protected]>
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> My code hangs, so I added MPI_Barrier and print statements to catch the
>>>>>>> bug. I found that it hangs after printing "7". Is it because I'm doing
>>>>>>> something wrong? I need to access the u, v, w arrays, so I use
>>>>>>> DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>>>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1, &
>>>>>>>      I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr) !must be in reverse order
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>> --
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>>
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>>
>>> <code.txt>
>