Hmm,

      Interface DMDAVecGetArrayF90
        Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
          USE_DM_HIDE
          DM_HIDE da1
          VEC_HIDE v
          PetscScalar,pointer :: d1(:,:,:)
          PetscErrorCode ierr
        End Subroutine

   So d1 is an F90 POINTER, but your subroutine seems to be treating it as a 
“plain old Fortran array”?
real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)

   Note also that the beginning and end indices of u, v, and w are different on 
each process, and they do not start at 1. See for example 
http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F 
for how to get the loop bounds.
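
   For example, here is a rough sketch (the module and subroutine names are 
made up, and the exact set of finclude headers may differ on your 
installation) of a routine that keeps the F90 pointer dummies and gets its 
loop bounds from DMDAGetCorners:

      module uvw_update
      contains
      subroutine update_uvw(da,u,v,w,ierr)
      implicit none
#include <finclude/petscsys.h>
#include <finclude/petscdmda.h>
      DM da
      PetscScalar, pointer :: u(:,:,:), v(:,:,:), w(:,:,:)
      PetscErrorCode ierr
      PetscInt xs,ys,zs,xm,ym,zm
      PetscInt i,j,k

!     owned (non-ghost) corner and widths of this process; these are 0-based
      call DMDAGetCorners(da,xs,ys,zs,xm,ym,zm,ierr)
      do k = zs, zs+zm-1
        do j = ys, ys+ym-1
          do i = xs, xs+xm-1
            u(i,j,k) = 1.0      ! placeholder updates
            v(i,j,k) = 2.0
            w(i,j,k) = 3.0
          end do
        end do
      end do
      end subroutine update_uvw
      end module uvw_update

   Because the dummies are declared as pointers (and the routine is in a 
module, so the interface is explicit at the call site), the bounds of u, v, w 
inside the routine stay the ones DMDAVecGetArrayF90 gave them instead of being 
remapped to start at 1.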


  Barry 

On Apr 18, 2014, at 10:58 AM, TAY wee-beng <[email protected]> wrote:

> Hi,
> 
> I tried to pinpoint the problem. I reduced my job size so that I can run on 
> 1 processor. I tried using valgrind, but perhaps because I'm using the 
> optimized version it didn't catch the error, beyond saying "Segmentation 
> fault (core dumped)".
> 
> However, by re-writing my code, I found out a few things:
> 
> 1. if I write my code this way:
> 
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
> 
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
> 
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
> 
> u_array = ....
> 
> v_array = ....
> 
> w_array = ....
> 
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> 
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> 
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> 
> The code runs fine.
> 
> 2. if I write my code this way:
> 
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
> 
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
> 
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
> 
> call uvw_array_change(u_array,v_array,w_array) -> this subroutine performs 
> the same modifications as above.
> 
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> 
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> 
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
> 
> where the subroutine is:
> 
> subroutine uvw_array_change(u,v,w)
> 
> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
> 
> u ...
> v...
> w ...
> 
> end subroutine uvw_array_change
> 
> The above will give an error at :
> 
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> 
> 3. Same as above, except I change the order of the last 3 lines to:
> 
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> 
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> 
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> 
> So they are now in reversed order. Now it works.
> 
> 4. Same as 2 or 3, except the subroutine is changed to:
> 
> subroutine uvw_array_change(u,v,w)
> 
> real(8), intent(inout) :: &
> u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
> 
> real(8), intent(inout) :: &
> v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
> 
> real(8), intent(inout) :: &
> w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
> 
> u ...
> v...
> w ...
> 
> end subroutine uvw_array_change
> 
> The start_indices and end_indices simply shift the 0-based indices of the C 
> convention to the 1-based indices of the Fortran convention. This is 
> necessary in my case because most of my code starts array counting at 1, 
> hence the "trick".
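> 
> For reference, a hypothetical way to fill such indices from the DMDA corners 
> (this is only a sketch, not my actual code) would be:
> 
>       PetscInt xs,ys,zs,xm,ym,zm
>       PetscInt start_indices(3),end_indices(3)
> 
>       call DMDAGetCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)
>       start_indices(1) = xs + 1
>       end_indices(1)   = xs + xm
>       start_indices(2) = ys + 1
>       end_indices(2)   = ys + ym
>       start_indices(3) = zs + 1
>       end_indices(3)   = zs + zm
> 
> (I am not sure whether DMDAGetCorners or DMDAGetGhostCorners is the right 
> match here, if u_local, v_local and w_local are local, ghosted vectors.)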
> 
> However, now no matter which order I call DMDAVecRestoreArrayF90 in (as in 2 
> or 3), the error occurs at "call 
> DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
> 
> So did I violate something and cause memory corruption with the trick above? 
> But I can't think of any way other than the "trick" to continue using the 
> 1-based indexing convention.
> 
> Thank you.
> 
> Yours sincerely,
> 
> TAY wee-beng
> 
> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>   Try running under valgrind 
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> 
>> 
>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:
>> 
>>> Hi Barry,
>>> 
>>> As I mentioned earlier, the code works fine in PETSc debug mode but fails 
>>> in non-debug mode.
>>> 
>>> I have attached my code.
>>> 
>>> Thank you
>>> 
>>> Yours sincerely,
>>> 
>>> TAY wee-beng
>>> 
>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>>   Please send the code that creates da_w and the declarations of w_array
>>>> 
>>>>   Barry
>>>> 
>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng
>>>> <[email protected]>
>>>>  wrote:
>>>> 
>>>> 
>>>>> Hi Barry,
>>>>> 
>>>>> I'm not too sure how to do it. I'm running MPI, so I run:
>>>>> 
>>>>>  mpirun -n 4 ./a.out -start_in_debugger
>>>>> 
>>>>> I got the message below. Before the gdb windows appear (through X11), the 
>>>>> program aborts.
>>>>> 
>>>>> I also tried running on another cluster and it worked. I also tried the 
>>>>> current cluster in debug mode and it worked too.
>>>>> 
>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>> --------------------------------------------------------------------------
>>>>> An MPI process has executed an operation involving a call to the
>>>>> "fork()" system call to create a child process.  Open MPI is currently
>>>>> operating in a condition that could result in memory corruption or
>>>>> other system errors; your MPI job may hang, crash, or produce silent
>>>>> data corruption.  The use of fork() (or system() or other calls that
>>>>> create child processes) is strongly discouraged.
>>>>> 
>>>>> The process that invoked fork was:
>>>>> 
>>>>>   Local host:          n12-76 (PID 20235)
>>>>>   MPI_COMM_WORLD rank: 2
>>>>> 
>>>>> If you are *absolutely sure* that your application will successfully
>>>>> and correctly survive a call to fork(), you may disable this warning
>>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>>> --------------------------------------------------------------------------
>>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display 
>>>>> localhost:50.0 on machine n12-76
>>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display 
>>>>> localhost:50.0 on machine n12-76
>>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display 
>>>>> localhost:50.0 on machine n12-76
>>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display 
>>>>> localhost:50.0 on machine n12-76
>>>>> [n12-76:20232] 3 more processes have sent help message 
>>>>> help-mpi-runtime.txt / mpi_init:warn-fork
>>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
>>>>> all help / error messages
>>>>> 
>>>>> ....
>>>>> 
>>>>>  1
>>>>> [1]PETSC ERROR: 
>>>>> ------------------------------------------------------------------------
>>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>>>>> probably memory access out of range
>>>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [1]PETSC ERROR: or see
>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[1]PETSC 
>>>>> ERROR: or try http://valgrind.org
>>>>>  on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, 
>>>>> and run
>>>>> [1]PETSC ERROR: to get more information on the crash.
>>>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory 
>>>>> unknown file (null)
>>>>> [3]PETSC ERROR: 
>>>>> ------------------------------------------------------------------------
>>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>>>>> probably memory access out of range
>>>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [3]PETSC ERROR: or see
>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[3]PETSC 
>>>>> ERROR: or try http://valgrind.org
>>>>>  on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, 
>>>>> and run
>>>>> [3]PETSC ERROR: to get more information on the crash.
>>>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory 
>>>>> unknown file (null)
>>>>> 
>>>>> ...
>>>>> Thank you.
>>>>> 
>>>>> Yours sincerely,
>>>>> 
>>>>> TAY wee-beng
>>>>> 
>>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>> 
>>>>>>   Because IO doesn’t always get flushed immediately, it may not be 
>>>>>> hanging at this point.  It is better to use the option 
>>>>>> -start_in_debugger, then type cont in each debugger window; when you 
>>>>>> think it is “hanging”, do a control-C in each debugger window and type 
>>>>>> where to see where each process is. You can also look around in the 
>>>>>> debugger at variables to see why it is “hanging” at that point.
>>>>>> 
>>>>>>    Barry
>>>>>> 
>>>>>>   These routines don’t have any parallel communication in them, so they 
>>>>>> are unlikely to hang.
>>>>>> 
>>>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng
>>>>>> 
>>>>>> <[email protected]>
>>>>>> 
>>>>>>  wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> My code hangs, and I added MPI_Barrier and print statements to catch 
>>>>>>> the bug. I found that it hangs after printing "7". Is it because I'm 
>>>>>>> doing something wrong? I need to access the u,v,w arrays, so I use 
>>>>>>> DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>>>> 
>>>>>>>         call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
>>>>>>>         call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
>>>>>>>         call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
>>>>>>>         call 
>>>>>>> I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>>>>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
>>>>>>>         call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must 
>>>>>>> be in reverse order
>>>>>>>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
>>>>>>>         call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
>>>>>>>         call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>> -- 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> Yours sincerely,
>>>>>>> 
>>>>>>> TAY wee-beng
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>> 
>>> 
>>> <code.txt>
> 
