On May 18, 2014, at 10:28 PM, TAY wee-beng <[email protected]> wrote:

> On 19/5/2014 9:53 AM, Matthew Knepley wrote:
>> On Sun, May 18, 2014 at 8:18 PM, TAY wee-beng <[email protected]> wrote:
>> Hi Barry,
>> 
>> I am trying to sort out the details so that it's easier to pinpoint the 
>> error. However, I tried GNU gfortran and it worked well. With Intel ifort, 
>> it stopped at one of the DMDAVecGetArrayF90 calls. Does that definitely mean 
>> it's a bug in ifort? Do you work with both Intel and GNU?
>> 
>> Yes it works with Intel. Is this using optimization?
> Hi Matt,
> 
> I forgot to add that in non-optimized builds, it works with both gnu and intel. 
> However, in optimized builds, it works with gnu but not intel.  Does that 
> definitely mean it's a bug in ifort?

  No. Does it run clean under valgrind?


>> 
>>   Matt
>>  
>> 
>> Thank you
>> 
>> Yours sincerely,
>> 
>> TAY wee-beng
>> 
>> On 14/5/2014 12:03 AM, Barry Smith wrote:
>>    Please send your current code so we may compile and run it.
>> 
>>    Barry
>> 
>> 
>>    On May 12, 2014, at 9:52 PM, TAY wee-beng <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I sent the entire code a while ago. Is there any update? I have also been 
>> trying myself, but it works with some Intel compilers and not others. I'm still 
>> not able to find the answer. The gnu compilers on most clusters are old versions, 
>> so they cannot compile my code since I have allocatable structures.
>> 
>> Thank you.
>> 
>> Yours sincerely,
>> 
>> TAY wee-beng
>> 
>> On 21/4/2014 8:58 AM, Barry Smith wrote:
>>     Please send the entire code. If we can run it and reproduce the problem 
>> we can likely track down the issue much faster than through endless rounds 
>> of email.
>> 
>>     Barry
>> 
>> On Apr 20, 2014, at 7:49 PM, TAY wee-beng <[email protected]> wrote:
>> 
>> On 20/4/2014 8:39 AM, TAY wee-beng wrote:
>> On 20/4/2014 1:02 AM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 10:49 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 11:39 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 10:16 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 10:55 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 6:48 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 1:17 PM, Barry Smith wrote:
>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <[email protected]> wrote:
>> 
>> On 19/4/2014 12:10 PM, Barry Smith wrote:
>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <[email protected]> wrote:
>> 
>> On 19/4/2014 3:53 AM, Barry Smith wrote:
>>     Hmm,
>> 
>>         Interface DMDAVecGetArrayF90
>>           Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
>>             USE_DM_HIDE
>>             DM_HIDE da1
>>             VEC_HIDE v
>>             PetscScalar,pointer :: d1(:,:,:)
>>             PetscErrorCode ierr
>>           End Subroutine
>> 
>>      So d1 is an F90 POINTER. But your subroutine seems to be treating it 
>> as a “plain old Fortran array”?
>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
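>> For comparison, a minimal sketch of a subroutine whose dummy arguments carry 
>> the POINTER attribute, matching the d1 argument in the interface above (just an 
>> illustration, not the code from this thread; real(8) assumes a default 
>> double-precision PetscScalar build, and a pointer dummy needs an explicit 
>> interface, e.g. the subroutine living in a module):
>> 
>> subroutine uvw_array_change(u, v, w)
>>   implicit none
>>   ! declared with the POINTER attribute, consistent with what
>>   ! DMDAVecGetArrayF90 hands back, instead of plain assumed-shape arrays
>>   real(8), pointer :: u(:,:,:), v(:,:,:), w(:,:,:)
>> 
>>   u = 0.d0   ! whole-array assignments act on the pointer targets
>>   v = 0.d0
>>   w = 0.d0
>> end subroutine uvw_array_change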
>> Hi,
>> 
>> So d1 is a pointer, and it behaves differently if I declare it as a "plain old 
>> Fortran array"? I declare it as a Fortran array and it works without any problem 
>> if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>> 
>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u", "v" 
>> and "w", errors start to happen. I wonder why...
>> 
>> Also, supposed I call:
>> 
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> 
>>      call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> 
>>      call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> 
>> u_array ....
>> 
>> v_array .... etc
>> 
>> Now, to restore the arrays, does the sequence in which they are restored matter?
>>      No, it should not matter. If it matters, that is a sign that memory has 
>> been written to incorrectly earlier in the code.
>> 
>> Hi,
>> 
>> Hmm, I have been getting different results with different Intel compilers. I'm 
>> not sure if MPI plays a part, but I'm only using a single processor. In 
>> debug mode, things run without problem. In optimized mode, in some cases, 
>> the code aborts even during simple initialization:
>> 
>> 
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> 
>>      call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> 
>>      call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> 
>>      call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>> 
>>      u_array = 0.d0
>> 
>>      v_array = 0.d0
>> 
>>      w_array = 0.d0
>> 
>>      p_array = 0.d0
>> 
>> 
>>      call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>> 
>> 
>>      call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> 
>>      call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> 
>>      call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> 
>> The code aborts at call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), 
>> giving a segmentation error. But other versions of the Intel compiler pass 
>> through this part without error. Since the response differs among 
>> compilers, is this a PETSc or Intel bug? Or mvapich or openmpi?
>> 
>> We do this in a bunch of examples. Can you reproduce this different behavior 
>> in src/dm/examples/tutorials/ex11f90.F?
>> Hi Matt,
>> 
>> Do you mean putting the above lines into ex11f90.F and test?
>> 
>> It already has DMDAVecGetArray(). Just run it.
>> Hi,
>> 
>> It worked. The differences between my code and the example are the way the Fortran 
>> modules are defined, and that ex11f90 only uses global vectors. Does it make 
>> a difference whether global or local vectors are used? Because the way it 
>> accesses x1 only touches the local region.
>> 
>> No the global/local difference should not matter.
>>   Also, before using DMDAVecGetArrayF90, DMGetGlobalVector must be used first, 
>> is that so? I can't find the equivalent for local vectors though.
>> 
>> DMGetLocalVector()
>> Oops, I do not have DMGetLocalVector and DMRestoreLocalVector in my code. 
>> Does it matter?
>> 
>> If so, when should I call them?
>> 
>> You just need a local vector from somewhere.
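>> For reference, a minimal sketch of getting a local vector and pairing it with the 
>> array access (variable names follow the thread; da_u and a global vector u_global 
>> are assumed to already exist, e.g. from DMDACreate3d and DMCreateGlobalVector, and 
>> error checking is omitted):
>> 
>> Vec u_global, u_local
>> PetscScalar, pointer :: u_array(:,:,:)
>> PetscErrorCode ierr
>> 
>> call DMGetLocalVector(da_u, u_local, ierr)
>> ! fill the ghosted local vector from the global one
>> call DMGlobalToLocalBegin(da_u, u_global, INSERT_VALUES, u_local, ierr)
>> call DMGlobalToLocalEnd(da_u, u_global, INSERT_VALUES, u_local, ierr)
>> 
>> call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
>> ! ... work on u_array over the ghosted index range ...
>> call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)
>> 
>> call DMRestoreLocalVector(da_u, u_local, ierr)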
>> Hi,
>> 
>> Can anyone help with the questions below? I'm still trying to find out why my code 
>> doesn't work.
>> 
>> Thanks.
>> Hi,
>> 
>> I inserted the part of my code around the error region into ex11f90:
>> 
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>           call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>           call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>           call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>> 
>>      u_array = 0.d0
>>           v_array = 0.d0
>>           w_array = 0.d0
>>           p_array = 0.d0
>> 
>>      call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>> 
>>      call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> 
>>      call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> 
>>      call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> 
>> It worked without error. I'm going to change the way the modules are defined in 
>> my code.
>> 
>> My code contains a main program and a number of module files, with 
>> subroutines inside, e.g.
>> 
>> module solve
>>                   <- add include file?
>> subroutine RRK
>>                   <- add include file?
>> end subroutine RRK
>> 
>> end module solve
>> 
>> So where should the include files (#include <finclude/petscdmda.h90>) be 
>> placed?
>> 
>> After the module or inside the subroutine?
>> 
>> Thanks.
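>> For what it's worth, one arrangement that mirrors how ex11f90.F does it (a sketch 
>> only: the exact set of finclude headers depends on which PETSc objects the unit 
>> uses, and the declarations below are illustrative) is to put the includes in the 
>> specification part of each subroutine that calls the PETSc routines:
>> 
>> module solve
>>   implicit none
>> contains
>> 
>>   subroutine RRK
>>     implicit none
>>     ! definition headers plus the F90 interface header go here, in the
>>     ! specification part of the unit that calls the PETSc routines
>> #include <finclude/petscsys.h>
>> #include <finclude/petscdm.h>
>> #include <finclude/petscdmda.h>
>> #include <finclude/petscdmda.h90>
>>     DM da_u
>>     Vec u_local
>>     PetscScalar, pointer :: u_array(:,:,:)
>>     PetscErrorCode ierr
>>     ! ... DMDAVecGetArrayF90 / DMDAVecRestoreArrayF90 calls here ...
>>   end subroutine RRK
>> 
>> end module solve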
>>    Matt
>>   Thanks.
>>     Matt
>>   Thanks.
>>     Matt
>>   Thanks
>> 
>> Regards.
>>     Matt
>>   As in w, then v and u?
>> 
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> 
>> thanks
>>      Note also that the beginning and end indices of u,v,w are 
>> different for each process; see for example 
>> http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F 
>> (and they do not start at 1). This is how to get the loop bounds.
>> Hi,
>> 
>> In my case, I fixed the u,v,w such that their indices are the same. I also 
>> checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem lies 
>> in my subroutine treating it as a “plain old Fortran array”.
>> 
>> If I declare them as pointers, their indices follow the C 0 start 
>> convention, is that so?
>>      Not really. It is that in each process you need to access them from the 
>> indices indicated by DMDAGetCorners() for global vectors and 
>> DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn’t 
>> make any difference.
>> 
>> 
>> So my problem now is that in my old MPI code, u(i,j,k) follows the 
>> Fortran convention of starting at 1. Is there some way to manipulate the indices 
>> such that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>     If your code wishes to access them with indices one greater than the values 
>> returned by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() 
>> for local vectors, then you need to manually subtract off the 1.
>> 
>>     Barry
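>> A minimal sketch of getting the loop bounds this way (the Fortran DMDAGetCorners 
>> call returns the 0-based start index and the width in each direction; da_u and 
>> u_array follow the names used in this thread, and the loop body is illustrative):
>> 
>> PetscInt xs, ys, zs, xm, ym, zm
>> PetscInt i, j, k
>> PetscErrorCode ierr
>> 
>> call DMDAGetCorners(da_u, xs, ys, zs, xm, ym, zm, ierr)
>> 
>> ! bounds are 0-based and differ on each process; for a local (ghosted)
>> ! vector use DMDAGetGhostCorners() instead
>> do k = zs, zs + zm - 1
>>    do j = ys, ys + ym - 1
>>       do i = xs, xs + xm - 1
>>          u_array(i, j, k) = 0.d0
>>       end do
>>    end do
>> end do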
>> 
>> Thanks.
>>     Barry
>> 
>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I tried to pinpoint the problem. I reduced my job size so that I can run 
>> on 1 processor. I tried using valgrind, but perhaps because I'm using the optimized 
>> version, it didn't catch the error beyond saying "Segmentation fault (core 
>> dumped)".
>> 
>> However, by re-writing my code, I found out a few things:
>> 
>> 1. if I write my code this way:
>> 
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> 
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> 
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> 
>> u_array = ....
>> 
>> v_array = ....
>> 
>> w_array = ....
>> 
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> 
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> 
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> 
>> The code runs fine.
>> 
>> 2. if I write my code this way:
>> 
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> 
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> 
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> 
>> call uvw_array_change(u_array,v_array,w_array) -> this subroutine does the 
>> same modification as the above.
>> 
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> 
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> 
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
>> 
>> where the subroutine is:
>> 
>> subroutine uvw_array_change(u,v,w)
>> 
>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>> 
>> u ...
>> v...
>> w ...
>> 
>> end subroutine uvw_array_change.
>> 
>> The above will give an error at :
>> 
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> 
>> 3. Same as above, except I change the order of the last 3 lines to:
>> 
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> 
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> 
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> 
>> So they are now in reversed order. Now it works.
>> 
>> 4. Same as 2 or 3, except the subroutine is changed to :
>> 
>> subroutine uvw_array_change(u,v,w)
>> 
>> real(8), intent(inout) :: 
>> u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>> 
>> real(8), intent(inout) :: 
>> v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>> 
>> real(8), intent(inout) :: 
>> w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>> 
>> u ...
>> v...
>> w ...
>> 
>> end subroutine uvw_array_change.
>> 
>> The start_indices and end_indices simply shift the 0-based indices of the C 
>> convention to the 1-based indices of the Fortran convention. This is 
>> necessary in my case because most of my code starts array counting at 1, 
>> hence the "trick".
>> 
>> However, now no matter which order the DMDAVecRestoreArrayF90 calls are in 
>> (as in 2 or 3), an error occurs at "call 
>> DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>> 
>> So did I violate something and cause memory corruption with the trick above? 
>> But I can't think of any way other than the "trick" to continue using the 
>> 1-based index convention.
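>> One alternative to the index-shifting trick, sketched here only (it assumes u, v 
>> and w share the same bounds, as in this thread, and that the subroutine has an 
>> explicit interface, e.g. lives in a module), is to declare the dummy arguments 
>> with the POINTER attribute and query the bounds instead of renumbering them:
>> 
>> subroutine uvw_array_change(u, v, w)
>>   implicit none
>>   real(8), pointer :: u(:,:,:), v(:,:,:), w(:,:,:)
>>   integer :: i, j, k
>> 
>>   ! pointer dummies keep the bounds DMDAVecGetArrayF90 gave the actual
>>   ! arguments, so the loop limits can be queried with lbound()/ubound()
>>   do k = lbound(u,3), ubound(u,3)
>>      do j = lbound(u,2), ubound(u,2)
>>         do i = lbound(u,1), ubound(u,1)
>>            u(i,j,k) = 0.d0
>>            v(i,j,k) = 0.d0
>>            w(i,j,k) = 0.d0
>>         end do
>>      end do
>>   end do
>> end subroutine uvw_array_change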
>> 
>> Thank you.
>> 
>> Yours sincerely,
>> 
>> TAY wee-beng
>> 
>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>     Try running under valgrind 
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
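>> A typical invocation along the lines of that FAQ entry (single process here, since 
>> the reduced job runs on one processor; the log-file name is arbitrary, and 
>> -malloc off turns off PETSc's own malloc wrapper so valgrind sees the raw 
>> allocations) would be something like:
>> 
>> mpirun -n 1 valgrind --tool=memcheck -q --num-callers=20 \
>>     --log-file=valgrind.log.%p ./a.out -malloc off
>> 
>> The valgrind report is much more informative when the executable and PETSc are 
>> built with debugging symbols (-g / --with-debugging=yes).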
>> 
>> 
>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:
>> 
>> Hi Barry,
>> 
>> As I mentioned earlier, the code works fine in PETSc debug mode but fails in 
>> non-debug mode.
>> 
>> I have attached my code.
>> 
>> Thank you
>> 
>> Yours sincerely,
>> 
>> TAY wee-beng
>> 
>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>     Please send the code that creates da_w and the declarations of w_array
>> 
>>     Barry
>> 
>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng
>> <[email protected]>
>>    wrote:
>> 
>> 
>> Hi Barry,
>> 
>> I'm not too sure how to do it. I'm running MPI, so I run:
>> 
>>    mpirun -n 4 ./a.out -start_in_debugger
>> 
>> I got the message below. Before the gdb windows appear (through X11), the program 
>> aborts.
>> 
>> I also tried running on another cluster and it worked. I also tried the 
>> current cluster in debug mode and it worked too.
>> 
>> mpirun -n 4 ./a.out -start_in_debugger
>> --------------------------------------------------------------------------
>> An MPI process has executed an operation involving a call to the
>> "fork()" system call to create a child process.  Open MPI is currently
>> operating in a condition that could result in memory corruption or
>> other system errors; your MPI job may hang, crash, or produce silent
>> data corruption.  The use of fork() (or system() or other calls that
>> create child processes) is strongly discouraged.
>> 
>> The process that invoked fork was:
>> 
>>     Local host:          n12-76 (PID 20235)
>>     MPI_COMM_WORLD rank: 2
>> 
>> If you are *absolutely sure* that your application will successfully
>> and correctly survive a call to fork(), you may disable this warning
>> by setting the mpi_warn_on_fork MCA parameter to 0.
>> --------------------------------------------------------------------------
>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display 
>> localhost:50.0 on machine n12-76
>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display 
>> localhost:50.0 on machine n12-76
>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display 
>> localhost:50.0 on machine n12-76
>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display 
>> localhost:50.0 on machine n12-76
>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt 
>> / mpi_init:warn-fork
>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
>> help / error messages
>> 
>> ....
>> 
>>    1
>> [1]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>> probably memory access out of range
>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and 
>> run
>> [1]PETSC ERROR: to get more information on the crash.
>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown 
>> file (null)
>> [3]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>> probably memory access out of range
>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and 
>> run
>> [3]PETSC ERROR: to get more information on the crash.
>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown 
>> file (null)
>> 
>> ...
>> Thank you.
>> 
>> Yours sincerely,
>> 
>> TAY wee-beng
>> 
>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>> 
>>     Because IO doesn’t always get flushed immediately, it may not be hanging 
>> at this point.  It is better to use the option -start_in_debugger, then type 
>> cont in each debugger window, and then, when you think it is “hanging”, do a 
>> control C in each debugger window and type where to see where each process 
>> is. You can also look around in the debugger at variables to see why it is 
>> “hanging” at that point.
>> 
>>      Barry
>> 
>>     These routines don’t have any parallel communication in them, so they are 
>> unlikely to hang.
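>> Concretely, the sequence described above looks roughly like this (one gdb window 
>> per rank appears):
>> 
>> mpirun -n 4 ./a.out -start_in_debugger
>> 
>> # in each gdb window that pops up:
>> (gdb) cont
>> # when the run looks hung, hit Ctrl-C in each window, then:
>> (gdb) where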
>> 
>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng
>> 
>> <[email protected]>
>> 
>>    wrote:
>> 
>> 
>> 
>> Hi,
>> 
>> My code hangs, so I added MPI_Barrier and print statements to catch the bug. I found 
>> that it hangs after printing "7". Is it because I'm doing something wrong? I 
>> need to access the u, v, w arrays, so I use DMDAVecGetArrayF90. After access, I 
>> use DMDAVecRestoreArrayF90.
>> 
>>           call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>           call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
>>           call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>           call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
>>           call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>           call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
>>           call 
>> I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>           call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
>>           call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be 
>> in reverse order
>>           call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
>>           call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>           call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
>>           call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> -- 
>> Thank you.
>> 
>> Yours sincerely,
>> 
>> TAY wee-beng
>> 
>> 
>> 
>> <code.txt>
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
> 
