On May 19, 2014, at 1:26 AM, TAY wee-beng <[email protected]> wrote:

> On 19/5/2014 11:36 AM, Barry Smith wrote:
>> On May 18, 2014, at 10:28 PM, TAY wee-beng <[email protected]> wrote:
>>
>>> On 19/5/2014 9:53 AM, Matthew Knepley wrote:
>>>> On Sun, May 18, 2014 at 8:18 PM, TAY wee-beng <[email protected]> wrote:
>>>> Hi Barry,
>>>>
>>>> I am trying to sort out the details so that it's easier to pinpoint the error. However, I tried on gnu gfortran and it worked well. On intel ifort, it stopped at one of the "DMDAVecGetArrayF90" calls. Does it definitely mean that it's a bug in ifort? Do you work with both intel and gnu?
>>>>
>>>> Yes it works with Intel. Is this using optimization?
>>>
>>> Hi Matt,
>>>
>>> I forgot to add that in non-optimized cases, it works with gnu and intel. However, in optimized cases, it works with gnu, but not intel. Does it definitely mean that it's a bug in ifort?
>>
>> No. Does it run clean under valgrind?
>
> Hi,
>
> Do you mean the debug or optimized version?
Both.

>
> Thanks.
>>
>>>> Matt
>>>>
>>>> Thank you
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>> On 14/5/2014 12:03 AM, Barry Smith wrote:
>>>> Please send your current code, so we may compile and run it.
>>>>
>>>> Barry
>>>>
>>>> On May 12, 2014, at 9:52 PM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I sent the entire code a while ago. Is there any answer? I was also trying myself, but it worked with some intel compilers and not with others. I'm still not able to find the answer. The gnu compilers on most clusters are old versions, so they are not able to compile my code since I have allocatable structures.
>>>>
>>>> Thank you.
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>> On 21/4/2014 8:58 AM, Barry Smith wrote:
>>>> Please send the entire code. If we can run it and reproduce the problem we can likely track down the issue much faster than through endless rounds of email.
>>>>
>>>> Barry
>>>>
>>>> On Apr 20, 2014, at 7:49 PM, TAY wee-beng <[email protected]> wrote:
>>>> On 20/4/2014 8:39 AM, TAY wee-beng wrote:
>>>> On 20/4/2014 1:02 AM, Matthew Knepley wrote:
>>>> On Sat, Apr 19, 2014 at 10:49 AM, TAY wee-beng <[email protected]> wrote:
>>>> On 19/4/2014 11:39 PM, Matthew Knepley wrote:
>>>> On Sat, Apr 19, 2014 at 10:16 AM, TAY wee-beng <[email protected]> wrote:
>>>> On 19/4/2014 10:55 PM, Matthew Knepley wrote:
>>>> On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng <[email protected]> wrote:
>>>> On 19/4/2014 6:48 PM, Matthew Knepley wrote:
>>>> On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <[email protected]> wrote:
>>>> On 19/4/2014 1:17 PM, Barry Smith wrote:
>>>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <[email protected]> wrote:
>>>> On 19/4/2014 12:10 PM, Barry Smith wrote:
>>>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <[email protected]> wrote:
>>>> On 19/4/2014 3:53 AM, Barry Smith wrote:
>>>>
>>>> Hmm,
>>>>
>>>> Interface DMDAVecGetArrayF90
>>>>   Subroutine DMDAVecGetArrayF903(da1,v,d1,ierr)
>>>>     USE_DM_HIDE
>>>>     DM_HIDE da1
>>>>     VEC_HIDE v
>>>>     PetscScalar,pointer :: d1(:,:,:)
>>>>     PetscErrorCode ierr
>>>>   End Subroutine
>>>>
>>>> So the d1 is a F90 POINTER. But your subroutine seems to be treating it as a "plain old Fortran array"?
>>>>
>>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>>>
>>>> Hi,
>>>>
>>>> So d1 is a pointer, and it's different if I declare it as a "plain old Fortran array"? Because I declare it as a Fortran array and it works w/o any problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>>>>
>>>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u", "v" and "w", errors start to happen. I wonder why...
>>>>
>>>> Also, suppose I call:
>>>>
>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>
>>>> u_array ....
>>>> v_array .... etc
>>>>
>>>> Now to restore the arrays, does the sequence in which they are restored matter?
>>>>
>>>> No, it should not matter. If it matters, that is a sign that memory has been written to incorrectly earlier in the code.
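For reference, here is a minimal sketch of the Get/Restore pattern being discussed, assuming a DMDA da_u and a local vector u_local already exist and that the usual finclude headers are in scope (all names are illustrative; for a local, ghosted vector the valid indices come from DMDAGetGhostCorners, for a global vector from DMDAGetCorners):

      PetscScalar, pointer :: u_array(:,:,:)
      PetscInt xs,ys,zs,xm,ym,zm
      PetscInt i,j,k
      PetscErrorCode ierr

      call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
      ! ghost corners: starting indices (xs,ys,zs) and widths (xm,ym,zm)
      ! of the ghosted portion on this process
      call DMDAGetGhostCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)
      do k = zs, zs+zm-1
         do j = ys, ys+ym-1
            do i = xs, xs+xm-1
               u_array(i,j,k) = 0.d0
            end do
         end do
      end do
      call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

The array obtained from DMDAVecGetArrayF90 is indexed directly with these per-process corner indices, not with a fixed 1-based range.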
>>>> Hi,
>>>>
>>>> Hmm, I have been getting different results with different intel compilers. I'm not sure if MPI played a part, but I'm only using a single processor. In debug mode, things run without problem. In optimized mode, in some cases, the code aborts even doing simple initialization:
>>>>
>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>>>>
>>>> u_array = 0.d0
>>>> v_array = 0.d0
>>>> w_array = 0.d0
>>>> p_array = 0.d0
>>>>
>>>> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>
>>>> The code aborts at call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), giving a segmentation error. But other versions of the intel compiler pass through this part w/o error. Since the response is different among different compilers, is this a PETSc or intel bug? Or mvapich or openmpi?
>>>>
>>>> We do this in a bunch of examples. Can you reproduce this different behavior in src/dm/examples/tutorials/ex11f90.F?
>>>>
>>>> Hi Matt,
>>>>
>>>> Do you mean putting the above lines into ex11f90.F and testing?
>>>>
>>>> It already has DMDAVecGetArray(). Just run it.
>>>>
>>>> Hi,
>>>>
>>>> It worked. The differences between mine and that code are the way the fortran modules are defined, and that ex11f90 only uses global vectors. Does it make a difference whether global or local vectors are used? Because the way it accesses x1 only touches the local region.
>>>>
>>>> No, the global/local difference should not matter.
>>>>
>>>> Also, before using DMDAVecGetArrayF90, DMGetGlobalVector must be used 1st, is that so? I can't find the equivalent for a local vector though.
>>>>
>>>> DMGetLocalVector()
>>>>
>>>> Ops, I do not have DMGetLocalVector and DMRestoreLocalVector in my code. Does it matter?
>>>>
>>>> If so, when should I call them?
>>>>
>>>> You just need a local vector from somewhere.
>>>>
>>>> Hi,
>>>>
>>>> Can anyone help with the questions below? Still trying to find why my code doesn't work.
>>>>
>>>> Thanks.
>>>>
>>>> Hi,
>>>>
>>>> I inserted part of my error-region code into ex11f90:
>>>>
>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>>>>
>>>> u_array = 0.d0
>>>> v_array = 0.d0
>>>> w_array = 0.d0
>>>> p_array = 0.d0
>>>>
>>>> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>
>>>> It worked w/o error. I'm going to change the way the modules are defined in my code.
>>>>
>>>> My code contains a main program and a number of module files, with subroutines inside, e.g.
>>>>
>>>> module solve
>>>> <- add include file?
>>>> subroutine RRK
>>>> <- add include file?
>>>> end subroutine RRK
>>>> end module solve
>>>>
>>>> So where should the include files (#include <finclude/petscdmda.h90>) be placed? After the module or inside the subroutine?
>>>>
>>>> Thanks.
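On the include-file question: a sketch of one common arrangement for finclude-style PETSc of that era is to put the preprocessor includes inside each program unit that calls PETSc, next to the declarations (the exact header set varies with the PETSc version, so treat the list below as illustrative; the file also has to go through the C preprocessor, i.e. a .F or .F90 suffix):

      module solve
      implicit none
      contains

      subroutine RRK(da_u,u_local)
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscdm.h>
#include <finclude/petscdmda.h>
#include <finclude/petscvec.h90>
#include <finclude/petscdmda.h90>
      DM da_u
      Vec u_local
      PetscScalar, pointer :: u_array(:,:,:)
      PetscErrorCode ierr

      call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
      u_array = 0.d0
      call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
      end subroutine RRK

      end module solve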
>>>> Matt
>>>>
>>>> Thanks.
>>>>
>>>> Matt
>>>>
>>>> Thanks.
>>>>
>>>> Matt
>>>>
>>>> Thanks
>>>>
>>>> Regards.
>>>>
>>>> Matt
>>>>
>>>> As in w, then v and u?
>>>>
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>
>>>> thanks
>>>>
>>>> Note also that the beginning and end indices of the u,v,w are different for each process; see for example http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F (and they do not start at 1). This is how to get the loop bounds.
>>>>
>>>> Hi,
>>>>
>>>> In my case, I fixed the u,v,w such that their indices are the same. I also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem lies in my subroutine treating them as a "plain old Fortran array".
>>>>
>>>> If I declare them as pointers, their indices follow the C 0 start convention, is that so?
>>>>
>>>> Not really. It is that on each process you need to access them from the indices indicated by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn't make any difference.
>>>>
>>>> So my problem now is that in my old MPI code, the u(i,j,k) follow the Fortran 1 start convention. Is there some way to manipulate things such that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>>>
>>>> If your code wishes to access them with indices plus one from the values returned by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors, then you need to manually subtract off the 1.
>>>>
>>>> Barry
>>>>
>>>> Thanks.
>>>>
>>>> Barry
>>>>
>>>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I tried to pinpoint the problem. I reduced my job size and hence I can run on 1 processor. I tried using valgrind, but perhaps because I'm using the optimized version, it didn't catch the error, besides saying "Segmentation fault (core dumped)".
>>>>
>>>> However, by re-writing my code, I found out a few things:
>>>>
>>>> 1. if I write my code this way:
>>>>
>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>
>>>> u_array = ....
>>>> v_array = ....
>>>> w_array = ....
>>>>
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>
>>>> The code runs fine.
>>>>
>>>> 2. if I write my code this way:
>>>>
>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>
>>>> call uvw_array_change(u_array,v_array,w_array)   -> this subroutine does the same modification as the above
>>>>
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)   -> error
>>>>
>>>> where the subroutine is:
>>>>
>>>> subroutine uvw_array_change(u,v,w)
>>>> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>>> u ...
>>>> v ...
>>>> w ...
>>>> end subroutine uvw_array_change
>>>>
>>>> The above will give an error at:
>>>>
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
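One Fortran detail worth noting for case 2 (a sketch based on the subroutine above, not taken from the attached code): an assumed-shape dummy such as u(:,:,:) always gets lower bounds of 1 inside the subroutine, regardless of the bounds of the pointer that was passed in. If the corner-based indices from DMDAGetCorners()/DMDAGetGhostCorners() are to be used inside the subroutine, the lower bounds can be passed in and declared explicitly:

      subroutine uvw_array_change(u,v,w,xs,ys,zs)
      implicit none
      ! xs,ys,zs: the starting indices returned by DMDAGetGhostCorners
      ! (or DMDAGetCorners for global vectors) on the calling side
      integer, intent(in) :: xs, ys, zs
      ! assumed-shape dummies with explicit lower bounds: the extents still
      ! come from the actual arguments, but indexing starts at xs,ys,zs
      ! instead of the default 1
      real(8), intent(inout) :: u(xs:,ys:,zs:)
      real(8), intent(inout) :: v(xs:,ys:,zs:)
      real(8), intent(inout) :: w(xs:,ys:,zs:)

      u = 0.d0
      v = 0.d0
      w = 0.d0
      end subroutine uvw_array_change

Assumed-shape dummies require an explicit interface, so a subroutine like this would need to live in a module (or have an interface block) visible at the call site.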
>>>> 3. Same as above, except I change the order of the last 3 lines to:
>>>>
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>
>>>> So they are now in reversed order. Now it works.
>>>>
>>>> 4. Same as 2 or 3, except the subroutine is changed to:
>>>>
>>>> subroutine uvw_array_change(u,v,w)
>>>> real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>> real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>> real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>> u ...
>>>> v ...
>>>> w ...
>>>> end subroutine uvw_array_change
>>>>
>>>> The start_indices and end_indices are simply to shift the 0-based indices of the C convention to the 1-based indices of the Fortran convention. This is necessary in my case because most of my code starts array counting at 1, hence the "trick".
>>>>
>>>> However, now no matter which order of the DMDAVecRestoreArrayF90 calls I use (as in 2 or 3), an error occurs at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>>>>
>>>> So did I violate something and cause memory corruption with the trick above? But I can't think of any way other than the "trick" to continue using the 1-based index convention.
>>>>
>>>> Thank you.
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
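If the 1-based convention really is needed, a variant of the "trick" that is easier to keep consistent with the arrays returned by DMDAVecGetArrayF90 is to pass in the widths reported by DMDAGetGhostCorners (or DMDAGetCorners for global vectors) and declare the explicit-shape dummies from them, so the declared size always matches the actual array size (a sketch; the argument names are illustrative):

      subroutine uvw_array_change(u,v,w,xm,ym,zm)
      implicit none
      ! xm,ym,zm: the widths returned by DMDAGetGhostCorners on the calling
      ! side, so the declared size xm*ym*zm matches the array obtained from
      ! DMDAVecGetArrayF90 exactly; indexing inside is then 1-based
      integer, intent(in) :: xm, ym, zm
      real(8), intent(inout) :: u(xm,ym,zm)
      real(8), intent(inout) :: v(xm,ym,zm)
      real(8), intent(inout) :: w(xm,ym,zm)

      u = 0.d0
      v = 0.d0
      w = 0.d0
      end subroutine uvw_array_change

Passing an F90 pointer to an explicit-shape dummy uses sequence association, so some compilers may make a temporary copy on entry and copy it back on return; the DMDAVecRestoreArrayF90 calls therefore have to come after the subroutine has returned, as in the snippets above, and any mismatch between the declared bounds and the true extents can corrupt memory.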
>>>>
>>>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>>> Try running under valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>
>>>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>> Hi Barry,
>>>>
>>>> As I mentioned earlier, the code works fine in PETSc debug mode but fails in non-debug mode.
>>>>
>>>> I have attached my code.
>>>>
>>>> Thank you
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>> Please send the code that creates da_w and the declarations of w_array.
>>>>
>>>> Barry
>>>>
>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>> Hi Barry,
>>>>
>>>> I'm not too sure how to do it. I'm running mpi, so I run:
>>>>
>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>
>>>> I got the msg below. Before the gdb windows appear (thru x11), the program aborts.
>>>>
>>>> Also, I tried running on another cluster and it worked. I also tried on the current cluster in debug mode and it worked too.
>>>>
>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>> --------------------------------------------------------------------------
>>>> An MPI process has executed an operation involving a call to the
>>>> "fork()" system call to create a child process. Open MPI is currently
>>>> operating in a condition that could result in memory corruption or
>>>> other system errors; your MPI job may hang, crash, or produce silent
>>>> data corruption. The use of fork() (or system() or other calls that
>>>> create child processes) is strongly discouraged.
>>>>
>>>> The process that invoked fork was:
>>>>
>>>>   Local host:          n12-76 (PID 20235)
>>>>   MPI_COMM_WORLD rank: 2
>>>>
>>>> If you are *absolutely sure* that your application will successfully
>>>> and correctly survive a call to fork(), you may disable this warning
>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>> --------------------------------------------------------------------------
>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>
>>>> ....
>>>>
>>>> 1
>>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>> [1]PETSC ERROR: to get more information on the crash.
>>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>> [3]PETSC ERROR: to get more information on the crash.
>>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>
>>>> ...
>>>>
>>>> Thank you.
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>
>>>> Because IO doesn't always get flushed immediately, it may not be hanging at this point. It is better to use the option -start_in_debugger, then type cont in each debugger window, and then when you think it is "hanging" do a control C in each debugger window and type where to see where each process is. You can also look around in the debugger at variables to see why it is "hanging" at that point.
>>>>
>>>> Barry
>>>>
>>>> These routines don't have any parallel communication in them so are unlikely to hang.
>>>>
>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> My code hangs and I added in mpi_barrier and print to catch the bug. I found that it hangs after printing "7".
>>>> Is it because I'm doing something wrong? I need to access the u,v,w arrays so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>
>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   !must be in reverse order
>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>
>>>> --
>>>> Thank you.
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY wee-beng
>>>>
>>>> <code.txt>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
