On May 18, 2014, at 10:28 PM, TAY wee-beng <[email protected]> wrote:
> On 19/5/2014 9:53 AM, Matthew Knepley wrote:
>> On Sun, May 18, 2014 at 8:18 PM, TAY wee-beng <[email protected]> wrote:
>> Hi Barry,
>>
>> I am trying to sort out the details so that it's easier to pinpoint the error. However, I tried on gnu gfortran and it worked well. On intel ifort, it stopped at one of the "DMDAVecGetArrayF90". Does it definitely mean that it's a bug in ifort? Do you work with both intel and gnu?
>>
>> Yes it works with Intel. Is this using optimization?
>
> Hi Matt,
>
> I forgot to add that in non-optimized cases, it works with gnu and intel. However, in optimized cases, it works with gnu, but not intel. Does it definitely mean that it's a bug in ifort?

   No. Does it run clean under valgrind?

>>
>> Matt
>>
>> Thank you
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> On 14/5/2014 12:03 AM, Barry Smith wrote:
>> Please send your current code so we may compile and run it.
>>
>> Barry
>>
>> On May 12, 2014, at 9:52 PM, TAY wee-beng <[email protected]> wrote:
>>
>> Hi,
>>
>> I sent the entire code a while ago. Is there any answer? I was also trying myself, but it worked with some Intel compilers and not with others. I'm still not able to find the answer. The gnu compilers on most clusters are old versions, so they are not able to compile my code since I have allocatable structures.
>>
>> Thank you.
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> On 21/4/2014 8:58 AM, Barry Smith wrote:
>> Please send the entire code. If we can run it and reproduce the problem we can likely track down the issue much faster than through endless rounds of email.
>>
>> Barry
>>
>> On Apr 20, 2014, at 7:49 PM, TAY wee-beng <[email protected]> wrote:
>> On 20/4/2014 8:39 AM, TAY wee-beng wrote:
>> On 20/4/2014 1:02 AM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 10:49 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 11:39 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 10:16 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 10:55 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 6:48 PM, Matthew Knepley wrote:
>> On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 1:17 PM, Barry Smith wrote:
>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 12:10 PM, Barry Smith wrote:
>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <[email protected]> wrote:
>> On 19/4/2014 3:53 AM, Barry Smith wrote:
>>
>> Hmm,
>>
>> Interface DMDAVecGetArrayF90
>>   Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
>>     USE_DM_HIDE
>>     DM_HIDE da1
>>     VEC_HIDE v
>>     PetscScalar,pointer :: d1(:,:,:)
>>     PetscErrorCode ierr
>>   End Subroutine
>>
>> So d1 is an F90 POINTER. But your subroutine seems to be treating it as a “plain old Fortran array”?
>>
>>   real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>
>> Hi,
>>
>> So d1 is a pointer, and it's different if I declare it as a "plain old Fortran array"? Because I declare it as a Fortran array and it works without any problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>>
>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u", "v" and "w", errors start to happen. I wonder why...
>>
>> Also, suppose I call:
>>
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>> u_array ....
>> v_array ....
>> etc.
>>
>> Now, to restore the arrays, does the order in which they are restored matter?
>>
>> No, it should not matter. If it matters, that is a sign that memory has been written to incorrectly earlier in the code.
>>
>> Hi,
>>
>> Hmm, I have been getting different results on different Intel compilers. I'm not sure if MPI plays a part, but I'm only using a single processor. In debug mode, things run without problems. In optimized mode, in some cases the code aborts even doing simple initialization:
>>
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>>
>> u_array = 0.d0
>> v_array = 0.d0
>> w_array = 0.d0
>> p_array = 0.d0
>>
>> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>> The code aborts at "call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)" with a segmentation error, but other versions of the Intel compiler pass through this part without error. Since the behavior differs between compilers, is this a PETSc bug or an Intel bug? Or mvapich or openmpi?
>>
>> We do this in a bunch of examples. Can you reproduce this different behavior in src/dm/examples/tutorials/ex11f90.F?
>>
>> Hi Matt,
>>
>> Do you mean putting the above lines into ex11f90.F and testing?
>>
>> It already has DMDAVecGetArray(). Just run it.
>>
>> Hi,
>>
>> It worked. The differences between my code and the example are the way the Fortran modules are defined, and that ex11f90 only uses global vectors. Does it make a difference whether global or local vectors are used? Because the way it accesses x1 only touches the local region.
>>
>> No, the global/local difference should not matter.
>>
>> Also, before using DMDAVecGetArrayF90, must DMGetGlobalVector be used first? I can't find the equivalent for a local vector, though.
>>
>> DMGetLocalVector()
>>
>> Oops, I do not have DMGetLocalVector and DMRestoreLocalVector in my code. Does it matter? If so, when should I call them?
>>
>> You just need a local vector from somewhere.
>>
>> Hi,
>>
>> Can anyone help with the questions below? I'm still trying to find out why my code doesn't work.
>>
>> Thanks.
>>
>> Hi,
>>
>> I inserted part of the failing region of my code into ex11f90:
>>
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>>
>> u_array = 0.d0
>> v_array = 0.d0
>> w_array = 0.d0
>> p_array = 0.d0
>>
>> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>> It worked without error. I'm going to change the way the modules are defined in my code.
>>
>> My code contains a main program and a number of module files with subroutines inside, e.g.
>>
>> module solve
>>     <- add include file?
>>     subroutine RRK
>>     <- add include file?
>>     end subroutine RRK
>> end module solve
>>
>> So where should the include files (#include <finclude/petscdmda.h90>) be placed? After the module or inside the subroutine?
>>
>> Thanks.
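(For reference, a minimal sketch of one common layout with the PETSc 3.4-era finclude headers, not taken from the attached code: the preprocessor includes sit in the specification part of each subroutine that calls the DMDA routines, and the file gets a .F90 suffix so it is preprocessed. The names da_u and RRK are placeholders borrowed from the thread, and the local vector is obtained with DMGetLocalVector()/DMRestoreLocalVector() as suggested above.)

    module solve
    contains

      subroutine RRK(da_u)
        implicit none
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscdm.h>
#include <finclude/petscdmda.h>
#include <finclude/petscdmda.h90>
        DM da_u
        Vec u_local
        PetscScalar, pointer :: u_array(:,:,:)
        PetscErrorCode ierr

        ! get a ghosted local vector from the DMDA, access it, then return both
        call DMGetLocalVector(da_u,u_local,ierr)
        call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
        u_array = 0.d0
        call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
        call DMRestoreLocalVector(da_u,u_local,ierr)
      end subroutine RRK

    end module solve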
>> Matt
>>
>> Thanks.
>>
>> Matt
>>
>> Thanks.
>>
>> Matt
>>
>> Thanks
>>
>> Regards.
>>
>> Matt
>>
>> As in w, then v and u?
>>
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>> thanks
>>
>> Note also that the beginning and end indices of u, v, w are different for each process; see for example http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F (and they do not start at 1). This is how to get the loop bounds.
>>
>> Hi,
>>
>> In my case, I fixed u, v, w such that their indices are the same. I also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem lies in my subroutine treating them as “plain old Fortran arrays”.
>>
>> If I declare them as pointers, do their indices follow the C 0-start convention?
>>
>> Not really. It is that in each process you need to access them from the indices indicated by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn’t make any difference.
>>
>> So my problem now is that in my old MPI code, u(i,j,k) follows the Fortran 1-start convention. Is there some way to handle this so that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>
>> If your code wishes to access them with indices offset by one from the values returned by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors, then you need to manually subtract off the 1.
>>
>> Barry
>>
>> Thanks.
>>
>> Barry
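(As an illustration of the loop-bounds advice above, patterned on ex11f90.F rather than on the poster's code: on each process the accessible index range of a ghosted local vector comes from DMDAGetGhostCorners(), which returns starting indices and widths that in general do not begin at 1. da_u and u_local are assumed to be the DMDA and local vector from the thread, with the usual PETSc includes in scope.)

      ! Sketch only: xs,ys,zs are the first ghosted indices on this process
      ! and xm,ym,zm the widths, so the accessible range is xs .. xs+xm-1, etc.
      PetscInt xs,ys,zs,xm,ym,zm,i,j,k
      PetscErrorCode ierr
      PetscScalar, pointer :: u_array(:,:,:)

      call DMDAGetGhostCorners(da_u,xs,ys,zs,xm,ym,zm,ierr)
      call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
      do k = zs, zs+zm-1
         do j = ys, ys+ym-1
            do i = xs, xs+xm-1
               u_array(i,j,k) = 0.d0
            end do
         end do
      end do
      call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)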
>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <[email protected]> wrote:
>>
>> Hi,
>>
>> I tried to pinpoint the problem. I reduced my job size so that I can run on 1 processor. I tried using valgrind, but perhaps because I'm using the optimized version, it didn't catch the error beyond saying "Segmentation fault (core dumped)".
>>
>> However, by re-writing my code, I found out a few things:
>>
>> 1. If I write my code this way:
>>
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>> u_array = ....
>> v_array = ....
>> w_array = ....
>>
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>> The code runs fine.
>>
>> 2. If I write my code this way:
>>
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>> call uvw_array_change(u_array,v_array,w_array)  -> this subroutine does the same modification as the above.
>>
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)  -> error
>>
>> where the subroutine is:
>>
>> subroutine uvw_array_change(u,v,w)
>>     real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>     u ...
>>     v ...
>>     w ...
>> end subroutine uvw_array_change
>>
>> The above will give an error at:
>>
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>> 3. Same as above, except I change the order of the last 3 lines to:
>>
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>
>> So they are now in reverse order. Now it works.
>>
>> 4. Same as 2 or 3, except the subroutine is changed to:
>>
>> subroutine uvw_array_change(u,v,w)
>>     real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>     real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>     real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>     u ...
>>     v ...
>>     w ...
>> end subroutine uvw_array_change
>>
>> The start_indices and end_indices are simply there to shift the C 0-start indices to the Fortran 1-start convention. This is necessary in my case because most of my code starts array counting at 1, hence the "trick".
>>
>> However, now no matter which order the DMDAVecRestoreArrayF90 calls are in (as in 2 or 3), an error occurs at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>>
>> So did I violate something and cause memory corruption with the trick above? But I can't think of any way other than the "trick" to continue using the 1-start index convention.
>>
>> Thank you.
>>
>> Yours sincerely,
>>
>> TAY wee-beng
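(A hedged sketch of the alternative Barry described earlier, working directly with the DMDA's own index ranges instead of shifting them to start at 1: pass the corner values into the subroutine and use them as explicit bounds, so the dummy arrays line up with the pointers returned by DMDAVecGetArrayF90. The argument names sx..ez are illustrative, not from the attached code; this only removes the 1-based shift, and whether the crash itself goes away is the separate question the valgrind suggestion below addresses.)

      ! Sketch only: sx:ex, sy:ey, sz:ez are assumed to be the ghosted index
      ! ranges of the DMDA (e.g. sx = xs and ex = xs+xm-1 from
      ! DMDAGetGhostCorners), passed in by the caller, so no index shifting
      ! is needed inside the subroutine.
      subroutine uvw_array_change(u,v,w,sx,ex,sy,ey,sz,ez)
        implicit none
        integer, intent(in) :: sx,ex,sy,ey,sz,ez
        real(8), intent(inout) :: u(sx:ex,sy:ey,sz:ez)
        real(8), intent(inout) :: v(sx:ex,sy:ey,sz:ez)
        real(8), intent(inout) :: w(sx:ex,sy:ey,sz:ez)

        u = 0.d0
        v = 0.d0
        w = 0.d0
      end subroutine uvw_array_change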
>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>> Try running under valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>
>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:
>>
>> Hi Barry,
>>
>> As I mentioned earlier, the code works fine in PETSc debug mode but fails in non-debug mode.
>>
>> I have attached my code.
>>
>> Thank you
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>> Please send the code that creates da_w and the declarations of w_array.
>>
>> Barry
>>
>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng <[email protected]> wrote:
>>
>> Hi Barry,
>>
>> I'm not too sure how to do it. I'm running MPI, so I run:
>>
>> mpirun -n 4 ./a.out -start_in_debugger
>>
>> I got the message below. Before the gdb windows appear (through X11), the program aborts.
>>
>> I also tried running on another cluster and it worked. I also tried the current cluster in debug mode and it worked too.
>>
>> mpirun -n 4 ./a.out -start_in_debugger
>> --------------------------------------------------------------------------
>> An MPI process has executed an operation involving a call to the
>> "fork()" system call to create a child process. Open MPI is currently
>> operating in a condition that could result in memory corruption or
>> other system errors; your MPI job may hang, crash, or produce silent
>> data corruption. The use of fork() (or system() or other calls that
>> create child processes) is strongly discouraged.
>>
>> The process that invoked fork was:
>>
>> Local host: n12-76 (PID 20235)
>> MPI_COMM_WORLD rank: 2
>>
>> If you are *absolutely sure* that your application will successfully
>> and correctly survive a call to fork(), you may disable this warning
>> by setting the mpi_warn_on_fork MCA parameter to 0.
>> --------------------------------------------------------------------------
>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>
>> ....
>>
>> 1
>> [1]PETSC ERROR: ------------------------------------------------------------------------
>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> [1]PETSC ERROR: to get more information on the crash.
>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>> [3]PETSC ERROR: ------------------------------------------------------------------------
>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> [3]PETSC ERROR: to get more information on the crash.
>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>
>> ...
>>
>> Thank you.
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>
>> Because IO doesn’t always get flushed immediately, it may not be hanging at this point. It is better to use the option -start_in_debugger, then type cont in each debugger window, and then when you think it is “hanging” do a control-C in each debugger window and type where to see where each process is. You can also look around in the debugger at variables to see why it is “hanging” at that point.
>>
>> Barry
>>
>> These routines don’t have any parallel communication in them, so they are unlikely to hang.
>>
>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>>
>> Hi,
>>
>> My code hangs, so I added MPI_Barrier and print statements to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u, v, w arrays, so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>
>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   !must be in reverse order
>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>> --
>> Thank you.
>>
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>> <code.txt>
>>
>> --
>> What most experimenters take for granted before they begin their experiments
>> is infinitely more interesting than any results to which their experiments
>> lead.
>> -- Norbert Wiener
