On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <[email protected]> wrote:
> On 19/4/2014 1:17 PM, Barry Smith wrote:
>
>> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <[email protected]> wrote:
>>
>>> On 19/4/2014 12:10 PM, Barry Smith wrote:
>>>
>>>> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <[email protected]> wrote:
>>>>
>>>>> On 19/4/2014 3:53 AM, Barry Smith wrote:
>>>>>
>>>>>> Hmm,
>>>>>>
>>>>>>   Interface DMDAVecGetArrayF90
>>>>>>     Subroutine DMDAVecGetArrayF903(da1, v, d1, ierr)
>>>>>>       USE_DM_HIDE
>>>>>>       DM_HIDE da1
>>>>>>       VEC_HIDE v
>>>>>>       PetscScalar, pointer :: d1(:,:,:)
>>>>>>       PetscErrorCode ierr
>>>>>>     End Subroutine
>>>>>>
>>>>>> So d1 is an F90 POINTER. But your subroutine seems to be treating it as a "plain old Fortran array"?
>>>>>>
>>>>>>   real(8), intent(inout) :: u(:,:,:), v(:,:,:), w(:,:,:)
>>>>>>
>>>>> Hi,
>>>
>>> So d1 is a pointer, and it's different if I declare it as a "plain old Fortran array"? I declare it as a Fortran array and it works without any problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>>>
>>> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u", "v" and "w", errors start to happen. I wonder why...
>>>
>>> Also, suppose I call:
>>>
>>>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>   call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>   call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>
>>>   u_array ....
>>>   v_array .... etc
>>>
>>> Now, to restore the arrays, does the sequence in which they are restored matter?
>>>
>> No, it should not matter. If it matters, that is a sign that memory has been written to incorrectly earlier in the code.
>>
> Hi,
>
> Hmm, I have been getting different results with different Intel compilers. I'm not sure if MPI plays a part, but I'm only using a single processor. In debug mode, things run without problem. In optimized mode, in some cases the code aborts even when doing a simple initialization:
>
>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>   call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>   call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>   call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
>   u_array = 0.d0
>   v_array = 0.d0
>   w_array = 0.d0
>   p_array = 0.d0
>
>   call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code aborts at "call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)" with a segmentation error, but another version of the Intel compiler passes through this part without error. Since the behaviour differs between compilers, is this a PETSc bug or an Intel bug? Or MVAPICH or Open MPI?

We do this in a bunch of examples. Can you reproduce this different behavior in src/dm/examples/tutorials/ex11f90.F?

   Matt
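For reference, a minimal sketch of the pattern under discussion, not taken from the thread: it assumes the same da_u/da_v/da_w and u_local/v_local/w_local as in the quoted code, plus the usual PETSc Fortran include or use lines for the installed PETSc version. The key point from the interface above is that the arrays handed back by DMDAVecGetArrayF90 are Fortran 90 pointers:

      ! Sketch only: da_u/da_v/da_w and u_local/v_local/w_local are assumed
      ! to have been created already, as in the quoted code.
      PetscScalar, pointer :: u_array(:,:,:), v_array(:,:,:), w_array(:,:,:)
      PetscErrorCode ierr

      call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
      call DMDAVecGetArrayF90(da_v, v_local, v_array, ierr)
      call DMDAVecGetArrayF90(da_w, w_local, w_array, ierr)

      ! The pointers are indexed with the ranges given by DMDAGetCorners()
      ! or DMDAGetGhostCorners(), not with 1-based indices.
      u_array = 0.d0
      v_array = 0.d0
      w_array = 0.d0

      ! Restore order should not matter; a pointer must simply not be used
      ! after its restore call.
      call DMDAVecRestoreArrayF90(da_w, w_local, w_array, ierr)
      call DMDAVecRestoreArrayF90(da_v, v_local, v_array, ierr)
      call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)

Note that if such a pointer is passed to a subroutine whose dummy argument is an assumed-shape array like u(:,:,:), the lower bounds seen inside the routine reset to 1, whereas a dummy declared with the pointer attribute keeps the DMDA index range.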
>>> As in w, then v and u?
>>>
>>>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>
>>> thanks
>>>
>>>>>> Note also that the beginning and end indices of u, v, w are different for each process, and they do not start at 1; see for example http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F. This is how to get the loop bounds.
>>>>>>
>>>>> Hi,
>>>>>
>>>>> In my case, I fixed u, v, w so that their indices are the same. I also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem lies in my subroutine treating them as "plain old Fortran arrays".
>>>>>
>>>>> If I declare them as pointers, their indices follow the C 0-start convention, is that so?
>>>>>
>>>> Not really. It is that on each process you need to access them with the indices indicated by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn't make any difference.
>>>>
>>>>> So my problem now is that in my old MPI code, the u(i,j,k) follow the Fortran 1-start convention. Is there some way to handle this so that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>>>>>
>>>> If your code wishes to access them with indices plus one from the values returned by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors, then you need to manually subtract off the 1.
>>>>
>>>>   Barry
>>>>
>>>>> Thanks.
>>>>>
>>>>>>   Barry
>>>>>>
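As a sketch of what "subtract off the 1" can look like in practice (not from the thread; phi_old is a hypothetical 1-based local array standing in for the old MPI code's data, and da_u/u_array are as above), the loop bounds come from DMDAGetCorners() and the shift is applied where the two index conventions meet:

      PetscInt xs, ys, zs, xm, ym, zm
      PetscInt i, j, k
      PetscErrorCode ierr

      ! Start indices and widths of this process's portion of the grid
      call DMDAGetCorners(da_u, xs, ys, zs, xm, ym, zm, ierr)

      do k = zs, zs + zm - 1
         do j = ys, ys + ym - 1
            do i = xs, xs + xm - 1
               ! u_array uses the DMDA (global) indices;
               ! phi_old uses the old code's 1-based local indices.
               u_array(i, j, k) = phi_old(i - xs + 1, j - ys + 1, k - zs + 1)
            end do
         end do
      end do

For a local (ghosted) vector the bounds would come from DMDAGetGhostCorners() instead.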
>>>>>> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried to pinpoint the problem. I reduced my job size so that I can run on 1 processor. I tried using valgrind, but perhaps because I'm using the optimized version it didn't catch the error, beyond saying "Segmentation fault (core dumped)".
>>>>>>>
>>>>>>> However, by re-writing my code, I found out a few things:
>>>>>>>
>>>>>>> 1. If I write my code this way:
>>>>>>>
>>>>>>>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>   call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>   call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>>   u_array = ....
>>>>>>>   v_array = ....
>>>>>>>   w_array = ....
>>>>>>>
>>>>>>>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> the code runs fine.
>>>>>>>
>>>>>>> 2. If I write my code this way:
>>>>>>>
>>>>>>>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>   call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>   call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>>   call uvw_array_change(u_array,v_array,w_array)  -> this subroutine does the same modification as the above
>>>>>>>
>>>>>>>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)  -> error
>>>>>>>
>>>>>>> where the subroutine is:
>>>>>>>
>>>>>>>   subroutine uvw_array_change(u,v,w)
>>>>>>>     real(8), intent(inout) :: u(:,:,:), v(:,:,:), w(:,:,:)
>>>>>>>     u ...
>>>>>>>     v ...
>>>>>>>     w ...
>>>>>>>   end subroutine uvw_array_change
>>>>>>>
>>>>>>> the code gives an error at:
>>>>>>>
>>>>>>>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>
>>>>>>> 3. Same as above, except I change the order of the last 3 lines to:
>>>>>>>
>>>>>>>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>
>>>>>>> so they are now in reversed order. Now it works.
>>>>>>>
>>>>>>> 4. Same as 2 or 3, except the subroutine is changed to:
>>>>>>>
>>>>>>>   subroutine uvw_array_change(u,v,w)
>>>>>>>     real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>     real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>     real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>>>>>>     u ...
>>>>>>>     v ...
>>>>>>>     w ...
>>>>>>>   end subroutine uvw_array_change
>>>>>>>
>>>>>>> The start_indices and end_indices simply shift the 0-based indices of the C convention to the 1-based indices of the Fortran convention. This is necessary in my case because most of my code starts array counting at 1, hence the "trick".
>>>>>>>
>>>>>>> However, now no matter which order of DMDAVecRestoreArrayF90 I use (as in 2 or 3), an error occurs at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>>>>>>>
>>>>>>> So did I violate something and cause memory corruption with the trick above? I can't think of any way other than the "trick" to continue using the 1-based index convention.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>>
>>>>>>> TAY wee-beng
>>>>>>>
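One possible way to keep 1-based indexing without hard-coding the start_indices/end_indices bounds is to rely on the assumed-shape dummies of version 2, which always start at 1 inside the routine, and to keep every loop within size(u,1), size(u,2) and size(u,3); writing outside that local box corrupts memory and typically only surfaces later, at the restore calls. A sketch with a placeholder body, assuming u, v and w have the same local shape:

      subroutine uvw_array_change(u, v, w)
        implicit none
        real(8), intent(inout) :: u(:,:,:), v(:,:,:), w(:,:,:)
        integer :: i, j, k

        ! Inside the routine the arrays are 1-based over the local box,
        ! whatever the DMDA corner indices are outside.
        do k = 1, size(u, 3)
           do j = 1, size(u, 2)
              do i = 1, size(u, 1)
                 u(i, j, k) = 0.d0   ! placeholder update
                 v(i, j, k) = 0.d0
                 w(i, j, k) = 0.d0
              end do
           end do
        end do
      end subroutine uvw_array_change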
>>>>>>> On 15/4/2014 8:00 PM, Barry Smith wrote:
>>>>>>>
>>>>>>>> Try running under valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>
>>>>>>>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Barry,
>>>>>>>>>
>>>>>>>>> As I mentioned earlier, the code works fine in PETSc debug mode but fails in non-debug mode.
>>>>>>>>>
>>>>>>>>> I have attached my code.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> Yours sincerely,
>>>>>>>>>
>>>>>>>>> TAY wee-beng
>>>>>>>>>
>>>>>>>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>>> Please send the code that creates da_w and the declarations of w_array.
>>>>>>>>>>
>>>>>>>>>>   Barry
>>>>>>>>>>
>>>>>>>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> I'm not too sure how to do it. I'm running MPI, so I run:
>>>>>>>>>>>
>>>>>>>>>>>   mpirun -n 4 ./a.out -start_in_debugger
>>>>>>>>>>>
>>>>>>>>>>> I got the message below. Before the gdb windows appear (through X11), the program aborts.
>>>>>>>>>>>
>>>>>>>>>>> I also tried running on another cluster and it worked. I also tried the current cluster in debug mode and it worked too.
>>>>>>>>>>>
>>>>>>>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> An MPI process has executed an operation involving a call to the
>>>>>>>>>>> "fork()" system call to create a child process. Open MPI is currently
>>>>>>>>>>> operating in a condition that could result in memory corruption or
>>>>>>>>>>> other system errors; your MPI job may hang, crash, or produce silent
>>>>>>>>>>> data corruption. The use of fork() (or system() or other calls that
>>>>>>>>>>> create child processes) is strongly discouraged.
>>>>>>>>>>>
>>>>>>>>>>> The process that invoked fork was:
>>>>>>>>>>>
>>>>>>>>>>>   Local host:          n12-76 (PID 20235)
>>>>>>>>>>>   MPI_COMM_WORLD rank: 2
>>>>>>>>>>>
>>>>>>>>>>> If you are *absolutely sure* that your application will successfully
>>>>>>>>>>> and correctly survive a call to fork(), you may disable this warning
>>>>>>>>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>>>>>>>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>>>>>>>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>>>>>>>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>>>>>>>>>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>>>>>>>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>>>>>>>
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>> 1
>>>>>>>>>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>>>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>>>>>>>> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>>>>>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>>>>>>>> [1]PETSC ERROR: to get more information on the crash.
>>>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>>>>>>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>>>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>>>>>>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>>>>>>>> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>>>>>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>>>>>>>> [3]PETSC ERROR: to get more information on the crash.
>>>>>>>>>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>>
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>
>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>
>>>>>>>>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Because IO doesn't always get flushed immediately, it may not be hanging at this point. It is better to use the option -start_in_debugger, then type cont in each debugger window. When you think it is "hanging", do a control-C in each debugger window and type where to see where each process is; you can also look around in the debugger at variables to see why it is "hanging" at that point.
>>>>>>>>>>>>
>>>>>>>>>>>>   Barry
>>>>>>>>>>>>
>>>>>>>>>>>>   These routines don't have any parallel communication in them, so they are unlikely to hang.
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> My code hangs, and I added mpi_barrier and print statements to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u, v, w arrays, so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>>>>>>>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>>>>>>>>>>>   call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>>>>>>>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>>>>>>>>>>>   call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>>>>>>>>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>>>>>>>>>>>   call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>>>>>>>>>>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>>>>>>>>>>>   call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)   ! must be in reverse order
>>>>>>>>>>>>>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>>>>>>>>>>>   call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>>>>>>>>   call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>>>>>>>>>>>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>
>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>
>>>>>>>>>>>>> <code.txt>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
