To make things easier, I created a simple example here: https://drive.google.com/file/d/0BxEfb1tasJxhOHJWbVJ0ZGl0LTA/view?usp=sharing
./exe -snes_newtonls okay. ./exe -snes_nasm -da_grid_z 2 okay ./exe -snes_nasm failed. It got trapped in the while loop. Do you have any suggestions to avoid this trap for nz=1? Thanks. Xiangdong On Wed, Jun 29, 2016 at 11:23 AM, Xiangdong <[email protected]> wrote: > The code got trapped in a while loop when I switch to s=2. > > http://www.mcs.anl.gov/petsc/petsc-current/src/dm/impls/da/dadd.c.html > > In dadd.c line 85-96 > /* gone out of processor range on y axis */ > 86: while(jj > ne-1 || jj < ns) { > 87: if (nr == n-1) { > 88: ns = 0; > 89: ne = ly[0]; > 90: nr = 0; > 91: } else { > 92: nr++; > 93: ns = ne; > 94: ne += ly[nr]; > 95: } > 96: } > > For my case nx=5,ny=5,nz=1,s=2, np=2, snes_type nasm, the code is trapped > in the loop for jj=-1, ns=0, nr=0, n=1. This negative jj comes from the > line 340-349 of same file. > > /* global and subdomain ISes for the local indices of the subdomain */ > 341: /* todo - make this not loop over at nonperiodic boundaries, > which will be more involved */ > 342: lower.i = subinfo.gxs; > 343: lower.j = subinfo.gys; > 344: lower.k = subinfo.gzs; > 345: upper.i = subinfo.gxs+subinfo.gxm; > 346: upper.j = subinfo.gys+subinfo.gym; > 347: upper.k = subinfo.gzs+subinfo.gzm; > > 349: DMDACreatePatchIS(dm,&lower,&upper,&gdis); > > When trapped, subinfo.gxs=-2. Since there is a todo comment, I am not sure > this is the reason for the while loop trap. > > Thanks. > > Xiangdong > > > On Tue, Jun 28, 2016 at 5:16 PM, Barry Smith <[email protected]> wrote: > >> >> If you make the stencil width as wide at the overlap does the problem >> go away and you get expected convergence? >> >> > On Jun 28, 2016, at 3:48 PM, Xiangdong <[email protected]> wrote: >> > >> > Let me illustrate my point and question by an example. I have a 1d >> nonlinear problem with Nx=5 and stencil width 1. When I run it with np=2, >> processor 1 own cells 1,2,3 (+ ghost 4) and processor 2 own cells 4,5 (+ >> ghost 3). I also have functions FormFunctionLocal and FormJacobianLocal to >> compute the residual and Jacobian. >> > >> > i) snes_type newtonls. The Jacobian is in mpiaij format distributed on >> two processors. processor 1 computes the first three rows of Jac (size 3x5) >> and processor 1 computes the last two rows of Jac (size 2x5). The code >> works fine. >> > >> > ii) snes_type nasm and da_overlap 0. Each processor has its own Jac >> (seqaij). Processor 1 computes its 3x3 Jac and Processor 2 computes its 2x2 >> Jac. The code works fine. >> > >> > iii) snes_type nasm and da_overlap 1. Each processor needs to computes >> its own Jac. Processor 1 computes its 4x4 Jac and Processor 2 computes its >> 3x3 Jac. However, for processor 1 to compute the 4-th row of its Jac, it >> needs the properties information stored on cell 5, which is not available >> on processor 1. The zero pivot error is from this wrong Jac for the row >> corresponding to the overlap cell. The mat_fd_color does not work either >> for the same reason that information from cell 5 is not available on >> processor 1. >> > >> > Do you have any suggestion for the overlap case? Thank you. >> > >> > Best, >> > Xiangdong >> > >> > On Mon, Jun 27, 2016 at 4:21 PM, Barry Smith <[email protected]> >> wrote: >> > >> > > On Jun 27, 2016, at 3:15 PM, Xiangdong <[email protected]> wrote: >> > > >> > > Hello everyone, >> > > >> > > I am trying different number of da_overlap to see its effects on nasm >> and aspin preconditioner. The codes works fine with -da_overlap 0. However, >> when I change the option -da_overlap 1, it crashed with the error message >> like "zero pivot row 12544 value 0 tolerance 2.2e-14". The option >> -pc_factor_nonzeros_along_diagonal does not fix this zero pivot issue. >> > > >> > > Do you have any quick suggestions for me to try to fix this issue? >> > >> > My guess is that it is bug in the da_overlap business or in the >> example; normally I would not expect this to happen. Is this a PETSc >> example I can run (tell me example and exact options) to reproduce the >> problem? >> > >> > Thanks >> > >> > Barry >> > >> > > >> > > Thank you. >> > > >> > > Best, >> > > Xiangdong >> > >> > >> >> >
