Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason to your options and send the output?
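For example, appending those options to the command from the output below (the path and the -t1_N value are simply whatever was used in the reported run):

  mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm -snes_nasm_type restrict -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason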
"Takahashi, Tadanaga" <[email protected]> writes: > Hello, > > We are working on a finite difference solver for a 2D nonlinear PDE with > Dirichlet Boundary conditions on a rectangular domain. Our goal is to solve > the problem with parallel nonlinear additive Schwarz (NASM) as the outer > solver. Our code is similar to SNES example 5 > <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example 5, > the parallel NASM can be executed with a command like `mpiexec -n 4 ./ex5 > -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2` which gives > a convergent result. We assume this is the correct usage. A comment in the > source code for NASM mentions that NASM should be a preconditioner but > there's no documentation on the usage. The Brune paper does not cover > parallel NASM either. We observed that increasing the overlap leads to > fewer Schwarz iterations. The parallelization works seamlessly for an > arbitrary number of subdomains. This is the type of behavior we were > expecting from our code. > > Our method uses box-style stencil width d = ceil(N^(1/3)) on a N by N DMDA. > The finite difference stencil consists of 4d+1 points spread out in a > diamond formation. If a stencil point is out of bounds, then it is > projected onto the boundary curve. Since the nodes on the boundary curve > would result in an irregular mesh, we chose not treat boundary nodes as > unknowns as in Example 5. We use DMDACreate2d to create the DA for the > interior points and DMDASNESSetFunctionLocal to associate the residue > function to the SNES object. > > Our code works serially. We have also tested our code > with Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec -n > <n> ./solve -snes_type newtonls`. We have tested the NKS for several > quantities of subdomains and overlap and the code works as expected. We > have some confidence in the correctness of our code. The overlapping NASM > was implemented in MATLAB so we know the method converges. However, the > parallel NASM will not converge with our PETSc code. We don't understand > why NKS works while NASM does not. The F-norm residue monotonically > decreases and then stagnates. 
>
> Here is an example of the output when attempting to run NASM in parallel:
>
> takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
>   0 SNES Function norm 7.244681057908e+02
>   1 SNES Function norm 1.237688062971e+02
>   2 SNES Function norm 1.068926073552e+02
>   3 SNES Function norm 1.027563237834e+02
>   4 SNES Function norm 1.022184806736e+02
>   5 SNES Function norm 1.020818227640e+02
>   6 SNES Function norm 1.020325629121e+02
>   7 SNES Function norm 1.020149036595e+02
>   8 SNES Function norm 1.020088110545e+02
>   9 SNES Function norm 1.020067198030e+02
>  10 SNES Function norm 1.020060034469e+02
>  11 SNES Function norm 1.020057582380e+02
>  12 SNES Function norm 1.020056743241e+02
>  13 SNES Function norm 1.020056456101e+02
>  14 SNES Function norm 1.020056357849e+02
>  15 SNES Function norm 1.020056324231e+02
>  16 SNES Function norm 1.020056312727e+02
>  17 SNES Function norm 1.020056308791e+02
>  18 SNES Function norm 1.020056307444e+02
>  19 SNES Function norm 1.020056306983e+02
>  20 SNES Function norm 1.020056306826e+02
>  21 SNES Function norm 1.020056306772e+02
>  22 SNES Function norm 1.020056306753e+02
>  23 SNES Function norm 1.020056306747e+02
>  24 SNES Function norm 1.020056306745e+02
>  25 SNES Function norm 1.020056306744e+02
>  26 SNES Function norm 1.020056306744e+02
>  27 SNES Function norm 1.020056306744e+02
>  28 SNES Function norm 1.020056306744e+02
>  29 SNES Function norm 1.020056306744e+02
>  30 SNES Function norm 1.020056306744e+02
>  31 SNES Function norm 1.020056306744e+02
>  32 SNES Function norm 1.020056306744e+02
>  33 SNES Function norm 1.020056306744e+02
>  34 SNES Function norm 1.020056306744e+02
>  35 SNES Function norm 1.020056306744e+02
>  36 SNES Function norm 1.020056306744e+02
>  37 SNES Function norm 1.020056306744e+02
>  38 SNES Function norm 1.020056306744e+02
>  39 SNES Function norm 1.020056306744e+02
>  40 SNES Function norm 1.020056306744e+02
>  41 SNES Function norm 1.020056306744e+02
>  42 SNES Function norm 1.020056306744e+02
>  43 SNES Function norm 1.020056306744e+02
>  44 SNES Function norm 1.020056306744e+02
>  45 SNES Function norm 1.020056306744e+02
>  46 SNES Function norm 1.020056306744e+02
>  47 SNES Function norm 1.020056306744e+02
>  48 SNES Function norm 1.020056306744e+02
>  49 SNES Function norm 1.020056306744e+02
>  50 SNES Function norm 1.020056306744e+02
> SNES Object: 4 MPI processes
>   type: nasm
>     total subdomain blocks = 4
>     Local solver information for first block on rank 0:
>     Use -snes_view ::ascii_info_detail to display information for all blocks
>   SNES Object: (sub_) 1 MPI processes
>     type: newtonls
>     maximum iterations=50, maximum function evaluations=10000
>     tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>     total number of linear solver iterations=22
>     total number of function evaluations=40
>     norm schedule ALWAYS
>     Jacobian is built using a DMDA local Jacobian
>     SNESLineSearch Object: (sub_) 1 MPI processes
>       type: bt
>         interpolation: cubic
>         alpha=1.000000e-04
>       maxstep=1.000000e+08, minlambda=1.000000e-12
>       tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>       maximum iterations=40
>     KSP Object: (sub_) 1 MPI processes
>       type: preonly
>       maximum iterations=10000, initial guess is zero
>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (sub_) 1 MPI processes
>       type: lu
>         out-of-place factorization
>         tolerance for zero pivot 2.22045e-14
>         matrix ordering: nd
>         factor fill ratio given 5., needed 2.13732
>           Factored matrix follows:
>             Mat Object: 1 MPI processes
>               type: seqaij
>               rows=169, cols=169
>               package used to perform factorization: petsc
>               total: nonzeros=13339, allocated nonzeros=13339
>                 using I-node routines: found 104 nodes, limit used is 5
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI processes
>         type: seqaij
>         rows=169, cols=169
>         total: nonzeros=6241, allocated nonzeros=6241
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   maximum iterations=50, maximum function evaluations=10000
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of function evaluations=51
>   norm schedule ALWAYS
>   Jacobian is built using a DMDA local Jacobian
> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
>   error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>
> We have been stuck on this for a while now. We do not know how to
> debug this issue. Please let us know if you have any insights.
