On 06/12/2017 07:50 AM, Matthew Knepley wrote:
> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte <[email protected]> wrote:
>
> > Thanks Matt, makes sense to me!
> >
> > I skipped direct solvers at first because for these 'real' configurations LU (mumps/superlu_dist) usually goes out of memory (I have 32GB RAM). It would be reasonable to take one more step back and play with synthetic examples.
> > I managed to run one case, though, with 936k DOFs using ("user" = pressure mass matrix):
> >
> >     <...>
> >     -pc_fieldsplit_schur_fact_type upper
> >     -pc_fieldsplit_schur_precondition user
> >     -fieldsplit_0_ksp_type preonly
> >     -fieldsplit_0_pc_type lu
> >     -fieldsplit_0_pc_factor_mat_solver_package mumps
> >
> >     -fieldsplit_1_ksp_type gmres
> >     -fieldsplit_1_ksp_monitor_true_residual
> >     -fieldsplit_1_ksp_rtol 1e-10
> >     -fieldsplit_1_pc_type lu
> >     -fieldsplit_1_pc_factor_mat_solver_package mumps
> >
> > It takes 2 outer iterations, as expected. However, the fieldsplit_1 solve takes very long.
>
> 1) It should take 1 outer iterate, not two. The problem is that your Schur tolerance is way too high. Use
>
>     -fieldsplit_1_ksp_rtol 1e-10
>
> or something like that. Then it will take 1 iterate.
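(For reference, my question below is based on the standard result for idealized block factorization preconditioners with exact inner solves; this is only a sketch in my own notation, with A the velocity block, B the divergence, and S the pressure Schur complement:)

    % Stokes saddle point matrix and pressure Schur complement
    \[
      \mathcal{K} = \begin{bmatrix} A & B^{T} \\ B & 0 \end{bmatrix},
      \qquad
      S = -\,B A^{-1} B^{T}.
    \]
    % Exact block upper triangular preconditioner: the (right-)preconditioned
    % operator is identity plus a nilpotent part, so its minimal polynomial has
    % degree 2 and GMRES converges in at most 2 iterations.
    \[
      \mathcal{P}_{t} = \begin{bmatrix} A & B^{T} \\ 0 & S \end{bmatrix},
      \qquad
      \mathcal{K}\,\mathcal{P}_{t}^{-1}
      = \begin{bmatrix} I & 0 \\ B A^{-1} & I \end{bmatrix}.
    \]
    % Exact full (block LDU) factorization reproduces \mathcal{K} itself, so the
    % preconditioned operator is the identity and GMRES converges in 1 iteration.
    \[
      \mathcal{P}_{f}
      = \begin{bmatrix} A & 0 \\ B & S \end{bmatrix}
        \begin{bmatrix} I & A^{-1} B^{T} \\ 0 & I \end{bmatrix}
      = \mathcal{K}.
    \]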
Shouldn't it take 2 with a triangular Schur factorization and exact preconditioners, and 1 with a full factorization? (cf. Benzi et al. 2005, p. 66, http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf)

That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10, and the Schur residual does drop below the rtol of 1e-10.

> 2) There is a problem with the Schur solve. Now from the iterates
>
>     423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
>
> it is clear that the preconditioner is really screwing stuff up. For testing, you can use
>
>     -pc_fieldsplit_schur_precondition full
>
> and your same setup here. It should take one iterate. I think there is something wrong with your mass matrix.

I agree. I forgot to mention that I am considering an "enclosed flow" problem, with u = 0 on the whole boundary and a Dirichlet condition for the pressure in one point to fix the constant pressure mode. Maybe the preconditioner is not consistent with this setup; I need to check this. Thanks a lot!

> Thanks,
>
>     Matt

> >     0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
> >       Residual norms for fieldsplit_1_ solve.
> >       0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan
> >     Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
> >     1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
> >       Residual norms for fieldsplit_1_ solve.
> >       0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
> >       1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
> >       2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
> >       3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
> >       4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
> >       5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
> >       6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
> >       7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
> >     <...>
> >       422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
> >       423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
> >     Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
> >     2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
> >   Linear solve converged due to CONVERGED_RTOL iterations 2
> >
> > Does the slow convergence of the Schur block mean that my preconditioning matrix Sp is a poor choice?
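(For completeness: the Sp in the quoted question above is the pressure mass matrix; I hand it to the fieldsplit PC roughly like this -- a minimal sketch with my own variable and function names, not the exact code I run:)

    #include <petscksp.h>

    /* Sketch: set up the Schur fieldsplit and provide the assembled pressure
       mass matrix Sp as the matrix from which the Schur complement
       preconditioner is built (what -pc_fieldsplit_schur_precondition user
       selects). The velocity/pressure index sets are assumed to be defined
       elsewhere (e.g. PCFieldSplitSetIS or -pc_fieldsplit_detect_saddle_point).
       Error checking omitted for brevity. */
    static PetscErrorCode SetSchurMassMatrix(KSP ksp, Mat Sp)
    {
      PC pc;

      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCFIELDSPLIT);
      PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);
      PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_UPPER);
      PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Sp);
      return 0;
    }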
> > Thanks,
> > David
> >
> > On 06/11/2017 08:53 AM, Matthew Knepley wrote:
>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <[email protected]> wrote:
>>
>>     Dear all,
>>
>>     I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 finite element discretization on tetrahedral meshes, resulting in ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and the right hand side is a function of noisy measurement data.
>>
>>     In other settings of "standard" Stokes flow problems I have obtained good convergence with an "upper" Schur complement preconditioner, using AMG (ML or Hypre) on the velocity block and approximating the Schur complement matrix by the diagonal of the pressure mass matrix:
>>
>>     -ksp_converged_reason
>>     -ksp_monitor_true_residual
>>     -ksp_initial_guess_nonzero
>>     -ksp_diagonal_scale
>>     -ksp_diagonal_scale_fix
>>     -ksp_type fgmres
>>     -ksp_rtol 1.0e-8
>>
>>     -pc_type fieldsplit
>>     -pc_fieldsplit_type schur
>>     -pc_fieldsplit_detect_saddle_point
>>     -pc_fieldsplit_schur_fact_type upper
>>     -pc_fieldsplit_schur_precondition user    # <-- pressure mass matrix
>>
>>     -fieldsplit_0_ksp_type preonly
>>     -fieldsplit_0_pc_type ml
>>
>>     -fieldsplit_1_ksp_type preonly
>>     -fieldsplit_1_pc_type jacobi
>>
>> 1) I always recommend starting from an exact solver and backing off in small steps for optimization. Thus I would start with LU on the upper block and GMRES/LU with tolerance 1e-10 on the Schur block. This should converge in 1 iterate.
>>
>> 2) I don't think you want preonly on the Schur system. You might want GMRES/Jacobi to invert the mass matrix.
>>
>> 3) You probably want to tighten the tolerance on the Schur solve, at least to start, and then slowly let it out. The tight tolerance will show you how effective the preconditioner is using that Schur operator. Then you can start to evaluate how effective the Schur linear solver is.
>>
>> Does this make sense?
>>
>>   Thanks,
>>
>>      Matt
>>
>>     In my present case this setup gives rather slow convergence (it varies for different geometries between 200-500 iterations, or even several thousand!). I obtain better convergence with "-pc_fieldsplit_schur_precondition selfp" and using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think this is optimal, though).
>>
>>     I don't understand why the pressure mass matrix approach performs so poorly and wonder what I could try to improve the convergence. Until now I have been using ML and Hypre BoomerAMG mostly with default parameters. Surely they can be improved by tuning some parameters. Which could be a good starting point? Are there other options I should consider?
>>
>>     With the above setup (jacobi) for a case that works better than others, the KSP terminates with
>>     467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09
>>
>>     You can find the output of -ksp_view below. Let me know if you need more details.
>>
>>     Thanks in advance for your advice!
>>     Best wishes
>>     David
>>
>>
>>     KSP Object: 1 MPI processes
>>       type: fgmres
>>         GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>         GMRES: happy breakdown tolerance 1e-30
>>       maximum iterations=10000
>>       tolerances:  relative=1e-08, absolute=1e-50, divergence=10000.
>> right preconditioning >> diagonally scaled system >> using nonzero initial guess >> using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes >> type: fieldsplit >> FieldSplit with Schur preconditioner, factorization UPPER >> Preconditioner for the Schur complement formed from user >> provided matrix >> Split info: >> Split number 0 Defined by IS >> Split number 1 Defined by IS >> KSP solver for A00 block >> KSP Object: (fieldsplit_0_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_coarse_) >> 1 MPI >> processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_coarse_) >> 1 MPI >> processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero >> pivot >> [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> package used to perform factorization: petsc >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_1_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_1_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_2_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_2_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated nonzeros=7354 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_3_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_3_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated nonzeros=2730644 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_4_) >> 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_4_) >> 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) >> 1 MPI >> processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI >> processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> KSP solver for S = A11 - A10 inv(A00) A01 >> KSP Object: (fieldsplit_1_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. 
>> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_1_) 1 MPI processes >> type: jacobi >> linear system matrix followed by preconditioner matrix: >> Mat Object: (fieldsplit_1_) 1 MPI >> processes >> type: schurcomplement >> rows=42025, cols=42025 >> Schur complement A11 - A10 inv(A00) A01 >> A11 >> Mat Object: (fieldsplit_1_) >> 1 >> MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> A10 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=894132 >> total: nonzeros=6850107, allocated >> nonzeros=6850107 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> KSP of A00 >> KSP Object: (fieldsplit_0_) >> 1 >> MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) >> 1 >> MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess >> is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to >> prevent zero >> pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: >> 1 MPI >> processes >> type: seqaij >> rows=3, cols=3 >> package used to perform >> factorization: petsc >> total: nonzeros=3, allocated >> nonzeros=3 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. 
>> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated >> nonzeros=7354 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated >> nonzeros=2730644 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations >> = 1, local >> iterations = 1, omega = 1. 
>> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during >> MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> A01 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=894132, cols=42025 >> total: nonzeros=6850107, allocated >> nonzeros=6850107 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=936157, cols=936157 >> total: nonzeros=84938441, allocated nonzeros=84938441 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ <http://www.caam.rice.edu/%7Emk51/> > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ <http://www.caam.rice.edu/%7Emk51/>
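PS: Regarding the "enclosed flow" setup mentioned above, what I still have to check is whether replacing the point Dirichlet condition on the pressure by a constant pressure null space makes the mass matrix / Schur preconditioner consistent. Roughly, I would try something like the following (only a sketch, names are mine; I have not yet verified whether attaching the null space to Sp is sufficient for PCFIELDSPLIT or whether the outer system needs it as well):

    #include <petscmat.h>

    /* Sketch (unverified): declare the constant vector as the null space of
       the pressure mass matrix Sp used for the Schur preconditioner, instead
       of pinning the pressure at a single point. */
    static PetscErrorCode AttachConstantPressureNullSpace(Mat Sp)
    {
      MatNullSpace nsp;

      /* null space = span of the constant vector (has_cnst = PETSC_TRUE) */
      MatNullSpaceCreate(PetscObjectComm((PetscObject)Sp), PETSC_TRUE, 0, NULL, &nsp);
      MatSetNullSpace(Sp, nsp);
      MatNullSpaceDestroy(&nsp);
      return 0;
    }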
