On Mon, Jun 12, 2017 at 10:36 AM, David Nolte <[email protected]> wrote:
> On 06/12/2017 07:50 AM, Matthew Knepley wrote:
> On Sun, Jun 11, 2017 at 11:06 PM, David Nolte <[email protected]> wrote:
>
>> Thanks Matt, makes sense to me!
>>
>> I skipped direct solvers at first because for these 'real' configurations
>> LU (mumps/superlu_dist) usually goes out of memory (got 32GB RAM). It would
>> be reasonable to take one more step back and play with synthetic examples.
>> I managed to run one case though with 936k dofs using: ("user" = pressure
>> mass matrix)
>>
>>     <...>
>>     -pc_fieldsplit_schur_fact_type upper
>>     -pc_fieldsplit_schur_precondition user
>>     -fieldsplit_0_ksp_type preonly
>>     -fieldsplit_0_pc_type lu
>>     -fieldsplit_0_pc_factor_mat_solver_package mumps
>>
>>     -fieldsplit_1_ksp_type gmres
>>     -fieldsplit_1_ksp_monitor_true_residuals
>>     -fieldsplit_1_ksp_rtol 1e-10
>>     -fieldsplit_1_pc_type lu
>>     -fieldsplit_1_pc_factor_mat_solver_package mumps
>>
>> It takes 2 outer iterations, as expected. However, the fieldsplit_1 solve
>> takes very long.
>
> 1) It should take 1 outer iterate, not two. The problem is that your Schur
> tolerance is way too high. Use
>
>     -fieldsplit_1_ksp_rtol 1e-10
>
> or something like that. Then it will take 1 iterate.
>
> Shouldn't it take 2 with a triangular Schur factorization and exact
> preconditioners, and 1 with a full factorization? (cf. Benzi et al. 2005,
> p. 66, http://www.mathcs.emory.edu/~benzi/Web_papers/bgl05.pdf)
>
> That's exactly what I set: -fieldsplit_1_ksp_rtol 1e-10, and the Schur
> solver does drop below "rtol < 1e-10".

Oh, yes. Take away the upper until things are worked out.

  Thanks,

     Matt

> 2) There is a problem with the Schur solve. Now from the iterates
>
>     423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
>
> it is clear that the preconditioner is really screwing stuff up. For
> testing, you can use
>
>     -pc_fieldsplit_schur_precondition full
>
> and your same setup here. It should take one iterate. I think there is
> something wrong with your mass matrix.
>
> I agree. I forgot to mention that I am considering an "enclosed flow"
> problem, with u=0 on all of the boundary and a Dirichlet condition for the
> pressure in one point to fix the constant pressure mode. Maybe the
> preconditioner is not consistent with this setup; I need to check this.
>
> Thanks a lot
>
>   Thanks,
>
>      Matt
>
>>   0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00
>>     Residual norms for fieldsplit_1_ solve.
>>     0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 0.000000000000e+00 ||r(i)||/||b|| -nan
>>   Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0
>>   1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03
>>     Residual norms for fieldsplit_1_ solve.
>>     0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00
>>     1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01
>>     2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01
>>     3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01
>>     4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01
>>     5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01
>>     6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01
>>     7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02
>>     <...>
>>     422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11
>>     423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11
>>   Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423
>>   2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14
>> Linear solve converged due to CONVERGED_RTOL iterations 2
>>
>> Does the slow convergence of the Schur block mean that my preconditioning
>> matrix Sp is a poor choice?
>>
>> Thanks,
>> David
>>
>> On 06/11/2017 08:53 AM, Matthew Knepley wrote:
>> On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1
>>> finite element discretization on tetrahedral meshes resulting in
>>> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and
>>> the right hand side is a function of noisy measurement data.
>>>
>>> In other settings of "standard" Stokes flow problems I have obtained
>>> good convergence with an "upper" Schur complement preconditioner, using
>>> AMG (ML or Hypre) on the velocity block and approximating the Schur
>>> complement matrix by the diagonal of the pressure mass matrix:
>>>
>>>     -ksp_converged_reason
>>>     -ksp_monitor_true_residual
>>>     -ksp_initial_guess_nonzero
>>>     -ksp_diagonal_scale
>>>     -ksp_diagonal_scale_fix
>>>     -ksp_type fgmres
>>>     -ksp_rtol 1.0e-8
>>>
>>>     -pc_type fieldsplit
>>>     -pc_fieldsplit_type schur
>>>     -pc_fieldsplit_detect_saddle_point
>>>     -pc_fieldsplit_schur_fact_type upper
>>>     -pc_fieldsplit_schur_precondition user    # <-- pressure mass matrix
>>>
>>>     -fieldsplit_0_ksp_type preonly
>>>     -fieldsplit_0_pc_type ml
>>>
>>>     -fieldsplit_1_ksp_type preonly
>>>     -fieldsplit_1_pc_type jacobi
>>
>> 1) I always recommend starting from an exact solver and backing off in
>> small steps for optimization. Thus I would start with LU on the upper
>> block and GMRES/LU with tolerance 1e-10 on the Schur block. This should
>> converge in 1 iterate.
>>
>> 2) I don't think you want preonly on the Schur system. You might want
>> GMRES/Jacobi to invert the mass matrix.
>>
>> 3) You probably want to tighten the tolerance on the Schur solve, at
>> least to start, and then slowly let it out. The tight tolerance will
>> show you how effective the preconditioner is using that Schur operator.
>> Then you can start to evaluate how effective the Schur linear solver is.
>>
>> Does this make sense?
>>
>>   Thanks,
>>
>>      Matt
>>
>>> In my present case this setup gives rather slow convergence (it varies for
>>> different geometries between 200-500 or even several thousand iterations!).
>>> I obtain better convergence with "-pc_fieldsplit_schur_precondition selfp"
>>> and using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think
>>> this is optimal, though).
>>>
>>> I don't understand why the pressure mass matrix approach performs so
>>> poorly and wonder what I could try to improve the convergence. Until now
>>> I have been using ML and Hypre BoomerAMG mostly with default parameters.
>>> Surely they can be improved by tuning some parameters. Which could be a
>>> good starting point? Are there other options I should consider?
>>>
>>> With the above setup (jacobi) for a case that works better than others,
>>> the KSP terminates with
>>>
>>>     467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09
>>>
>>> You can find the output of -ksp_view below. Let me know if you need more
>>> details.
>>>
>>> Thanks in advance for your advice!
>>> Best wishes
>>> David
>>>
>>> KSP Object: 1 MPI processes
>>>   type: fgmres
>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>     GMRES: happy breakdown tolerance 1e-30
>>>   maximum iterations=10000
>>>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
>>>   right preconditioning
>>>   diagonally scaled system
>>>   using nonzero initial guess
>>>   using UNPRECONDITIONED norm type for convergence test
>>> PC Object: 1 MPI processes
>>>   type: fieldsplit
>>>     FieldSplit with Schur preconditioner, factorization UPPER
>>>     Preconditioner for the Schur complement formed from user provided matrix
>>>     Split info:
>>>     Split number 0 Defined by IS
>>>     Split number 1 Defined by IS
>>>     KSP solver for A00 block
>>>       KSP Object: (fieldsplit_0_) 1 MPI processes
>>>         type: preonly
>>>         maximum iterations=10000, initial guess is zero
>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>         left preconditioning
>>>         using NONE norm type for convergence test
>>>       PC Object: (fieldsplit_0_) 1 MPI processes
>>>         type: ml
>>>           MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>>             Cycles per PCApply=1
>>>             Using Galerkin computed coarse grid matrices
>>>         Coarse grid solver -- level -------------------------------
>>>           KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
>>>             type: preonly
>>>             maximum iterations=10000, initial guess is zero
>>>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>             left preconditioning
>>>             using NONE norm type for convergence test
>>>           PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
>>>             type: lu
>>>               LU: out-of-place factorization
>>>               tolerance for zero pivot 2.22045e-14
>>>               using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>>               matrix ordering: nd
>>>               factor fill ratio given 5., needed 1.
>>>                 Factored matrix follows:
>>>                   Mat Object: 1 MPI processes
>>>                     type: seqaij
>>>                     rows=3, cols=3
>>>                     package used to perform factorization: petsc
>>>                     total: nonzeros=3, allocated nonzeros=3
>>>                     total number of mallocs used during MatSetValues calls =0
>>>                       not using I-node routines
>>>             linear system matrix = precond matrix:
>>>             Mat Object: 1 MPI processes
>>>               type: seqaij
>>>               rows=3, cols=3
>>>               total: nonzeros=3, allocated nonzeros=3
>>>               total number of mallocs used during MatSetValues calls =0
>>>                 not using I-node routines
>>>         Down solver (pre-smoother) on level 1 -------------------------------
>>>           KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
>>>             type: richardson
>>>               Richardson: damping factor=1.
>>>             maximum iterations=2
>>>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>             left preconditioning
>>>             using nonzero initial guess
>>>             using NONE norm type for convergence test
>>>           PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
>>>             type: sor
>>>               SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>             linear system matrix = precond matrix:
>>>             Mat Object: 1 MPI processes
>>>               type: seqaij
>>>               rows=15, cols=15
>>>               total: nonzeros=69, allocated nonzeros=69
>>>               total number of mallocs used during MatSetValues calls =0
>>>                 not using I-node routines
>>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>>         Down solver (pre-smoother) on level 2 -------------------------------
>>>           KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
>>>             type: richardson
>>>               Richardson: damping factor=1.
>>>             maximum iterations=2
>>>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>             left preconditioning
>>>             using nonzero initial guess
>>>             using NONE norm type for convergence test
>>>           PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
>>>             type: sor
>>>               SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>             linear system matrix = precond matrix:
>>>             Mat Object: 1 MPI processes
>>>               type: seqaij
>>>               rows=304, cols=304
>>>               total: nonzeros=7354, allocated nonzeros=7354
>>>               total number of mallocs used during MatSetValues calls =0
>>>                 not using I-node routines
>>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>>         Down solver (pre-smoother) on level 3 -------------------------------
>>>           KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
>>>             type: richardson
>>>               Richardson: damping factor=1.
>>>             maximum iterations=2
>>>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>             left preconditioning
>>>             using nonzero initial guess
>>>             using NONE norm type for convergence test
>>>           PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
>>>             type: sor
>>>               SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>             linear system matrix = precond matrix:
>>>             Mat Object: 1 MPI processes
>>>               type: seqaij
>>>               rows=30236, cols=30236
>>>               total: nonzeros=2730644, allocated nonzeros=2730644
>>>               total number of mallocs used during MatSetValues calls =0
>>>                 not using I-node routines
>>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>>         Down solver (pre-smoother) on level 4 -------------------------------
>>>           KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
>>>             type: richardson
>>>               Richardson: damping factor=1.
>>>             maximum iterations=2
>>>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>             left preconditioning
>>>             using nonzero initial guess
>>>             using NONE norm type for convergence test
>>>           PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
>>>             type: sor
>>>               SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>             linear system matrix = precond matrix:
>>>             Mat Object: (fieldsplit_0_) 1 MPI processes
>>>               type: seqaij
>>>               rows=894132, cols=894132
>>>               total: nonzeros=70684164, allocated nonzeros=70684164
>>>               total number of mallocs used during MatSetValues calls =0
>>>                 not using I-node routines
>>>         Up solver (post-smoother) same as down solver (pre-smoother)
>>>         linear system matrix = precond matrix:
>>>         Mat Object: (fieldsplit_0_) 1 MPI processes
>>>           type: seqaij
>>>           rows=894132, cols=894132
>>>           total: nonzeros=70684164, allocated nonzeros=70684164
>>>           total number of mallocs used during MatSetValues calls =0
>>>             not using I-node routines
>>>     KSP solver for S = A11 - A10 inv(A00) A01
>>>       KSP Object: (fieldsplit_1_) 1 MPI processes
>>>         type: preonly
>>>         maximum iterations=10000, initial guess is zero
>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>         left preconditioning
>>>         using NONE norm type for convergence test
>>>       PC Object: (fieldsplit_1_) 1 MPI processes
>>>         type: jacobi
>>>         linear system matrix followed by preconditioner matrix:
>>>         Mat Object: (fieldsplit_1_) 1 MPI processes
>>>           type: schurcomplement
>>>           rows=42025, cols=42025
>>>             Schur complement A11 - A10 inv(A00) A01
>>>             A11
>>>               Mat Object: (fieldsplit_1_) 1 MPI processes
>>>                 type: seqaij
>>>                 rows=42025, cols=42025
>>>                 total: nonzeros=554063, allocated nonzeros=554063
>>>                 total number of mallocs used during MatSetValues calls =0
>>>                   not using I-node routines
>>>             A10
>>>               Mat Object: 1 MPI processes
>>>                 type: seqaij
>>>                 rows=42025, cols=894132
>>>                 total: nonzeros=6850107, allocated nonzeros=6850107
>>>                 total number of mallocs used during MatSetValues calls =0
>>>                   not using I-node routines
>>>             KSP of A00
>>>               KSP Object: (fieldsplit_0_) 1 MPI processes
>>>                 type: preonly
>>>                 maximum iterations=10000, initial guess is zero
>>>                 tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>                 left preconditioning
>>>                 using NONE norm type for convergence test
>>>               PC Object: (fieldsplit_0_) 1 MPI processes
>>>                 type: ml
>>>                   MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>>                     Cycles per PCApply=1
>>>                     Using Galerkin computed coarse grid matrices
>>>                 Coarse grid solver -- level -------------------------------
>>>                   KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
>>>                     type: preonly
>>>                     maximum iterations=10000, initial guess is zero
>>>                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>                     left preconditioning
>>>                     using NONE norm type for convergence test
>>>                   PC Object: (fieldsplit_0_mg_coarse_) 1 MPI processes
>>>                     type: lu
>>>                       LU: out-of-place factorization
>>>                       tolerance for zero pivot 2.22045e-14
>>>                       using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>>                       matrix ordering: nd
>>>                       factor fill ratio given 5., needed 1.
>>>                         Factored matrix follows:
>>>                           Mat Object: 1 MPI processes
>>>                             type: seqaij
>>>                             rows=3, cols=3
>>>                             package used to perform factorization: petsc
>>>                             total: nonzeros=3, allocated nonzeros=3
>>>                             total number of mallocs used during MatSetValues calls =0
>>>                               not using I-node routines
>>>                     linear system matrix = precond matrix:
>>>                     Mat Object: 1 MPI processes
>>>                       type: seqaij
>>>                       rows=3, cols=3
>>>                       total: nonzeros=3, allocated nonzeros=3
>>>                       total number of mallocs used during MatSetValues calls =0
>>>                         not using I-node routines
>>>                 Down solver (pre-smoother) on level 1 -------------------------------
>>>                   KSP Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
>>>                     type: richardson
>>>                       Richardson: damping factor=1.
>>>                     maximum iterations=2
>>>                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>                     left preconditioning
>>>                     using nonzero initial guess
>>>                     using NONE norm type for convergence test
>>>                   PC Object: (fieldsplit_0_mg_levels_1_) 1 MPI processes
>>>                     type: sor
>>>                       SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>                     linear system matrix = precond matrix:
>>>                     Mat Object: 1 MPI processes
>>>                       type: seqaij
>>>                       rows=15, cols=15
>>>                       total: nonzeros=69, allocated nonzeros=69
>>>                       total number of mallocs used during MatSetValues calls =0
>>>                         not using I-node routines
>>>                 Up solver (post-smoother) same as down solver (pre-smoother)
>>>                 Down solver (pre-smoother) on level 2 -------------------------------
>>>                   KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
>>>                     type: richardson
>>>                       Richardson: damping factor=1.
>>>                     maximum iterations=2
>>>                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>                     left preconditioning
>>>                     using nonzero initial guess
>>>                     using NONE norm type for convergence test
>>>                   PC Object: (fieldsplit_0_mg_levels_2_) 1 MPI processes
>>>                     type: sor
>>>                       SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>                     linear system matrix = precond matrix:
>>>                     Mat Object: 1 MPI processes
>>>                       type: seqaij
>>>                       rows=304, cols=304
>>>                       total: nonzeros=7354, allocated nonzeros=7354
>>>                       total number of mallocs used during MatSetValues calls =0
>>>                         not using I-node routines
>>>                 Up solver (post-smoother) same as down solver (pre-smoother)
>>>                 Down solver (pre-smoother) on level 3 -------------------------------
>>>                   KSP Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
>>>                     type: richardson
>>>                       Richardson: damping factor=1.
>>>                     maximum iterations=2
>>>                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>                     left preconditioning
>>>                     using nonzero initial guess
>>>                     using NONE norm type for convergence test
>>>                   PC Object: (fieldsplit_0_mg_levels_3_) 1 MPI processes
>>>                     type: sor
>>>                       SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>                     linear system matrix = precond matrix:
>>>                     Mat Object: 1 MPI processes
>>>                       type: seqaij
>>>                       rows=30236, cols=30236
>>>                       total: nonzeros=2730644, allocated nonzeros=2730644
>>>                       total number of mallocs used during MatSetValues calls =0
>>>                         not using I-node routines
>>>                 Up solver (post-smoother) same as down solver (pre-smoother)
>>>                 Down solver (pre-smoother) on level 4 -------------------------------
>>>                   KSP Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
>>>                     type: richardson
>>>                       Richardson: damping factor=1.
>>>                     maximum iterations=2
>>>                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>                     left preconditioning
>>>                     using nonzero initial guess
>>>                     using NONE norm type for convergence test
>>>                   PC Object: (fieldsplit_0_mg_levels_4_) 1 MPI processes
>>>                     type: sor
>>>                       SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>                     linear system matrix = precond matrix:
>>>                     Mat Object: (fieldsplit_0_) 1 MPI processes
>>>                       type: seqaij
>>>                       rows=894132, cols=894132
>>>                       total: nonzeros=70684164, allocated nonzeros=70684164
>>>                       total number of mallocs used during MatSetValues calls =0
>>>                         not using I-node routines
>>>                 Up solver (post-smoother) same as down solver (pre-smoother)
>>>                 linear system matrix = precond matrix:
>>>                 Mat Object: (fieldsplit_0_) 1 MPI processes
>>>                   type: seqaij
>>>                   rows=894132, cols=894132
>>>                   total: nonzeros=70684164, allocated nonzeros=70684164
>>>                   total number of mallocs used during MatSetValues calls =0
>>>                     not using I-node routines
>>>             A01
>>>               Mat Object: 1 MPI processes
>>>                 type: seqaij
>>>                 rows=894132, cols=42025
>>>                 total: nonzeros=6850107, allocated nonzeros=6850107
>>>                 total number of mallocs used during MatSetValues calls =0
>>>                   not using I-node routines
>>>         Mat Object: 1 MPI processes
>>>           type: seqaij
>>>           rows=42025, cols=42025
>>>           total: nonzeros=554063, allocated nonzeros=554063
>>>           total number of mallocs used during MatSetValues calls =0
>>>             not using I-node routines
>>>   linear system matrix = precond matrix:
>>>   Mat Object: 1 MPI processes
>>>     type: seqaij
>>>     rows=936157, cols=936157
>>>     total: nonzeros=84938441, allocated nonzeros=84938441
>>>     total number of mallocs used during MatSetValues calls =0
>>>       not using I-node routines

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
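For readers wiring this up in code rather than purely through the options database, a minimal sketch of the setup discussed in this thread (upper Schur factorization with a user-provided pressure mass matrix as the Schur complement preconditioner) might look as follows in the PETSc C API. The names Amat (assembled Stokes matrix), Qp (pressure mass matrix) and isu/isp (velocity and pressure index sets) are illustrative assumptions, not objects taken from the thread.

    /* Sketch only: programmatic analogue of
     *   -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type upper
     *   -pc_fieldsplit_schur_precondition user
     * Amat, Qp, isu and isp are assumed to be assembled elsewhere. */
    #include <petscksp.h>

    PetscErrorCode SetupStokesFieldSplit(Mat Amat, Mat Qp, IS isu, IS isp, KSP *ksp)
    {
      PetscErrorCode ierr;
      PC             pc;

      PetscFunctionBeginUser;
      ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp, Amat, Amat);CHKERRQ(ierr);
      ierr = KSPSetType(*ksp, KSPFGMRES);CHKERRQ(ierr);               /* outer flexible GMRES */
      ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "0", isu);CHKERRQ(ierr);           /* velocity block */
      ierr = PCFieldSplitSetIS(pc, "1", isp);CHKERRQ(ierr);           /* pressure block */
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
      ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_UPPER);CHKERRQ(ierr);
      /* precondition the Schur complement with the pressure mass matrix Qp */
      ierr = PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Qp);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);                   /* command-line options still apply */
      PetscFunctionReturn(0);
    }

Because KSPSetFromOptions() is called after the splits are defined, the -fieldsplit_0_* and -fieldsplit_1_* options quoted above still select and tune the sub-solvers at run time.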
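Related to the "enclosed flow" point raised above: instead of fixing the pressure at a single node with a Dirichlet condition, one common alternative is to attach the constant-pressure null space to the operator so the Krylov solver projects that mode out. The sketch below only illustrates that alternative (it is not what was used in the thread); Amat and isp are again assumed to be the assembled Stokes matrix and the pressure index set.

    /* Sketch only: register the constant-pressure null space of an
     * enclosed-flow Stokes operator instead of pinning the pressure. */
    #include <petscksp.h>

    PetscErrorCode AttachConstantPressureNullSpace(Mat Amat, IS isp)
    {
      PetscErrorCode ierr;
      Vec            v;
      MatNullSpace   nsp;

      PetscFunctionBeginUser;
      ierr = MatCreateVecs(Amat, &v, NULL);CHKERRQ(ierr);
      ierr = VecSet(v, 0.0);CHKERRQ(ierr);
      ierr = VecISSet(v, isp, 1.0);CHKERRQ(ierr);        /* 1 on pressure dofs, 0 on velocity dofs */
      ierr = VecNormalize(v, NULL);CHKERRQ(ierr);        /* MatNullSpaceCreate expects a unit vector */
      ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)Amat), PETSC_FALSE, 1, &v, &nsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(Amat, nsp);CHKERRQ(ierr);   /* KSP can then remove this mode from the residual */
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
      ierr = VecDestroy(&v);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Whether this interacts any better with the pressure-mass-matrix Schur preconditioner than the pointwise pressure condition is exactly the consistency question raised above, so it would still have to be tested.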
