Ling, Are you using PETSc TS? If so, it may worth trying Crank-Nicolson first to see if the nonlinear solve becomes faster.
In addition, you can try to improve the performance by pruning the Jacobian matrix. TSPruneIJacobianColor()<https://urldefense.us/v3/__https://petsc.org/main/manualpages/TS/TSPruneIJacobianColor/__;!!G_uCfscf7eWS!aAgMOjDD6ZgyEaxqc_vH-Yl5P1uNvFWa8_Lc5TWgoXfSz9x7kNgCi4jr10b2PmgzFwIyeCZGCGLUYU6hNJdYW8zcNQ$ > sometimes can reduce the number of colors especially for high-order methods and make your Jacobian matrix more compact. An example of usage can be found here<https://urldefense.us/v3/__https://petsc.org/release/src/ts/tests/ex5.c.html__;!!G_uCfscf7eWS!aAgMOjDD6ZgyEaxqc_vH-Yl5P1uNvFWa8_Lc5TWgoXfSz9x7kNgCi4jr10b2PmgzFwIyeCZGCGLUYU6hNJd4iC6hZw$ >. If you are not using TS, there is a SNES version SNESPruneJacobianColor()<https://urldefense.us/v3/__https://petsc.org/main/manualpages/SNES/SNESPruneJacobianColor/__;!!G_uCfscf7eWS!aAgMOjDD6ZgyEaxqc_vH-Yl5P1uNvFWa8_Lc5TWgoXfSz9x7kNgCi4jr10b2PmgzFwIyeCZGCGLUYU6hNJfq48Io5Q$ > for the same functionality. Hong (Mr.) On Mar 3, 2024, at 11:48 PM, Zou, Ling via petsc-users <petsc-users@mcs.anl.gov> wrote: From: Jed Brown <j...@jedbrown.org<mailto:j...@jedbrown.org>> Date: Sunday, March 3, 2024 at 11:35 PM To: Zou, Ling <l...@anl.gov<mailto:l...@anl.gov>>, Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> Subject: Re: [petsc-users] 'Preconditioning' with lower-order method If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming that doesn't kick in eventually). Note ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming that doesn't kick in eventually). • Yes, I do use coloring, and I do provide sufficient stencil, i.e., neighbor’s neighbor. The sufficiency is confirmed by PETSc’s -snes_test_jacobian and -snes_test_jacobian_view options. Note that assembling a Jacobian of a second order TVD operator requires at least second neighbors while the first order needs only first neighbors, thus is much sparser and needs fewer colors to compute. • In my code implementation, when marking the Jacobian nonzero pattern, I don’t differentiate FV1 or FV2, I always use the FV2 stencil, so it’s a bit ‘fat’ for the FV1 method, but worked just fine. I expect you're either not exploiting that in the timings or something else is amiss. You can run with `-log_view -snes_view -ksp_converged_reason` to get a bit more information about what's happening. • The attached is screen output as you suggest. The linear and nonlinear performance of FV2 is both worse from the output. FV2: Time Step 149, time = 13229.7, dt = 100 NL Step = 0, fnorm = 7.80968E-03 Linear solve converged due to CONVERGED_RTOL iterations 26 NL Step = 1, fnorm = 7.65731E-03 Linear solve converged due to CONVERGED_RTOL iterations 24 NL Step = 2, fnorm = 6.85034E-03 Linear solve converged due to CONVERGED_RTOL iterations 27 NL Step = 3, fnorm = 6.11873E-03 Linear solve converged due to CONVERGED_RTOL iterations 25 NL Step = 4, fnorm = 1.57347E-03 Linear solve converged due to CONVERGED_RTOL iterations 27 NL Step = 5, fnorm = 9.03536E-04 SNES Object: 1 MPI process type: newtonls maximum iterations=20, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-06, solution=1e-08 total number of linear solver iterations=129 total number of function evaluations=144 norm schedule ALWAYS Jacobian is applied matrix-free with differencing Preconditioning Jacobian is built using finite differences with coloring SNESLineSearch Object: 1 MPI process type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI process type: gmres restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=100, initial guess is zero tolerances: relative=0.0001, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: rcm factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI process type: seqaij rows=8715, cols=8715 package used to perform factorization: petsc total: nonzeros=38485, allocated nonzeros=38485 not using I-node routines linear system matrix followed by preconditioner matrix: Mat Object: 1 MPI process type: mffd rows=8715, cols=8715 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: 1 MPI process type: seqaij rows=8715, cols=8715 total: nonzeros=38485, allocated nonzeros=38485 total number of mallocs used during MatSetValues calls=0 not using I-node routines Solve Converged! FV1: Time Step 149, time = 13229.7, dt = 100 NL Step = 0, fnorm = 7.90072E-03 Linear solve converged due to CONVERGED_RTOL iterations 12 NL Step = 1, fnorm = 2.01919E-04 Linear solve converged due to CONVERGED_RTOL iterations 17 NL Step = 2, fnorm = 1.06960E-05 Linear solve converged due to CONVERGED_RTOL iterations 15 NL Step = 3, fnorm = 2.41683E-09 SNES Object: 1 MPI process type: newtonls maximum iterations=20, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-06, solution=1e-08 total number of linear solver iterations=44 total number of function evaluations=51 norm schedule ALWAYS Jacobian is applied matrix-free with differencing Preconditioning Jacobian is built using finite differences with coloring SNESLineSearch Object: 1 MPI process type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI process type: gmres restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=100, initial guess is zero tolerances: relative=0.0001, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 using diagonal shift to prevent zero pivot [NONZERO] matrix ordering: rcm factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI process type: seqaij rows=8715, cols=8715 package used to perform factorization: petsc total: nonzeros=38485, allocated nonzeros=38485 not using I-node routines linear system matrix followed by preconditioner matrix: Mat Object: 1 MPI process type: mffd rows=8715, cols=8715 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: 1 MPI process type: seqaij rows=8715, cols=8715 total: nonzeros=38485, allocated nonzeros=38485 total number of mallocs used during MatSetValues calls=0 not using I-node routines Solve Converged! "Zou, Ling via petsc-users" <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> writes: > Barry, thank you. > I am not sure if I exactly follow you on this: > “Are you forming the Jacobian for the first and second order cases inside of > Newton?” > > The problem that we deal with, heat/mass transfer in heterogeneous systems > (reactor system), is generally small in terms of size, i.e., # of DOFs > (several k to maybe 100k level), so for now, I completely rely on PETSc to > compute Jacobian, i.e., finite-differencing. > > That’s a good suggestion to see the time spent during various events. > What motivated me to try the options are the following observations. > > 2nd order FVM: > > Time Step 149, time = 13229.7, dt = 100 > > NL Step = 0, fnorm = 7.80968E-03 > > NL Step = 1, fnorm = 7.65731E-03 > > NL Step = 2, fnorm = 6.85034E-03 > > NL Step = 3, fnorm = 6.11873E-03 > > NL Step = 4, fnorm = 1.57347E-03 > > NL Step = 5, fnorm = 9.03536E-04 > > Solve Converged! > > 1st order FVM: > > Time Step 149, time = 13229.7, dt = 100 > > NL Step = 0, fnorm = 7.90072E-03 > > NL Step = 1, fnorm = 2.01919E-04 > > NL Step = 2, fnorm = 1.06960E-05 > > NL Step = 3, fnorm = 2.41683E-09 > > Solve Converged! > > Notice the obvious ‘stagnant’ in residual for the 2nd order method while not > in the 1st order. > For the same problem, the wall time is 10 sec vs 6 sec. I would be happy if I > can reduce 2 sec for the 2nd order method. > > -Ling > > From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>> > Date: Sunday, March 3, 2024 at 12:06 PM > To: Zou, Ling <l...@anl.gov<mailto:l...@anl.gov>> > Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> > <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> > Subject: Re: [petsc-users] 'Preconditioning' with lower-order method > Are you forming the Jacobian for the first and second order cases inside of > Newton? You can run both with -log_view to see how much time is spent in the > various events (compute function, compute Jacobian, linear solve, .. . ) for > the two cases > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > Are you forming the Jacobian for the first and second order cases inside > of Newton? > > You can run both with -log_view to see how much time is spent in the > various events (compute function, compute Jacobian, linear solve, ...) for > the two cases and compare them. > > > > > On Mar 3, 2024, at 11:42 AM, Zou, Ling via petsc-users > <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote: > > Original email may have been sent to the incorrect place. > See below. > > -Ling > > From: Zou, Ling <l...@anl.gov<mailto:l...@anl.gov><mailto:l...@anl.gov>> > Date: Sunday, March 3, 2024 at 10:34 AM > To: petsc-users > <petsc-users-boun...@mcs.anl.gov<mailto:petsc-users-boun...@mcs.anl.gov><mailto:petsc-users-boun...@mcs.anl.gov>> > Subject: 'Preconditioning' with lower-order method > Hi all, > > I am solving a PDE system over a spatial domain. Numerical methods are: > > * Finite volume method (both 1st and 2nd order implemented) > * BDF1 and BDF2 for time integration. > What I have noticed is that 1st order FVM converges much faster than 2nd > order FVM, regardless the time integration scheme. Well, not surprising since > 2nd order FVM introduces additional non-linearity. > > I’m thinking about two possible ways to speed up 2nd order FVM, and would > like to get some thoughts or community knowledge before jumping into code > implementation. > > Say, let the 2nd order FVM residual function be F2(x) = 0; and the 1st order > FVM residual function be F1(x) = 0. > > 1. Option – 1, multi-step for each time step > Step 1: solving F1(x) = 0 to obtain a temporary solution x1 > Step 2: feed x1 as an initial guess to solve F2(x) = 0 to obtain the final > solution. > [Not sure if gain any saving at all] > > > 1. Option -2, dynamically changing residual function F(x) > In pseudo code, would be something like. > > snesFormFunction(SNES snes, Vec u, Vec f, void *) > { > if (snes.nl_it_no < 4) // 4 being arbitrary here > f = F1(u); > else > f = F2(u); > } > > I know this might be a bit crazy since it may crash after switching residual > function, still, any thoughts? > > Best, > > -Ling