I always like a challenge. The MR https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8675__;!!G_uCfscf7eWS!eEJyFKOE1fIr1zG138qFEBHnEAPDDHz6-nhpA3l8TnLtt8VK3i98Cc13eRy7zJGgmWBtGQ8IB0yIDNjR7M6rtACjJrE$ should eliminate both redundant TSComputeRHSFunction() calls in your code and thus decrease its run time.
Thanks for bringing the performance bug to our attention Barry > On Aug 30, 2025, at 5:42 PM, Aldo Bonfiglioli <aldo.bonfigli...@unibas.it> > wrote: > > Barry, > thank you for the update. > Aldo > > > Aldo Bonfiglioli > Professore associato di Fluidodinamica settore scientifico disciplinare > IIND-01/F > Dipartimento di Ingegneria > Viale dell'Ateneo Lucano, 10 85100 Potenza > Tel: 0971.205203 > > > Il ven 29 ago 2025, 20:52 Barry Smith <barryfsm...@icloud.com > <mailto:barryfsm...@icloud.com>> ha scritto: >> >> Aldo, >> >> I understand the situation you are facing. >> >> With pseudo-continuation, when you provide an RHS function, SNES is >> computing udot - >> F(u). Inside the SNESSolve(), specifically in TSComputeIFunction(), with the >> lines >> >> if (!ts->ops->ifunction) { >> ierr = TSComputeRHSFunction(ts,t,X,Y);CHKERRQ(ierr); >> ierr = VecAYPX(Y,-1,Xdot);CHKERRQ(ierr); >> >> But for the original system, we care about the norm for F(u). So in >> TSStep_Pseudo(), after the nonlinear solve >> there is the line >> >> pseudo->fnorm = -1; /* The current norm is no longer valid, monitor must >> recompute it. */ >> >> and further down >> >> if (pseudo->fnorm < 0) { >> PetscCall(VecZeroEntries(pseudo->xdot)); >> PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, >> pseudo->func, PETSC_FALSE)); >> PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); >> >> this norm is needed to determine of the nonlinear system has converged >> satisfactory (or if more pseudo time steps are needed). >> >> Hence, your FormRHSFunction gets called twice with identical input. >> >> To make matters even worse your FormRHSFunction is actually being called >> THREE times with identical input. Because at the start of TSStep_Pseudo() it >> calls TSPseudoComputeTimeStep(ts, &next_time_step) which by default uses >> TSPseudoTimeStepDefault() which has the same chunk of code >> >> PetscCall(VecZeroEntries(pseudo->xdot)); >> PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, >> pseudo->func, PETSC_FALSE)); >> PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); >> >> And the same norm is also recomputed. >> >> Now the diff >> >> diff --git a/src/ts/impls/pseudo/posindep.c b/src/ts/impls/pseudo/posindep.c >> index 7c013435f98..813a28311e7 100644 >> --- a/src/ts/impls/pseudo/posindep.c >> +++ b/src/ts/impls/pseudo/posindep.c >> @@ -660,9 +660,11 @@ PetscErrorCode TSPseudoTimeStepDefault(TS ts, PetscReal >> *newdt, void *dtctx) >> PetscReal inc = pseudo->dt_increment; >> >> PetscFunctionBegin; >> - PetscCall(VecZeroEntries(pseudo->xdot)); >> - PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, >> pseudo->func, PETSC_FALSE)); >> - PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); >> + if (pseudo->fnorm < 0.0) { >> + PetscCall(VecZeroEntries(pseudo->xdot)); >> + PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, >> pseudo->func, PETSC_FALSE)); >> + PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); >> + } >> if (pseudo->fnorm_initial < 0) { >> /* first time through so compute initial function norm */ >> pseudo->fnorm_initial = pseudo->fnorm >> >> will prevent the THIRD computation of the same function evaluation. So this >> is a good optimization >> >> But eliminating the second computation is not so trivial. To review, deep >> inside the TS code there is >> >> TSComputeRHSFunction(ts,t,X,Y); >> VecAYPX(Y,-1,Xdot); >> >> This code knows nothing about TSPPseudo etc. >> >> Then later in the Pseudo code >> >> PetscCall(VecZeroEntries(pseudo->xdot)); >> PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, >> pseudo->func, PETSC_FALSE)); >> PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm)); >> >> again calls TSComputeRHSFunction(ts,t,X,Y); with the same X. (since Xdot >> is not used directly by TSComputeRHSFunction()). >> >> Now if all the code was custom for Pseudo we could simply put in a line >> >> TSComputeRHSFunction(ts,t,X,Y); >> VecNorm(Y,NORM_2, &pseudo->fnorm) >> VecAYPX(Y,-1,Xdot); >> >> into TSComputeIFunction() and have the norm available later for pseudo. >> >> It is not clear that there is any definitive benefit for pseudo to utilize >> the TS framework. It possibly gets some code reuse from TS but at the cost >> of extra code that won't be needed if written directly. Plus this >> performance hit. >> >> Anyways I will make a merge request to remove the third redundant >> computation of FormRHSFunction but will await input from others on if >> anything else could be done to remove the inefficiency. >> >> Thanks for reporting this challenging problem. >> >> Barry >> >> Note: I don't think providing the Jacobian improves the efficiency, it is >> just better at hiding the redundant computation. >> >> >> >> >> >> >> >> >>> On Aug 26, 2025, at 4:21 AM, Aldo Bonfiglioli <aldo.bonfigli...@unibas.it >>> <mailto:aldo.bonfigli...@unibas.it>> wrote: >>> >>> Hi there, >>> >>> I am using -ts_type pseudo to find steady solutions of the Euler/NS PDEs. >>> >>> It looks like FormRHSFunction is called twice with the exactly same value >>> of the dependent variable (u) >>> >>> before invoking TSRHSJacobianFn and then solving the linear system with KSP >>> (u is then updated) >>> >>> >>>> 1 TS dt 0.141197 time 0.141197 >>>> Calling FormRHSFunction; ||u|| = 60.489405479528003 >>>> 463.15635675129079 62.813532026019146 5.9789502719351155 >>>> Calling FormRHSFunction; ||u|| = 60.489405479528003 >>>> 463.15635675129079 62.813532026019146 5.9789502719351155 >>>> Calling TSRHSJacobianFn with ||u|| = 60.489405479528003 >>>> 463.15635675129079 62.813532026019146 5.9789502719351155 >>>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>>> Calling FormRHSFunction; ||u|| = 60.489405476878737 >>>> 463.15635675602186 62.813532031616965 5.9789502710519811 >>>> 2 TS dt 0.155917 time 0.297115 >>>> Calling FormRHSFunction; ||u|| = 60.489405476878737 >>>> 463.15635675602186 62.813532031616965 5.9789502710519811 >>>> Calling FormRHSFunction; ||u|| = 60.489405476878737 >>>> 463.15635675602186 62.813532031616965 5.9789502710519811 >>>> Calling TSRHSJacobianFn with ||u|| = 60.489405476878737 >>>> 463.15635675602186 62.813532031616965 5.9789502710519811 >>>> >>> Why is it so? Am I possibly misusing (or missing) some options? >>> >>> My .petscrc file is attached. >>> >>> Thank you, >>> >>> Aldo >>> >>> -- >>> Dr. Aldo Bonfiglioli >>> Associate professor of Fluid Mechanics >>> Dipartimento di Ingegneria >>> Universita' della Basilicata >>> V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY >>> <https://urldefense.us/v3/__https://www.google.com/maps/search/dell'Ateneo*Lucano,*10*85100*Potenza*ITALY?entry=gmail&source=g__;KysrKys!!G_uCfscf7eWS!eEJyFKOE1fIr1zG138qFEBHnEAPDDHz6-nhpA3l8TnLtt8VK3i98Cc13eRy7zJGgmWBtGQ8IB0yIDNjR7M6rNuzDtLU$ >>> > >>> tel:+39.0971.205203 fax:+39.0971.205215 >>> web: >>> https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!eEJyFKOE1fIr1zG138qFEBHnEAPDDHz6-nhpA3l8TnLtt8VK3i98Cc13eRy7zJGgmWBtGQ8IB0yIDNjR7M6rchJhC0A$ >>> >>> <https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!ci_aySHdF2Qx_B-37IOzELFDtcQRthJ2NItK-JNbfXB0M2_21B_z97WIfjxZUyKCVRRhd1mzL6b-rnCed_i8LLA56JvFdz7JBrw$> >>