I always like a challenge.

  The MR 
https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8675__;!!G_uCfscf7eWS!eEJyFKOE1fIr1zG138qFEBHnEAPDDHz6-nhpA3l8TnLtt8VK3i98Cc13eRy7zJGgmWBtGQ8IB0yIDNjR7M6rtACjJrE$
  should eliminate both redundant TSComputeRHSFunction() calls in your code and 
thus decrease its run time. 

 Thanks for bringing the performance bug to our attention 

  Barry


> On Aug 30, 2025, at 5:42 PM, Aldo Bonfiglioli <aldo.bonfigli...@unibas.it> 
> wrote:
> 
> Barry,
> thank you for the update.
> Aldo 
> 
> 
> Aldo Bonfiglioli
> Professore associato di Fluidodinamica settore scientifico disciplinare 
> IIND-01/F 
> Dipartimento di Ingegneria
> Viale dell'Ateneo Lucano, 10 85100 Potenza
> Tel: 0971.205203
> 
> 
> Il ven 29 ago 2025, 20:52 Barry Smith <barryfsm...@icloud.com 
> <mailto:barryfsm...@icloud.com>> ha scritto:
>> 
>>   Aldo,
>> 
>>     I understand the situation you are facing. 
>> 
>>     With pseudo-continuation, when you provide an RHS function, SNES is 
>> computing udot -
>> F(u). Inside the SNESSolve(), specifically in TSComputeIFunction(), with the 
>> lines
>> 
>>     if (!ts->ops->ifunction) {
>>       ierr = TSComputeRHSFunction(ts,t,X,Y);CHKERRQ(ierr);
>>       ierr = VecAYPX(Y,-1,Xdot);CHKERRQ(ierr);
>> 
>> But for the original system, we care about the norm for F(u). So in 
>> TSStep_Pseudo(), after the nonlinear solve
>> there is the line
>> 
>>     pseudo->fnorm = -1; /* The current norm is no longer valid, monitor must 
>> recompute it. */
>> 
>> and further down
>> 
>>   if (pseudo->fnorm < 0) {
>>     PetscCall(VecZeroEntries(pseudo->xdot));
>>     PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, 
>> pseudo->func, PETSC_FALSE));
>>     PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm));
>> 
>> this norm is needed to determine of the nonlinear system has converged 
>> satisfactory (or if more pseudo time steps are needed).
>> 
>> Hence, your FormRHSFunction gets called twice with identical input. 
>> 
>> To make matters even worse your FormRHSFunction is actually being called 
>> THREE times with identical input. Because at the start of TSStep_Pseudo() it 
>> calls TSPseudoComputeTimeStep(ts, &next_time_step) which by default uses 
>> TSPseudoTimeStepDefault() which has the same chunk of code 
>> 
>>  PetscCall(VecZeroEntries(pseudo->xdot));
>>   PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, 
>> pseudo->func, PETSC_FALSE));
>>   PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm));
>> 
>> And the same norm is also recomputed.
>> 
>> Now the diff 
>> 
>> diff --git a/src/ts/impls/pseudo/posindep.c b/src/ts/impls/pseudo/posindep.c
>> index 7c013435f98..813a28311e7 100644
>> --- a/src/ts/impls/pseudo/posindep.c
>> +++ b/src/ts/impls/pseudo/posindep.c
>> @@ -660,9 +660,11 @@ PetscErrorCode TSPseudoTimeStepDefault(TS ts, PetscReal 
>> *newdt, void *dtctx)
>>    PetscReal  inc    = pseudo->dt_increment;
>>  
>>    PetscFunctionBegin;
>> -  PetscCall(VecZeroEntries(pseudo->xdot));
>> -  PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, 
>> pseudo->func, PETSC_FALSE));
>> -  PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm));
>> +  if (pseudo->fnorm < 0.0) {
>> +    PetscCall(VecZeroEntries(pseudo->xdot));
>> +    PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, 
>> pseudo->func, PETSC_FALSE));
>> +    PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm));
>> +  }
>>    if (pseudo->fnorm_initial < 0) {
>>      /* first time through so compute initial function norm */
>>      pseudo->fnorm_initial  = pseudo->fnorm
>> 
>> will prevent the THIRD computation of the same function evaluation. So this 
>> is a good optimization
>> 
>> But eliminating the second computation is not so trivial. To review, deep 
>> inside the TS code there is
>> 
>> TSComputeRHSFunction(ts,t,X,Y);
>> VecAYPX(Y,-1,Xdot);
>> 
>> This code knows nothing about TSPPseudo etc.
>> 
>> Then later in the Pseudo code
>> 
>>  PetscCall(VecZeroEntries(pseudo->xdot));
>>  PetscCall(TSComputeIFunction(ts, ts->ptime, ts->vec_sol, pseudo->xdot, 
>> pseudo->func, PETSC_FALSE));
>>  PetscCall(VecNorm(pseudo->func, NORM_2, &pseudo->fnorm));
>> 
>>  again calls TSComputeRHSFunction(ts,t,X,Y); with the same X.  (since Xdot 
>> is not used directly by TSComputeRHSFunction()). 
>> 
>> Now if all the code was custom for Pseudo we could simply put in a line 
>> 
>> TSComputeRHSFunction(ts,t,X,Y);
>> VecNorm(Y,NORM_2, &pseudo->fnorm)
>> VecAYPX(Y,-1,Xdot);
>> 
>> into TSComputeIFunction() and have the norm available later for pseudo. 
>> 
>> It is not clear that there is any definitive benefit for pseudo to utilize 
>> the TS framework. It possibly gets some code reuse from TS but at the cost 
>> of extra code that won't be needed if written directly. Plus this 
>> performance hit.
>> 
>> Anyways I will make a merge request to remove the third redundant 
>> computation of FormRHSFunction but will await input from others on if 
>> anything else could be done to remove the inefficiency.
>> 
>> Thanks for reporting this challenging problem. 
>> 
>>  Barry
>> 
>> Note: I don't think providing the Jacobian improves the efficiency, it is 
>> just better at hiding the redundant computation.
>>  
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> On Aug 26, 2025, at 4:21 AM, Aldo Bonfiglioli <aldo.bonfigli...@unibas.it 
>>> <mailto:aldo.bonfigli...@unibas.it>> wrote:
>>> 
>>> Hi there,
>>> 
>>> I am using -ts_type pseudo to find steady solutions of the Euler/NS PDEs.
>>> 
>>> It looks like FormRHSFunction is called twice with the exactly same value 
>>> of the dependent variable (u)
>>> 
>>> before invoking TSRHSJacobianFn and then solving the linear system with KSP 
>>> (u is then updated)
>>> 
>>> 
>>>> 1 TS dt 0.141197 time 0.141197 
>>>> Calling FormRHSFunction; ||u|| =    60.489405479528003        
>>>> 463.15635675129079        62.813532026019146        5.9789502719351155     
>>>> Calling FormRHSFunction; ||u|| =    60.489405479528003        
>>>> 463.15635675129079        62.813532026019146        5.9789502719351155 
>>>> Calling TSRHSJacobianFn with ||u|| =    60.489405479528003        
>>>> 463.15635675129079        62.813532026019146        5.9789502719351155 
>>>>      Linear solve converged due to CONVERGED_RTOL iterations 9 
>>>> Calling FormRHSFunction; ||u|| =    60.489405476878737        
>>>> 463.15635675602186        62.813532031616965        5.9789502710519811 
>>>> 2 TS dt 0.155917 time 0.297115 
>>>> Calling FormRHSFunction; ||u|| =    60.489405476878737        
>>>> 463.15635675602186        62.813532031616965        5.9789502710519811 
>>>> Calling FormRHSFunction; ||u|| =    60.489405476878737        
>>>> 463.15635675602186        62.813532031616965        5.9789502710519811 
>>>> Calling TSRHSJacobianFn with ||u|| =    60.489405476878737        
>>>> 463.15635675602186        62.813532031616965        5.9789502710519811
>>>> 
>>> Why is it so? Am I possibly misusing (or missing) some options?
>>> 
>>> My .petscrc file is attached.
>>> 
>>> Thank you,
>>> 
>>> Aldo
>>> 
>>> -- 
>>> Dr. Aldo Bonfiglioli
>>> Associate professor of Fluid Mechanics
>>> Dipartimento di Ingegneria
>>> Universita' della Basilicata
>>> V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY 
>>> <https://urldefense.us/v3/__https://www.google.com/maps/search/dell'Ateneo*Lucano,*10*85100*Potenza*ITALY?entry=gmail&source=g__;KysrKys!!G_uCfscf7eWS!eEJyFKOE1fIr1zG138qFEBHnEAPDDHz6-nhpA3l8TnLtt8VK3i98Cc13eRy7zJGgmWBtGQ8IB0yIDNjR7M6rNuzDtLU$
>>>  >
>>> tel:+39.0971.205203 fax:+39.0971.205215
>>> web: 
>>> https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!eEJyFKOE1fIr1zG138qFEBHnEAPDDHz6-nhpA3l8TnLtt8VK3i98Cc13eRy7zJGgmWBtGQ8IB0yIDNjR7M6rchJhC0A$
>>>   
>>> <https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!ci_aySHdF2Qx_B-37IOzELFDtcQRthJ2NItK-JNbfXB0M2_21B_z97WIfjxZUyKCVRRhd1mzL6b-rnCed_i8LLA56JvFdz7JBrw$>
>> 

Reply via email to