I have started a merge request, https://gitlab.com/petsc/petsc/-/merge_requests/8914, to properly propagate failure reasons up from the line search to SNESSolve. Could you give it a try when you get the chance?
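For reference, here is a minimal sketch of the callback pattern discussed in the thread below (the function name, context argument, and NaN check are placeholders, not taken from David's code):

#include <petscsnes.h>

/* Minimal sketch: a residual callback that, on detecting a NaN/Inf (e.g. an
   exponential creep term blowing up), flags a function-domain error instead
   of throwing a C++ exception. "FormFunction" and "ctx" are placeholders. */
static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
{
  PetscReal norm;

  PetscFunctionBeginUser;
  /* ... assemble the residual f(x) here ... */
  PetscCall(VecNorm(f, NORM_2, &norm));
  if (PetscIsInfOrNanReal(norm)) {
    /* Tell SNES the function left its domain of definition */
    PetscCall(SNESSetFunctionDomainError(snes));
    /* Also mark the residual with Inf/NaN, per the advice quoted below;
       VecFlag() is available in recent PETSc releases */
    PetscCall(VecFlag(f, PETSC_TRUE));
  }
  PetscFunctionReturn(PETSC_SUCCESS);
}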
> On Dec 22, 2025, at 3:03 PM, David Knezevic <[email protected]> wrote:
>
> P.S. As a test I removed the "postcheck" callback, and I still get the same
> behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the
> "postcheck" is not related.
>
> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:
>>
>> The print out I get from -snes_view is shown below. I wonder if the issue is
>> related to "using user-defined postcheck step"?
>>
>> SNES Object: 1 MPI process
>>   type: newtonls
>>   maximum iterations=5, maximum function evaluations=10000
>>   tolerances: relative=0., absolute=0., solution=0.
>>   total number of linear solver iterations=3
>>   total number of function evaluations=4
>>   norm schedule ALWAYS
>>   SNESLineSearch Object: 1 MPI process
>>     type: basic
>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>     maximum iterations=40
>>     using user-defined postcheck step
>>   KSP Object: 1 MPI process
>>     type: preonly
>>     maximum iterations=10000, initial guess is zero
>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>     left preconditioning
>>     using NONE norm type for convergence test
>>   PC Object: 1 MPI process
>>     type: cholesky
>>       out-of-place factorization
>>       tolerance for zero pivot 2.22045e-14
>>       matrix ordering: external
>>       factor fill ratio given 0., needed 0.
>>         Factored matrix follows:
>>           Mat Object: 1 MPI process
>>             type: mumps
>>             rows=1152, cols=1152
>>             package used to perform factorization: mumps
>>             total: nonzeros=126936, allocated nonzeros=126936
>>             MUMPS run parameters:
>>               Use -ksp_view ::ascii_info_detail to display information for all processes
>>               RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>>               RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>>               RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>               INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>>               INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>>               INFOG(5) (estimated maximum front size in the complete tree): 216
>>               INFOG(6) (number of nodes in the complete tree): 24
>>               INFOG(7) (ordering option effectively used after analysis): 2
>>               INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>               INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>>               INFOG(10) (total integer space store the matrix factors after factorization): 6276
>>               INFOG(11) (order of largest frontal matrix after factorization): 216
>>               INFOG(12) (number of off-diagonal pivots): 1044
>>               INFOG(13) (number of delayed pivots after factorization): 0
>>               INFOG(14) (number of memory compress after factorization): 0
>>               INFOG(15) (number of steps of iterative refinement after solution): 0
>>               INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>>               INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>>               INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>>               INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>>               INFOG(20) (estimated number of entries in the factors): 126936
>>               INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>>               INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>>               INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>               INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>               INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>               INFOG(28) (after factorization: number of null pivots encountered): 0
>>               INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>>               INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>>               INFOG(32) (after analysis: type of analysis done): 1
>>               INFOG(33) (value used for ICNTL(8)): 7
>>               INFOG(34) (exponent of the determinant if determinant is requested): 0
>>               INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>>               INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>               INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>               INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>               INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>     linear system matrix = precond matrix:
>>     Mat Object: 1 MPI process
>>       type: seqaij
>>       rows=1152, cols=1152
>>       total: nonzeros=60480, allocated nonzeros=60480
>>       total number of mallocs used during MatSetValues calls=0
>>         using I-node routines: found 384 nodes, limit used is 5
>>
>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>>>
>>> David,
>>>
>>>    This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was
>>> added long after the origins of SNES and, in places, the code was never
>>> fully updated to handle function domain problems. In particular, parts of
>>> the line search don't handle it correctly. Can you run with -snes_view?
>>> That will help us find the spot that needs to be updated.
>>>
>>>    Barry
>>>
>>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>>>
>>>> Hi, actually, I have a follow-up on this topic.
>>>>
>>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the
>>>> solve as expected, but it leads to a converged reason
>>>> "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also
>>>> set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the
>>>> callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged
>>>> reason, so that's what I'm doing now. I was surprised by this behavior,
>>>> though, since I expected that calling SNESSetFunctionDomainError would
>>>> lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to
>>>> check on what could be causing this.
>>>>
>>>> FYI, I'm using PETSc 3.23.4.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>>>> Thank you very much for this guidance. I switched to use
>>>>> SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>>
>>>>> Thanks!
>>>>> David
>>>>>
>>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>>>>
>>>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>>>
>>>>>>> I'll give some more info here which may help to clarify further.
>>>>>>> Normally we do just get a negative "converged reason", as you
>>>>>>> described. But in this specific case where I'm having issues, the solve
>>>>>>> is a numerically sensitive creep solve, which has exponential terms in
>>>>>>> the residual and Jacobian callback that can "blow up" and give NaN
>>>>>>> values. In this case, the root cause is that we hit a NaN value during
>>>>>>> a callback, and then we throw an exception (in libMesh C++ code), which
>>>>>>> I gather leads to the SNES solve exiting with this error code.
>>>>>>>
>>>>>>> Is there a way to tell the SNES to terminate with a negative "converged
>>>>>>> reason" because we've encountered some issue during the callback?
>>>>>>
>>>>>>    In your callback you should call SNESSetFunctionDomainError() and
>>>>>> make sure the function value has an infinity or NaN in it (you can call
>>>>>> VecFlag() for this purpose).
>>>>>>
>>>>>>    Now SNESConvergedReason will be a completely reasonable
>>>>>> SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>>>
>>>>>>    Barry
>>>>>>
>>>>>>    If you are using an ancient version of PETSc (I hope you are using the
>>>>>> latest, since that always has more bug fixes and features) that does not
>>>>>> have SNESSetFunctionDomainError, then just make sure the function vector
>>>>>> result has an infinity or NaN in it, and then SNESConvergedReason will be
>>>>>> SNES_DIVERGED_FNORM_NAN.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is
>>>>>>>>> complicated by that, unfortunately.
>>>>>>>>>
>>>>>>>>> The situation is that I am not modifying the solution vector in a
>>>>>>>>> callback. The SNES solve has terminated, with PetscErrorCode 82, and
>>>>>>>>> I then want to update the solution vector (reset it to the
>>>>>>>>> "previously converged value") and then try to solve again with a
>>>>>>>>> smaller load increment. This is a typical "auto load stepping"
>>>>>>>>> strategy in FE.
>>>>>>>>
>>>>>>>>    Once a PetscError is generated you CANNOT continue the PETSc
>>>>>>>> program; it is not designed to allow this, and trying to continue will
>>>>>>>> lead to further problems.
>>>>>>>>
>>>>>>>>    So what you need to do is prevent PETSc from getting to the point
>>>>>>>> where an actual PetscErrorCode of 82 is generated. Normally
>>>>>>>> SNESSolve() returns without generating an error even if the nonlinear
>>>>>>>> solver failed (for example, did not converge). One then uses
>>>>>>>> SNESGetConvergedReason to check if it converged or not. Normally when
>>>>>>>> SNESSolve() returns, regardless of whether the converged reason is
>>>>>>>> negative or positive, there will be no locked vectors and one can
>>>>>>>> modify the SNES object and call SNESSolve again.
>>>>>>>>
>>>>>>>>    So my guess is that an actual PETSc error is being generated because
>>>>>>>> SNESSetErrorIfNotConverged(snes,PETSC_TRUE) is being called by either
>>>>>>>> your code or libMesh, or the option -snes_error_if_not_converged is
>>>>>>>> being used. In your case, when you wish the code to work after a
>>>>>>>> non-converged SNESSolve(), these options should never be set; instead
>>>>>>>> you should check the result of SNESGetConvergedReason() to see if
>>>>>>>> SNESSolve has failed. If SNESSetErrorIfNotConverged() is never being
>>>>>>>> set, that may indicate you are using an old version of PETSc or have
>>>>>>>> hit a bug inside PETSc's SNES that does not handle errors correctly, and
>>>>>>>> we can help fix the problem if you can provide the full output from a
>>>>>>>> debug version when the error occurs.
>>>>>>>>
>>>>>>>>    Barry
>>>>>>>>
>>>>>>>>> I think the key piece of info I'd like to know is, at what point is
>>>>>>>>> the solution vector "unlocked" by the SNES object? Should it be
>>>>>>>>> unlocked as soon as the SNES solve has terminated with PetscErrorCode
>>>>>>>>> 82? Since it seems to me that it hasn't been unlocked yet (maybe just
>>>>>>>>> on a subset of the processes). Should I manually "unlock" the
>>>>>>>>> solution vector by calling VecLockWriteSet?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>>>>>> You are not allowed to call VecGetArray on the solution vector of an
>>>>>>>>>> SNES object within a user callback, nor to modify its values in any
>>>>>>>>>> other way. Put in C++ lingo, the solution vector is a "const" argument.
>>>>>>>>>> It would be great if you could provide an MWE to help us understand
>>>>>>>>>> your problem.
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a question about this error:
>>>>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only
>>>>>>>>>>>> access in unknown_function() at unknown file:0 (line numbers only
>>>>>>>>>>>> accurate to function begin)
>>>>>>>>>>>
>>>>>>>>>>> I'm encountering this error in an FE solve where an error is
>>>>>>>>>>> encountered during the residual/Jacobian assembly, and what we
>>>>>>>>>>> normally do in that situation is shrink the load step and continue,
>>>>>>>>>>> starting from the "last converged solution". However, in this case
>>>>>>>>>>> I'm running on 32 processes, and 5 of the processes report the
>>>>>>>>>>> error above about a "locked vector".
>>>>>>>>>>>
>>>>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the
>>>>>>>>>>> solution to the "last converged solution", and then we make a new
>>>>>>>>>>> SNES object subsequently. But it seems to me that somehow the
>>>>>>>>>>> solution vector is still marked as "locked" on 5 of the processes
>>>>>>>>>>> when we modify the solution vector, which leads to the error above.
>>>>>>>>>>>
>>>>>>>>>>> I was wondering if someone could advise on what the best way to
>>>>>>>>>>> handle this would be? I thought one option could be to add an MPI
>>>>>>>>>>> barrier call prior to updating the solution vector to the "last
>>>>>>>>>>> converged solution", to make sure that the SNES object is destroyed
>>>>>>>>>>> on all procs (and hence the locks cleared) before editing the
>>>>>>>>>>> solution vector, but I'm unsure if that would make a difference.
>>>>>>>>>>> Any help would be most appreciated!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Stefano
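For completeness, here is a rough sketch of the auto load-stepping pattern discussed above, driven by SNESGetConvergedReason() rather than by -snes_error_if_not_converged or exceptions. All names (LoadSteppingSketch, x_last, dload, total) are placeholders, not taken from libMesh or David's code:

#include <petscsnes.h>

/* Rough sketch of the load-stepping loop discussed above. Key points from the
   thread: never set -snes_error_if_not_converged, never modify the SNES
   solution vector inside a callback, and branch on the converged reason. */
PetscErrorCode LoadSteppingSketch(SNES snes, Vec x)
{
  Vec                 x_last;                             /* last converged solution */
  SNESConvergedReason reason;
  PetscReal           applied = 0.0, total = 1.0, dload = 0.25; /* hypothetical load path */

  PetscFunctionBeginUser;
  PetscCall(VecDuplicate(x, &x_last));
  PetscCall(VecCopy(x, x_last));
  while (applied < total && dload > 1e-8) {
    /* ... update the applied load to (applied + dload) here ... */
    PetscCall(SNESSolve(snes, NULL, x));
    PetscCall(SNESGetConvergedReason(snes, &reason));
    if (reason > 0) {          /* converged: accept the step and advance the load */
      PetscCall(VecCopy(x, x_last));
      applied += dload;
    } else {                   /* diverged, e.g. SNES_DIVERGED_FUNCTION_DOMAIN:
                                  restore the last converged state, shrink the step */
      PetscCall(VecCopy(x_last, x));
      dload *= 0.5;
    }
  }
  PetscCall(VecDestroy(&x_last));
  PetscFunctionReturn(PETSC_SUCCESS);
}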
