P.S. As a test I removed the "postcheck" callback, and I still get the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the "postcheck" is not related.
On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:

> The print out I get from -snes_view is shown below. I wonder if the issue is related to "using user-defined postcheck step"?
>
> SNES Object: 1 MPI process
>   type: newtonls
>   maximum iterations=5, maximum function evaluations=10000
>   tolerances: relative=0., absolute=0., solution=0.
>   total number of linear solver iterations=3
>   total number of function evaluations=4
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI process
>     type: basic
>     maxstep=1.000000e+08, minlambda=1.000000e-12
>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>     maximum iterations=40
>     using user-defined postcheck step
>   KSP Object: 1 MPI process
>     type: preonly
>     maximum iterations=10000, initial guess is zero
>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>     left preconditioning
>     using NONE norm type for convergence test
>   PC Object: 1 MPI process
>     type: cholesky
>       out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: external
>       factor fill ratio given 0., needed 0.
>       Factored matrix follows:
>         Mat Object: 1 MPI process
>           type: mumps
>           rows=1152, cols=1152
>           package used to perform factorization: mumps
>           total: nonzeros=126936, allocated nonzeros=126936
>           MUMPS run parameters:
>             Use -ksp_view ::ascii_info_detail to display information for all processes
>             RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>             RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>             RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>             (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>             INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>             INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>             INFOG(5) (estimated maximum front size in the complete tree): 216
>             INFOG(6) (number of nodes in the complete tree): 24
>             INFOG(7) (ordering option effectively used after analysis): 2
>             INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>             INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>             INFOG(10) (total integer space store the matrix factors after factorization): 6276
>             INFOG(11) (order of largest frontal matrix after factorization): 216
>             INFOG(12) (number of off-diagonal pivots): 1044
>             INFOG(13) (number of delayed pivots after factorization): 0
>             INFOG(14) (number of memory compress after factorization): 0
>             INFOG(15) (number of steps of iterative refinement after solution): 0
>             INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>             INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>             INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>             INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>             INFOG(20) (estimated number of entries in the factors): 126936
>             INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>             INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>             INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>             INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>             INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>             INFOG(28) (after factorization: number of null pivots encountered): 0
>             INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>             INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>             INFOG(32) (after analysis: type of analysis done): 1
>             INFOG(33) (value used for ICNTL(8)): 7
>             INFOG(34) (exponent of the determinant if determinant is requested): 0
>             INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>             INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>             INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>             INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>             INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>     linear system matrix = precond matrix:
>     Mat Object: 1 MPI process
>       type: seqaij
>       rows=1152, cols=1152
>       total: nonzeros=60480, allocated nonzeros=60480
>       total number of mallocs used during MatSetValues calls=0
>       using I-node routines: found 384 nodes, limit used is 5
>
> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>
>> David,
>>
>> This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was added long after the origins of SNES and, in places, the code was never fully updated to handle function domain problems. In particular, parts of the line search don't handle it correctly. Can you run with -snes_view and that will help us find the spot that needs to be updated.
>>
>> Barry
>>
>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>
>> Hi, actually, I have a follow-up on this topic.
>>
>> I noticed that when I call SNESSetFunctionDomainError(), it exits the solve as expected, but it leads to a converged reason "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged reason, so that's what I'm doing now. I was surprised by this behavior, though, since I expected that calling SNESSetFunctionDomainError would lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to check on what could be causing this.
>>
>> FYI, I'm using PETSc 3.23.4.
>>
>> Thanks,
>> David
>>
>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>
>>> Thank you very much for this guidance. I switched to use SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>
>>> Thanks!
>>> David
>>>
>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>
>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>
>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>
>>>> I'll give some more info here which may help to clarify further. Normally we do just get a negative "converged reason", as you described.
>>>> But in this specific case where I'm having issues, the solve is a numerically sensitive creep solve, which has exponential terms in the residual and Jacobian callbacks that can "blow up" and give NaN values. In this case, the root cause is that we hit a NaN value during a callback, and then we throw an exception (in libMesh C++ code), which I gather leads to the SNES solve exiting with this error code.
>>>>
>>>> Is there a way to tell the SNES to terminate with a negative "converged reason" because we've encountered some issue during the callback?
>>>>
>>>> In your callback you should call SNESSetFunctionDomainError() and make sure the function value has an infinity or NaN in it (you can call VecFlag() for this purpose).
>>>>
>>>> Now SNESConvergedReason will be a completely reasonable SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>
>>>> Barry
>>>>
>>>> If you are using an ancient version of PETSc (I hope you are using the latest since that always has more bug fixes and features) that does not have SNESSetFunctionDomainError, then just make sure the function vector result has an infinity or NaN in it, and then SNESConvergedReason will be SNES_DIVERGED_FNORM_NAN.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is complicated by that, unfortunately.
>>>>>
>>>>> The situation is that I am not modifying the solution vector in a callback. The SNES solve has terminated with PetscErrorCode 82, and I then want to update the solution vector (reset it to the "previously converged value") and then try to solve again with a smaller load increment. This is a typical "auto load stepping" strategy in FE.
>>>>>
>>>>> Once a PetscError is generated you CANNOT continue the PETSc program; it is not designed to allow this, and trying to continue will lead to further problems.
>>>>>
>>>>> So what you need to do is prevent PETSc from getting to the point where an actual PetscErrorCode of 82 is generated. Normally SNESSolve() returns without generating an error even if the nonlinear solver failed (for example, did not converge). One then uses SNESGetConvergedReason to check whether it converged or not. Normally when SNESSolve() returns, regardless of whether the converged reason is negative or positive, there will be no locked vectors, and one can modify the SNES object and call SNESSolve again.
>>>>>
>>>>> So my guess is that an actual PETSc error is being generated because SNESSetErrorIfNotConverged(snes, PETSC_TRUE) is being called by either your code or libMesh, or the option -snes_error_if_not_converged is being used. In your case, when you wish the code to work after a non-converged SNESSolve(), these options should never be set; instead you should check the result of SNESGetConvergedReason() to see whether SNESSolve has failed. If SNESSetErrorIfNotConverged() is never being set, that may indicate you are using an old version of PETSc, or that you have hit a bug inside PETSc's SNES that does not handle errors correctly, and we can help fix the problem if you can provide full debug output from when the error occurs.
>>>>>
>>>>> Barry
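A minimal sketch of the callback pattern Barry describes above, assuming a standard SNES residual callback; FormFunction and the domain_error flag are illustrative names (the actual assembly lives in the application), and the SNESSetConvergedReason() call is the workaround David mentions for the DIVERGED_LINE_SEARCH glitch:

  #include <petscsnes.h>

  /* Residual callback that flags a domain error instead of throwing an
     exception when the material model produces NaN/Inf values. */
  static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    PetscBool domain_error = PETSC_FALSE;

    PetscFunctionBeginUser;
    /* ... assemble the residual into F from X; set domain_error = PETSC_TRUE
       if an exponential term overflows or produces NaN ... */
    if (domain_error) {
      /* Tell SNES the current iterate is outside the function's domain. */
      PetscCall(SNESSetFunctionDomainError(snes));
      /* Make sure the residual itself contains an infinity so the line
         search cannot accept this step (VecFlag() is the helper Barry
         mentions; VecSet() with PETSC_INFINITY has the same effect). */
      PetscCall(VecSet(F, PETSC_INFINITY));
      /* Workaround for the line-search glitch discussed above: force the
         expected converged reason explicitly. */
      PetscCall(SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN));
    }
    PetscFunctionReturn(PETSC_SUCCESS);
  }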
>>>>>
>>>>> I think the key piece of info I'd like to know is: at what point is the solution vector "unlocked" by the SNES object? Should it be unlocked as soon as the SNES solve has terminated with PetscErrorCode 82? It seems to me that it hasn't been unlocked yet (maybe just on a subset of the processes). Should I manually "unlock" the solution vector by calling VecLockWriteSet?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>
>>>>>> You are not allowed to call VecGetArray on the solution vector of an SNES object within a user callback, nor to modify its values in any other way. Put in C++ lingo, the solution vector is a "const" argument.
>>>>>> It would be great if you could provide an MWE to help us understand your problem.
>>>>>>
>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a question about this error:
>>>>>>>
>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only access in unknown_function() at unknown file:0 (line numbers only accurate to function begin)
>>>>>>>
>>>>>>> I'm encountering this error in an FE solve where an error is encountered during the residual/Jacobian assembly, and what we normally do in that situation is shrink the load step and continue, starting from the "last converged solution". However, in this case I'm running on 32 processes, and 5 of the processes report the error above about a "locked vector".
>>>>>>>
>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the solution to the "last converged solution", and then we make a new SNES object subsequently. But it seems to me that somehow the solution vector is still marked as "locked" on 5 of the processes when we modify the solution vector, which leads to the error above.
>>>>>>>
>>>>>>> I was wondering if someone could advise on the best way to handle this? I thought one option could be to add an MPI barrier call prior to updating the solution vector to the "last converged solution", to make sure that the SNES object is destroyed on all procs (and hence the locks cleared) before editing the solution vector, but I'm unsure if that would make a difference. Any help would be most appreciated!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>> --
>>>>>> Stefano
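And a sketch of the load-cutback pattern Barry recommends above: let SNESSolve() return, inspect SNESGetConvergedReason(), and only then reset the solution and shrink the increment. SolveWithLoadCutback, X_last, and load are illustrative names (not libMesh or PETSc API), and -snes_error_if_not_converged must not be set for this to work:

  #include <petscsnes.h>

  /* Solve one load step; on failure, restore the last converged solution
     and cut the load increment so the caller can retry. */
  static PetscErrorCode SolveWithLoadCutback(SNES snes, Vec X, Vec X_last, PetscReal *load)
  {
    SNESConvergedReason reason;

    PetscFunctionBeginUser;
    PetscCall(SNESSolve(snes, NULL, X));
    PetscCall(SNESGetConvergedReason(snes, &reason));
    if (reason < 0) {
      /* Failed step (e.g. SNES_DIVERGED_FUNCTION_DOMAIN): SNESSolve() has
         returned normally, so the solution vector is no longer locked and
         can be reset here. */
      PetscCall(VecCopy(X_last, X)); /* restore the last converged solution */
      *load *= 0.5;                  /* shrink the load increment and retry */
    } else {
      PetscCall(VecCopy(X, X_last)); /* accept the step as the new converged state */
    }
    PetscFunctionReturn(PETSC_SUCCESS);
  }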
