P.S. As a test I removed the "postcheck" callback, and I still get the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the "postcheck" is not related.
On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:

> The print out I get from -snes_view is shown below. I wonder if the issue is related to "using user-defined postcheck step"?
>
> SNES Object: 1 MPI process
>   type: newtonls
>   maximum iterations=5, maximum function evaluations=10000
>   tolerances: relative=0., absolute=0., solution=0.
>   total number of linear solver iterations=3
>   total number of function evaluations=4
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI process
>     type: basic
>     maxstep=1.000000e+08, minlambda=1.000000e-12
>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>     maximum iterations=40
>     using user-defined postcheck step
>   KSP Object: 1 MPI process
>     type: preonly
>     maximum iterations=10000, initial guess is zero
>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>     left preconditioning
>     using NONE norm type for convergence test
>   PC Object: 1 MPI process
>     type: cholesky
>       out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: external
>       factor fill ratio given 0., needed 0.
>       Factored matrix follows:
>         Mat Object: 1 MPI process
>           type: mumps
>           rows=1152, cols=1152
>           package used to perform factorization: mumps
>           total: nonzeros=126936, allocated nonzeros=126936
>           MUMPS run parameters:
>             Use -ksp_view ::ascii_info_detail to display information for all processes
>             RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>             RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>             RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>             (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>             INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>             INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>             INFOG(5) (estimated maximum front size in the complete tree): 216
>             INFOG(6) (number of nodes in the complete tree): 24
>             INFOG(7) (ordering option effectively used after analysis): 2
>             INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>             INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>             INFOG(10) (total integer space store the matrix factors after factorization): 6276
>             INFOG(11) (order of largest frontal matrix after factorization): 216
>             INFOG(12) (number of off-diagonal pivots): 1044
>             INFOG(13) (number of delayed pivots after factorization): 0
>             INFOG(14) (number of memory compress after factorization): 0
>             INFOG(15) (number of steps of iterative refinement after solution): 0
>             INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>             INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>             INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>             INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>             INFOG(20) (estimated number of entries in the factors): 126936
>             INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>             INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>             INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>             INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>             INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>             INFOG(28) (after factorization: number of null pivots encountered): 0
>             INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>             INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>             INFOG(32) (after analysis: type of analysis done): 1
>             INFOG(33) (value used for ICNTL(8)): 7
>             INFOG(34) (exponent of the determinant if determinant is requested): 0
>             INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>             INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>             INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>             INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>             INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>     linear system matrix = precond matrix:
>     Mat Object: 1 MPI process
>       type: seqaij
>       rows=1152, cols=1152
>       total: nonzeros=60480, allocated nonzeros=60480
>       total number of mallocs used during MatSetValues calls=0
>       using I-node routines: found 384 nodes, limit used is 5
>
> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>
>> David,
>>
>> This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was added long after the origins of SNES and, in places, the code was never fully updated to handle function domain problems. In particular, parts of the line search don't handle it correctly. Can you run with -snes_view and that will help us find the spot that needs to be updated.
>>
>> Barry
>>
>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>
>> Hi, actually, I have a follow-up on this topic.
>>
>> I noticed that when I call SNESSetFunctionDomainError(), it exits the solve as expected, but it leads to a converged reason "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged reason, so that's what I'm doing now. I was surprised by this behavior, though, since I expected that calling SNESSetFunctionDomainError would lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to check on what could be causing this.
>>
>> FYI, I'm using PETSc 3.23.4.
>>
>> Thanks,
>> David
>>
>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>
>>> Thank you very much for this guidance. I switched to use SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>
>>> Thanks!
>>> David
>>>
>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>
>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>
>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>
>>>> I'll give some more info here which may help to clarify further. Normally we do just get a negative "converged reason", as you described.
>>>> But in this specific case where I'm having issues, the solve is a numerically sensitive creep solve, which has exponential terms in the residual and Jacobian callbacks that can "blow up" and give NaN values. In this case, the root cause is that we hit a NaN value during a callback, and then we throw an exception (in libMesh C++ code), which I gather leads to the SNES solve exiting with this error code.
>>>>
>>>> Is there a way to tell the SNES to terminate with a negative "converged reason" because we've encountered some issue during the callback?
>>>>
>>>> In your callback you should call SNESSetFunctionDomainError() and make sure the function value has an infinity or NaN in it (you can call VecFlag() for this purpose).
>>>>
>>>> Now SNESConvergedReason will be a completely reasonable SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>
>>>> Barry
>>>>
>>>> If you are using an ancient version of PETSc (I hope you are using the latest since that always has more bug fixes and features) that does not have SNESSetFunctionDomainError, then just make sure the function vector result has an infinity or NaN in it, and then SNESConvergedReason will be SNES_DIVERGED_FNORM_NAN.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is complicated by that, unfortunately.
>>>>>
>>>>> The situation is that I am not modifying the solution vector in a callback. The SNES solve has terminated with PetscErrorCode 82, and I then want to update the solution vector (reset it to the "previously converged value") and then try to solve again with a smaller load increment. This is a typical "auto load stepping" strategy in FE.
>>>>>
>>>>> Once a PetscError is generated you CANNOT continue the PETSc program; it is not designed to allow this, and trying to continue will lead to further problems.
>>>>>
>>>>> So what you need to do is prevent PETSc from getting to the point where an actual PetscErrorCode of 82 is generated. Normally SNESSolve() returns without generating an error even if the nonlinear solver failed (for example, did not converge). One then uses SNESGetConvergedReason to check whether it converged or not. Normally when SNESSolve() returns, regardless of whether the converged reason is negative or positive, there will be no locked vectors, and one can modify the SNES object and call SNESSolve again.
>>>>>
>>>>> So my guess is that an actual PETSc error is being generated because SNESSetErrorIfNotConverged(snes, PETSC_TRUE) is being called by either your code or libMesh, or the option -snes_error_if_not_converged is being used. In your case, when you wish the code to work after a non-converged SNESSolve(), these options should never be set; instead you should check the result of SNESGetConvergedReason() to see whether SNESSolve has failed. If SNESSetErrorIfNotConverged() is never being set, that may indicate you are using an old version of PETSc, or that you have hit a bug inside PETSc's SNES that does not handle errors correctly, and we can help fix the problem if you can provide full debug output from when the error occurs.
>>>>>
>>>>> Barry
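A minimal sketch of the callback pattern Barry describes above, assuming a standard SNES residual callback; FormFunction and the domain_error flag are illustrative names (the actual assembly lives in the application), and the SNESSetConvergedReason() call is the workaround David mentions for the DIVERGED_LINE_SEARCH glitch:

  #include <petscsnes.h>

  /* Residual callback that flags a domain error instead of throwing an
     exception when the material model produces NaN/Inf values. */
  static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    PetscBool domain_error = PETSC_FALSE;

    PetscFunctionBeginUser;
    /* ... assemble the residual into F from X; set domain_error = PETSC_TRUE
       if an exponential term overflows or produces NaN ... */
    if (domain_error) {
      /* Tell SNES the current iterate is outside the function's domain. */
      PetscCall(SNESSetFunctionDomainError(snes));
      /* Make sure the residual itself contains an infinity so the line
         search cannot accept this step (VecFlag() is the helper Barry
         mentions; VecSet() with PETSC_INFINITY has the same effect). */
      PetscCall(VecSet(F, PETSC_INFINITY));
      /* Workaround for the line-search glitch discussed above: force the
         expected converged reason explicitly. */
      PetscCall(SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN));
    }
    PetscFunctionReturn(PETSC_SUCCESS);
  }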
>>>>>
>>>>> I think the key piece of info I'd like to know is: at what point is the solution vector "unlocked" by the SNES object? Should it be unlocked as soon as the SNES solve has terminated with PetscErrorCode 82? It seems to me that it hasn't been unlocked yet (maybe just on a subset of the processes). Should I manually "unlock" the solution vector by calling VecLockWriteSet?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>
>>>>>> You are not allowed to call VecGetArray on the solution vector of an SNES object within a user callback, nor to modify its values in any other way. Put in C++ lingo, the solution vector is a "const" argument.
>>>>>> It would be great if you could provide an MWE to help us understand your problem.
>>>>>>
>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a question about this error:
>>>>>>>
>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only access in unknown_function() at unknown file:0 (line numbers only accurate to function begin)
>>>>>>>
>>>>>>> I'm encountering this error in an FE solve where an error is encountered during the residual/Jacobian assembly, and what we normally do in that situation is shrink the load step and continue, starting from the "last converged solution". However, in this case I'm running on 32 processes, and 5 of the processes report the error above about a "locked vector".
>>>>>>>
>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the solution to the "last converged solution", and then we make a new SNES object subsequently. But it seems to me that somehow the solution vector is still marked as "locked" on 5 of the processes when we modify the solution vector, which leads to the error above.
>>>>>>>
>>>>>>> I was wondering if someone could advise on the best way to handle this? I thought one option could be to add an MPI barrier call prior to updating the solution vector to the "last converged solution", to make sure that the SNES object is destroyed on all procs (and hence the locks cleared) before editing the solution vector, but I'm unsure if that would make a difference. Any help would be most appreciated!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>> --
>>>>>> Stefano
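And a sketch of the load-cutback pattern Barry recommends above: let SNESSolve() return, inspect SNESGetConvergedReason(), and only then reset the solution and shrink the increment. SolveWithLoadCutback, X_last, and load are illustrative names (not libMesh or PETSc API), and -snes_error_if_not_converged must not be set for this to work:

  #include <petscsnes.h>

  /* Solve one load step; on failure, restore the last converged solution
     and cut the load increment so the caller can retry. */
  static PetscErrorCode SolveWithLoadCutback(SNES snes, Vec X, Vec X_last, PetscReal *load)
  {
    SNESConvergedReason reason;

    PetscFunctionBeginUser;
    PetscCall(SNESSolve(snes, NULL, X));
    PetscCall(SNESGetConvergedReason(snes, &reason));
    if (reason < 0) {
      /* Failed step (e.g. SNES_DIVERGED_FUNCTION_DOMAIN): SNESSolve() has
         returned normally, so the solution vector is no longer locked and
         can be reset here. */
      PetscCall(VecCopy(X_last, X)); /* restore the last converged solution */
      *load *= 0.5;                  /* shrink the load increment and retry */
    } else {
      PetscCall(VecCopy(X, X_last)); /* accept the step as the new converged state */
    }
    PetscFunctionReturn(PETSC_SUCCESS);
  }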
