I have started a merge request, https://gitlab.com/petsc/petsc/-/merge_requests/8914, to properly propagate failure reasons up from the line search to SNESSolve. Could you give it a try when you get the chance?
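For reference, here is a minimal sketch of the callback pattern discussed in the thread below (the function name, context argument, and NaN check are placeholders, not taken from David's code):

#include <petscsnes.h>

/* Minimal sketch: a residual callback that, on detecting a NaN/Inf (e.g. an
   exponential creep term blowing up), flags a function-domain error instead
   of throwing a C++ exception. "FormFunction" and "ctx" are placeholders. */
static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
{
  PetscReal norm;

  PetscFunctionBeginUser;
  /* ... assemble the residual f(x) here ... */
  PetscCall(VecNorm(f, NORM_2, &norm));
  if (PetscIsInfOrNanReal(norm)) {
    /* Tell SNES the function left its domain of definition */
    PetscCall(SNESSetFunctionDomainError(snes));
    /* Also mark the residual with Inf/NaN, per the advice quoted below;
       VecFlag() is available in recent PETSc releases */
    PetscCall(VecFlag(f, PETSC_TRUE));
  }
  PetscFunctionReturn(PETSC_SUCCESS);
}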
> On Dec 22, 2025, at 3:03 PM, David Knezevic <[email protected]> wrote:
>
> P.S. As a test I removed the "postcheck" callback, and I still get the same
> behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the
> "postcheck" is not related.
>
> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:
>>
>> The print out I get from -snes_view is shown below. I wonder if the issue is
>> related to "using user-defined postcheck step"?
>>
>> SNES Object: 1 MPI process
>>   type: newtonls
>>   maximum iterations=5, maximum function evaluations=10000
>>   tolerances: relative=0., absolute=0., solution=0.
>>   total number of linear solver iterations=3
>>   total number of function evaluations=4
>>   norm schedule ALWAYS
>>   SNESLineSearch Object: 1 MPI process
>>     type: basic
>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>     maximum iterations=40
>>     using user-defined postcheck step
>>   KSP Object: 1 MPI process
>>     type: preonly
>>     maximum iterations=10000, initial guess is zero
>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>     left preconditioning
>>     using NONE norm type for convergence test
>>   PC Object: 1 MPI process
>>     type: cholesky
>>       out-of-place factorization
>>       tolerance for zero pivot 2.22045e-14
>>       matrix ordering: external
>>       factor fill ratio given 0., needed 0.
>>         Factored matrix follows:
>>           Mat Object: 1 MPI process
>>             type: mumps
>>             rows=1152, cols=1152
>>             package used to perform factorization: mumps
>>             total: nonzeros=126936, allocated nonzeros=126936
>>             MUMPS run parameters:
>>               Use -ksp_view ::ascii_info_detail to display information for all processes
>>               RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>>               RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>>               RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>               INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>>               INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>>               INFOG(5) (estimated maximum front size in the complete tree): 216
>>               INFOG(6) (number of nodes in the complete tree): 24
>>               INFOG(7) (ordering option effectively used after analysis): 2
>>               INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>               INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>>               INFOG(10) (total integer space store the matrix factors after factorization): 6276
>>               INFOG(11) (order of largest frontal matrix after factorization): 216
>>               INFOG(12) (number of off-diagonal pivots): 1044
>>               INFOG(13) (number of delayed pivots after factorization): 0
>>               INFOG(14) (number of memory compress after factorization): 0
>>               INFOG(15) (number of steps of iterative refinement after solution): 0
>>               INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>>               INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>>               INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>>               INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>>               INFOG(20) (estimated number of entries in the factors): 126936
>>               INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>>               INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>>               INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>               INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>               INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>               INFOG(28) (after factorization: number of null pivots encountered): 0
>>               INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>>               INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>>               INFOG(32) (after analysis: type of analysis done): 1
>>               INFOG(33) (value used for ICNTL(8)): 7
>>               INFOG(34) (exponent of the determinant if determinant is requested): 0
>>               INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>>               INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>               INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>               INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>               INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>     linear system matrix = precond matrix:
>>     Mat Object: 1 MPI process
>>       type: seqaij
>>       rows=1152, cols=1152
>>       total: nonzeros=60480, allocated nonzeros=60480
>>       total number of mallocs used during MatSetValues calls=0
>>         using I-node routines: found 384 nodes, limit used is 5
>>
>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>>>
>>> David,
>>>
>>>    This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was
>>> added long after the origins of SNES and, in places, the code was never
>>> fully updated to handle function domain problems. In particular, parts of
>>> the line search don't handle it correctly. Can you run with -snes_view?
>>> That will help us find the spot that needs to be updated.
>>>
>>>    Barry
>>>
>>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>>>
>>>> Hi, actually, I have a follow-up on this topic.
>>>>
>>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the
>>>> solve as expected, but it leads to a converged reason
>>>> "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also
>>>> set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the
>>>> callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged
>>>> reason, so that's what I'm doing now. I was surprised by this behavior,
>>>> though, since I expected that calling SNESSetFunctionDomainError would
>>>> lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to
>>>> check on what could be causing this.
>>>>
>>>> FYI, I'm using PETSc 3.23.4.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>>>> Thank you very much for this guidance. I switched to use
>>>>> SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>>
>>>>> Thanks!
>>>>> David
>>>>>
>>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>>>>
>>>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>>>
>>>>>>> I'll give some more info here which may help to clarify further.
>>>>>>> Normally we do just get a negative "converged reason", as you
>>>>>>> described. But in this specific case where I'm having issues, the solve
>>>>>>> is a numerically sensitive creep solve, which has exponential terms in
>>>>>>> the residual and Jacobian callback that can "blow up" and give NaN
>>>>>>> values. In this case, the root cause is that we hit a NaN value during
>>>>>>> a callback, and then we throw an exception (in libMesh C++ code), which
>>>>>>> I gather leads to the SNES solve exiting with this error code.
>>>>>>>
>>>>>>> Is there a way to tell the SNES to terminate with a negative "converged
>>>>>>> reason" because we've encountered some issue during the callback?
>>>>>>
>>>>>>    In your callback you should call SNESSetFunctionDomainError() and
>>>>>> make sure the function value has an infinity or NaN in it (you can call
>>>>>> VecFlag() for this purpose).
>>>>>>
>>>>>>    Now SNESConvergedReason will be a completely reasonable
>>>>>> SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>>>
>>>>>>    Barry
>>>>>>
>>>>>>    If you are using an ancient version of PETSc (I hope you are using the
>>>>>> latest, since that always has more bug fixes and features) that does not
>>>>>> have SNESSetFunctionDomainError, then just make sure the function vector
>>>>>> result has an infinity or NaN in it, and then SNESConvergedReason will be
>>>>>> SNES_DIVERGED_FNORM_NAN.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is
>>>>>>>>> complicated by that, unfortunately.
>>>>>>>>>
>>>>>>>>> The situation is that I am not modifying the solution vector in a
>>>>>>>>> callback. The SNES solve has terminated, with PetscErrorCode 82, and
>>>>>>>>> I then want to update the solution vector (reset it to the
>>>>>>>>> "previously converged value") and then try to solve again with a
>>>>>>>>> smaller load increment. This is a typical "auto load stepping"
>>>>>>>>> strategy in FE.
>>>>>>>>
>>>>>>>>    Once a PetscError is generated you CANNOT continue the PETSc
>>>>>>>> program; it is not designed to allow this, and trying to continue will
>>>>>>>> lead to further problems.
>>>>>>>>
>>>>>>>>    So what you need to do is prevent PETSc from getting to the point
>>>>>>>> where an actual PetscErrorCode of 82 is generated. Normally
>>>>>>>> SNESSolve() returns without generating an error even if the nonlinear
>>>>>>>> solver failed (for example, did not converge). One then uses
>>>>>>>> SNESGetConvergedReason to check if it converged or not. Normally when
>>>>>>>> SNESSolve() returns, regardless of whether the converged reason is
>>>>>>>> negative or positive, there will be no locked vectors and one can
>>>>>>>> modify the SNES object and call SNESSolve again.
>>>>>>>>
>>>>>>>>    So my guess is that an actual PETSc error is being generated because
>>>>>>>> SNESSetErrorIfNotConverged(snes,PETSC_TRUE) is being called by either
>>>>>>>> your code or libMesh, or the option -snes_error_if_not_converged is
>>>>>>>> being used. In your case, when you wish the code to work after a
>>>>>>>> non-converged SNESSolve(), these options should never be set; instead
>>>>>>>> you should check the result of SNESGetConvergedReason() to see if
>>>>>>>> SNESSolve has failed. If SNESSetErrorIfNotConverged() is never being
>>>>>>>> set, that may indicate you are using an old version of PETSc or have
>>>>>>>> hit a bug inside PETSc's SNES that does not handle errors correctly, and
>>>>>>>> we can help fix the problem if you can provide the full output from a
>>>>>>>> debug version when the error occurs.
>>>>>>>>
>>>>>>>>    Barry
>>>>>>>>
>>>>>>>>> I think the key piece of info I'd like to know is, at what point is
>>>>>>>>> the solution vector "unlocked" by the SNES object? Should it be
>>>>>>>>> unlocked as soon as the SNES solve has terminated with PetscErrorCode
>>>>>>>>> 82? Since it seems to me that it hasn't been unlocked yet (maybe just
>>>>>>>>> on a subset of the processes). Should I manually "unlock" the
>>>>>>>>> solution vector by calling VecLockWriteSet?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>>>>>> You are not allowed to call VecGetArray on the solution vector of an
>>>>>>>>>> SNES object within a user callback, nor to modify its values in any
>>>>>>>>>> other way. Put in C++ lingo, the solution vector is a "const" argument.
>>>>>>>>>> It would be great if you could provide an MWE to help us understand
>>>>>>>>>> your problem.
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a question about this error:
>>>>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only
>>>>>>>>>>>> access in unknown_function() at unknown file:0 (line numbers only
>>>>>>>>>>>> accurate to function begin)
>>>>>>>>>>>
>>>>>>>>>>> I'm encountering this error in an FE solve where an error is
>>>>>>>>>>> encountered during the residual/Jacobian assembly, and what we
>>>>>>>>>>> normally do in that situation is shrink the load step and continue,
>>>>>>>>>>> starting from the "last converged solution". However, in this case
>>>>>>>>>>> I'm running on 32 processes, and 5 of the processes report the
>>>>>>>>>>> error above about a "locked vector".
>>>>>>>>>>>
>>>>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the
>>>>>>>>>>> solution to the "last converged solution", and then we make a new
>>>>>>>>>>> SNES object subsequently. But it seems to me that somehow the
>>>>>>>>>>> solution vector is still marked as "locked" on 5 of the processes
>>>>>>>>>>> when we modify the solution vector, which leads to the error above.
>>>>>>>>>>>
>>>>>>>>>>> I was wondering if someone could advise on what the best way to
>>>>>>>>>>> handle this would be? I thought one option could be to add an MPI
>>>>>>>>>>> barrier call prior to updating the solution vector to the "last
>>>>>>>>>>> converged solution", to make sure that the SNES object is destroyed
>>>>>>>>>>> on all procs (and hence the locks cleared) before editing the
>>>>>>>>>>> solution vector, but I'm unsure if that would make a difference.
>>>>>>>>>>> Any help would be most appreciated!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Stefano
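For completeness, here is a rough sketch of the auto load-stepping pattern discussed above, driven by SNESGetConvergedReason() rather than by -snes_error_if_not_converged or exceptions. All names (LoadSteppingSketch, x_last, dload, total) are placeholders, not taken from libMesh or David's code:

#include <petscsnes.h>

/* Rough sketch of the load-stepping loop discussed above. Key points from the
   thread: never set -snes_error_if_not_converged, never modify the SNES
   solution vector inside a callback, and branch on the converged reason. */
PetscErrorCode LoadSteppingSketch(SNES snes, Vec x)
{
  Vec                 x_last;                             /* last converged solution */
  SNESConvergedReason reason;
  PetscReal           applied = 0.0, total = 1.0, dload = 0.25; /* hypothetical load path */

  PetscFunctionBeginUser;
  PetscCall(VecDuplicate(x, &x_last));
  PetscCall(VecCopy(x, x_last));
  while (applied < total && dload > 1e-8) {
    /* ... update the applied load to (applied + dload) here ... */
    PetscCall(SNESSolve(snes, NULL, x));
    PetscCall(SNESGetConvergedReason(snes, &reason));
    if (reason > 0) {          /* converged: accept the step and advance the load */
      PetscCall(VecCopy(x, x_last));
      applied += dload;
    } else {                   /* diverged, e.g. SNES_DIVERGED_FUNCTION_DOMAIN:
                                  restore the last converged state, shrink the step */
      PetscCall(VecCopy(x_last, x));
      dload *= 0.5;
    }
  }
  PetscCall(VecDestroy(&x_last));
  PetscFunctionReturn(PETSC_SUCCESS);
}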
