Re: [petsc-users] Debugging SNES when "everything" looks okay

Matthew Knepley Wed, 22 Aug 2018 11:50:55 -0700

On Wed, Aug 22, 2018 at 1:33 PM Ellen Price <[email protected]>
wrote:


> For Barry:
>
> Okay, I'm stumped. Some of my Jacobian terms match up with the finite
> difference one. Others are off by orders of magnitude. This led me to think
> that maybe there was something wrong with my analytic Jacobian, so I went
> back and started trying to track down which term causes the problem. The
> weird thing is that Mathematica gives the same numerical result as my code
> (within a small tolerance), whether i substitute numbers into its analytic
> Jacobian or numerically differentiate the function I'm trying to solve
> using Mathematica. I can't think of anything else to try for debugging the
> Jacobian. Even if the function has a bug in it, Mathematica seems pretty
> sure that I'm using the right Jacobian for what I put in... At the
> suggestion of the FAQ, I also tried "-mat_fd_type ds" to see if that made
> any difference, but it doesn't. Like I said, I'm stumped, but I'll keep
> trying.
>
> For Matt:
>
> Using options "-pc_type lu -snes_view -snes_monitor -snes_converged_reason
> -ksp_monitor_true_residual -ksp_converged_reason -snes_type newtontr", the
> output is below. I thought that the step length convergence message had
> something to do with using NEWTONTR, but I could be wrong about that. It
> doesn't give that message when I switch to NEWTONLS. It also only gives
> that message after the first iteration -- the first one is always
> CONVERGED_RTOL.
>
> 7.533921e-03, 2.054699e+02
>   0 SNES Function norm 6.252612941119e+04
>     0 KSP preconditioned resid norm 1.027204021595e-01 true resid norm
> 6.252612941119e+04 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP preconditioned resid norm 2.641617517320e-17 true resid norm
> 3.941610607093e-11 ||r(i)||/||b|| 6.303941478885e-16
>   Linear solve converged due to CONVERGED_RTOL iterations 1
>

Shoot, I forgot to have you add -snes_linesearch_monitor, but what that
would show is that you cut down the step
scaling lambda until its 1e-10 or something, so nothing changes in your
residual norm. This means your Jacobian
does not match your function.


>   1 SNES Function norm 6.252085930204e+04
>     0 KSP preconditioned resid norm 3.857372064539e-01 true resid norm
> 6.252085930204e+04 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP preconditioned resid norm 3.664921095158e-16 true resid norm
> 2.434875289829e-10 ||r(i)||/||b|| 3.894500678672e-15
>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>

This is another indication that there is mismatch. Now the solution to your
linear equation is almost 0, so Newton thinks you
should not move at all.

  Matt


>   2 SNES Function norm 6.251886926050e+04
>     0 KSP preconditioned resid norm 4.085000006938e-01 true resid norm
> 6.251886926050e+04 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP preconditioned resid norm 5.318519617927e-16 true resid norm
> 8.189931340015e-10 ||r(i)||/||b|| 1.309993516660e-14
>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
> .....
>  48 SNES Function norm 6.229932776333e+04
>     0 KSP preconditioned resid norm 4.872526135345e+00 true resid norm
> 6.229932776333e+04 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP preconditioned resid norm 7.281571852581e-15 true resid norm
> 8.562155918387e-09 ||r(i)||/||b|| 1.374357673796e-13
>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>  49 SNES Function norm 6.229663411026e+04
>     0 KSP preconditioned resid norm 4.953303401996e+00 true resid norm
> 6.229663411026e+04 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP preconditioned resid norm 1.537319689814e-15 true resid norm
> 4.783442569088e-09 ||r(i)||/||b|| 7.678492806886e-14
>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>  50 SNES Function norm 6.229442686875e+04
> Nonlinear solve did not converge due to DIVERGED_MAX_IT iterations 50
> SNES Object: 2 MPI processes
>   type: newtontr
>     Trust region tolerance (-snes_trtol)
>     mu=0.25, eta=0.75, sigma=0.0001
>     delta0=0.2, delta1=0.3, delta2=0.75, delta3=2.
>   maximum iterations=50, maximum function evaluations=10000
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=50
>   total number of function evaluations=58
>   norm schedule ALWAYS
>   SNESLineSearch Object: 2 MPI processes
>     type: basic
>     maxstep=1.000000e+08, minlambda=1.000000e-12
>     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
> lambda=1.000000e-08
>     maximum iterations=1
>   KSP Object: 2 MPI processes
>     type: gmres
>       restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>       happy breakdown tolerance 1e-30
>     maximum iterations=10000, initial guess is zero
>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>     left preconditioning
>     using PRECONDITIONED norm type for convergence test
>   PC Object: 2 MPI processes
>     type: lu
>       out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: natural
>       factor fill ratio given 0., needed 0.
>         Factored matrix follows:
>           Mat Object: 2 MPI processes
>             type: mumps
>             rows=20, cols=20
>             package used to perform factorization: mumps
>             total: nonzeros=112, allocated nonzeros=112
>             total number of mallocs used during MatSetValues calls =0
>               MUMPS run parameters:
>                 SYM (matrix type):                   0
>                 PAR (host participation):            1
>                 ICNTL(1) (output for error):         6
>                 ICNTL(2) (output of diagnostic msg): 0
>                 ICNTL(3) (output for global info):   0
>                 ICNTL(4) (level of printing):        0
>                 ICNTL(5) (input mat struct):         0
>                 ICNTL(6) (matrix prescaling):        7
>                 ICNTL(7) (sequential matrix ordering):7
>                 ICNTL(8) (scaling strategy):        77
>                 ICNTL(10) (max num of refinements):  0
>                 ICNTL(11) (error analysis):          0
>                 ICNTL(12) (efficiency control):                         1
>                 ICNTL(13) (efficiency control):                         0
>                 ICNTL(14) (percentage of estimated workspace increase): 20
>                 ICNTL(18) (input mat struct):                           3
>                 ICNTL(19) (Schur complement info):                       0
>                 ICNTL(20) (rhs sparse pattern):                         0
>                 ICNTL(21) (solution struct):                            1
>                 ICNTL(22) (in-core/out-of-core facility):               0
>                 ICNTL(23) (max size of memory can be allocated locally):0
>                 ICNTL(24) (detection of null pivot rows):               0
>                 ICNTL(25) (computation of a null space basis):          0
>                 ICNTL(26) (Schur options for rhs or solution):          0
>                 ICNTL(27) (experimental parameter):
>  -32
>                 ICNTL(28) (use parallel or sequential ordering):        1
>                 ICNTL(29) (parallel ordering):                          0
>                 ICNTL(30) (user-specified set of entries in inv(A)):    0
>                 ICNTL(31) (factors is discarded in the solve phase):    0
>                 ICNTL(33) (compute determinant):                        0
>                 ICNTL(35) (activate BLR based factorization):           0
>                 CNTL(1) (relative pivoting threshold):      0.01
>                 CNTL(2) (stopping criterion of refinement): 1.49012e-08
>                 CNTL(3) (absolute pivoting threshold):      0.
>                 CNTL(4) (value of static pivoting):         -1.
>                 CNTL(5) (fixation for null pivots):         0.
>                 CNTL(7) (dropping parameter for BLR):       0.
>                 RINFO(1) (local estimated flops for the elimination after
> analysis):
>                   [0] 127.
>                   [1] 155.
>                 RINFO(2) (local estimated flops for the assembly after
> factorization):
>                   [0]  16.
>                   [1]  16.
>                 RINFO(3) (local estimated flops for the elimination after
> factorization):
>                   [0]  127.
>                   [1]  155.
>                 INFO(15) (estimated size of (in MB) MUMPS internal data
> for running numerical factorization):
>                 [0] 1
>                 [1] 1
>                 INFO(16) (size of (in MB) MUMPS internal data used during
> numerical factorization):
>                   [0] 1
>                   [1] 1
>                 INFO(23) (num of pivots eliminated on this processor after
> factorization):
>                   [0] 10
>                   [1] 10
>                 RINFOG(1) (global estimated flops for the elimination
> after analysis): 282.
>                 RINFOG(2) (global estimated flops for the assembly after
> factorization): 32.
>                 RINFOG(3) (global estimated flops for the elimination
> after factorization): 282.
>                 (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0.,0.)*(2^0)
>                 INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 112
>                 INFOG(4) (estimated integer workspace for factors on all
> processors after analysis): 225
>                 INFOG(5) (estimated maximum front size in the complete
> tree): 4
>                 INFOG(6) (number of nodes in the complete tree): 9
>                 INFOG(7) (ordering option effectively use after analysis):
> 2
>                 INFOG(8) (structural symmetry in percent of the permuted
> matrix after analysis): 100
>                 INFOG(9) (total real/complex workspace to store the matrix
> factors after factorization): 112
>                 INFOG(10) (total integer space store the matrix factors
> after factorization): 225
>                 INFOG(11) (order of largest frontal matrix after
> factorization): 4
>                 INFOG(12) (number of off-diagonal pivots): 0
>                 INFOG(13) (number of delayed pivots after factorization):
> 0
>                 INFOG(14) (number of memory compress after factorization):
> 0
>                 INFOG(15) (number of steps of iterative refinement after
> solution): 0
>                 INFOG(16) (estimated size (in MB) of all MUMPS internal
> data for factorization after analysis: value on the most memory consuming
> processor): 1
>                 INFOG(17) (estimated size of all MUMPS internal data for
> factorization after analysis: sum over all processors): 2
>                 INFOG(18) (size of all MUMPS internal data allocated
> during factorization: value on the most memory consuming processor): 1
>                 INFOG(19) (size of all MUMPS internal data allocated
> during factorization: sum over all processors): 2
>                 INFOG(20) (estimated number of entries in the factors):
> 112
>                 INFOG(21) (size in MB of memory effectively used during
> factorization - value on the most memory consuming processor): 1
>                 INFOG(22) (size in MB of memory effectively used during
> factorization - sum over all processors): 2
>                 INFOG(23) (after analysis: value of ICNTL(6) effectively
> used): 0
>                 INFOG(24) (after analysis: value of ICNTL(12) effectively
> used): 1
>                 INFOG(25) (after factorization: number of pivots modified
> by static pivoting): 0
>                 INFOG(28) (after factorization: number of null pivots
> encountered): 0
>                 INFOG(29) (after factorization: effective number of
> entries in the factors (sum over all processors)): 112
>                 INFOG(30, 31) (after solution: size in Mbytes of memory
> used during solution phase): 0, 0
>                 INFOG(32) (after analysis: type of analysis done): 1
>                 INFOG(33) (value used for ICNTL(8)): 7
>                 INFOG(34) (exponent of the determinant if determinant is
> requested): 0
>     linear system matrix = precond matrix:
>     Mat Object: 2 MPI processes
>       type: mpiaij
>       rows=20, cols=20, bs=2
>       total: nonzeros=112, allocated nonzeros=240
>       total number of mallocs used during MatSetValues calls =0
>
> Ellen
>
> On Tue, Aug 21, 2018 at 10:16 PM Matthew Knepley <[email protected]>
> wrote:
>
>> On Tue, Aug 21, 2018 at 10:00 PM Ellen M. Price <
>> [email protected]> wrote:
>>
>>> Hi PETSc users,
>>>
>>> I'm having trouble applying SNES to a new problem I'm working on. I'll
>>> try to be as complete as possible but can't post the full code because
>>> it's ongoing research and is pretty long anyway.
>>>
>>> The nonlinear problem arises from trying to solve a set of two coupled
>>> ODEs using a Galerkin method. I am using Mathematica to generate the
>>> objective function to solve and the Jacobian, so I *think* I can rule
>>> out human error on that front.
>>>
>>> There are four things I can easily change:
>>>
>>> - number of DMDA grid points N (I've tried 100 and 1000)
>>> - preconditioner (I've tried LU and SVD, LU appears to work better, and
>>> SVD is too slow for N = 1000)
>>> - linear solver (haven't played with this much, but sometimes smaller
>>> tolerances work better)
>>> - nonlinear solver (what I'm having trouble with)
>>>
>>> Trying different solvers should, as far as I'm aware, give comparable
>>> answers, but that's not the case here. NEWTONTR works best of the ones
>>> I've tried, but I'm suspicious that the function value barely decreases
>>> before SNES "converges" -- and none of the options I've tried changing
>>> seem to affect this, as it just finds another reason to converge without
>>> making any real progress. For example:
>>>
>>>   0 SNES Function norm 7.197788479418e+00
>>>     0 KSP Residual norm 1.721996766619e+01
>>>     1 KSP Residual norm 5.186021549059e-14
>>>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>>
>>
>> This makes no sense. LU should converge due to atol or rtol. Send the
>> output of
>>
>>   -snes_view -snes_monitor -snes_converged_reason
>> -ksp_monitor_true_residual -ksp_converged_reason
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>>   1 SNES Function norm 7.197777674987e+00
>>>     0 KSP Residual norm 3.296112185897e+01
>>>     1 KSP Residual norm 2.713415432045e-13
>>>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>> .....
>>>  50 SNES Function norm 7.197376803542e+00
>>>     0 KSP Residual norm 6.222518656302e+02
>>>     1 KSP Residual norm 9.630659996504e-12
>>>   Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>>  51 SNES Function norm 7.197376803542e+00
>>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 50
>>>
>>> I've tried everything I can think of and the FAQ suggestions, including
>>> non-dimensionalizing the problem; I observe the same behavior either
>>> way. The particularly strange thing I can't understand is why some of
>>> the SNES methods fail outright, after just one iteration (NCG, NGMRES,
>>> and others) with DIVERGED_DTOL. Unless I've misunderstood, it seems
>>> like, for the most part, I should be able to substitute in one method
>>> for another, possibly adjusting a few parameters along the way.
>>> Furthermore, the default method, NEWTONLS, diverges with
>>> DIVERGED_LINE_SEARCH, which I'm not sure how to debug.
>>>
>>> So the only viable method is NEWTONTR, and that doesn't appear to
>>> "really" converge. Any suggestions for further things to try are
>>> appreciated. My current options are:
>>>
>>> -pc_type lu -snes_monitor -snes_converged_reason -ksp_converged_reason
>>> -snes_max_it 10000 -ksp_monitor -snes_type newtonls
>>>
>>> Thanks in advance,
>>> Ellen Price
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>

Re: [petsc-users] Debugging SNES when "everything" looks okay

Reply via email to