Hello,

STCG is being used to compute a search direction by inverting the Hessian of 
the objective onto the gradient. The Hessian has to be positive definitive for 
this search direction to be a valid descent direction. To enforce this, STCG 
terminates the KSP solution when it encounters negative curvature (i.e. 
“uphill” directions). The resulting search direction from this early 
termination is still a descent direction but it’s a pretty bad one, meaning 
that the line search is forced to do extra work to find a valid (and sometimes 
pretty small) step length. This is normal.

Once a negative curvature is detected, the NLS algorithm imposes a shift to the 
diagonal of the Hessian in order to guarantee that it will be positive-definite 
on the next optimization iteration. This diagonal shift is increased in 
magnitude if you keep hitting the negative curvature KSP termination, and it’s 
gradually decreased on iterations where the KSP solution succeeds. However this 
does not prevent the line search from doing all that extra work on iterations 
where the search direction solution never converged fully. It simply helps 
generate better search directions in subsequent iterations and helps the 
overall optimization solution eventually converge to a valid minimum.

As a side note, if your function, gradient and/or Hessian evaluations are 
expensive, it might actually make more sense to use the quasi-Newton method 
(BQNLS) instead of Newton line search. The optimization solution is likely to 
take more overall iterations with BQNLS(-tao_type bqnls), but the BFGS 
approximation used by the algorithm is inherently guaranteed to be 
positive-definite and tends to generate very well scaled search directions so 
it might help the line search do less work. Combined with the elimination of 
Hessian evaluations, it may very well solve your problem faster in CPU time 
even if it takes more nonlinear iterations overall. This is technically a bound 
constrained method but it solves unconstrained problems too when you don’t 
define any bounds.

About Levenberg-Marquardt: a user started the branch to eventually contribute 
an LM solver, but I have not heard any updates on it since end of April. For 
least-squares type problems, you can try using the regularized Gauss-Newton 
solver (-tao_type brgn). The problem definition interface is a bit different. 
BRGN requires the problem to be defined as a residual and its Jacobian, and it 
will assemble the gradient and the Hessian on its own and feed it into the 
standard Newton line search solver underneath. There are also a few different 
regularization options available, like proximal point Tikhonov (l2prox) and a 
sparsity term (l1dict) that can be set with the (-tao_brgn_regularization_type) 
option. These get automatically cooked into the gradient and Hessian. Combined 
with the automatic diagonal shifts for the Hessian, BRGN really is almost 
equivalent to an LM solver so it might do what you need.

Hope this helps!
—
Alp

On Jun 9, 2020, at 10:55 PM, zakaryah . 
<zakar...@gmail.com<mailto:zakar...@gmail.com>> wrote:

I am using TAO to minimize the elastic strain energy for somewhat complicated 
applied forces. With Newton linesearch (NLS), the STCG KSP, and several 
preconditioners (none, Jacobi, lmvm), the solver finds a minimum within an 
acceptable number of iterations (<50). Still, in the interest of performance, I 
am wondering about what happens after the KSP encounters an indefinite matrix 
(KSP_CONVERGED_CG_NEG_CURVE). After this happens, the line search performs 
extremely poorly, with function values at step length 1 reaching absurdly large 
values. As the line search tries smaller step lengths, the function values 
fall, then typically reach values representing a decline from the previous 
iteration, but only with tiny step lengths (~1e-11). The line search takes many 
iterations to arrive at such an improvement. Here is an example, run with  
-tao_type nls -tao_nls_ksp_type stcg -tao_nls_pc_type none 
-tao_nls_ksp_norm_type unpreconditioned:

  0 TAO,  Function value: 0.160612,  Residual: 0.0736074
  0 TAO,  Function value: 0.121568,  Residual: 0.0424117
    Residual norms for tao_nls_ solve.
    0 KSP Residual norm 4.241174778983e-02
    1 KSP Residual norm 5.806891169509e-02
    2 KSP Residual norm 6.492953448014e-02
    3 KSP Residual norm 5.856559430984e-02
    4 KSP Residual norm 5.262548559710e-02
    5 KSP Residual norm 4.863633473400e-02
    6 KSP Residual norm 4.725156347026e-02
    7 KSP Residual norm 4.748458009319e-02
    8 KSP Residual norm 4.885339641711e-02
    9 KSP Residual norm 5.065071226354e-02
   10 KSP Residual norm 5.085544070851e-02
   11 KSP Residual norm 5.127093547976e-02
   12 KSP Residual norm 5.155225884843e-02
   13 KSP Residual norm 5.219895021408e-02
   14 KSP Residual norm 6.480520610077e-02
   15 KSP Residual norm 1.433515456621e-01
  Linear tao_nls_ solve converged due to CONVERGED_CG_NEG_CURVE iterations 16
    0 LS    Function value: 0.121568,    Step length: 0.
    1 LS    Function value: 5.99839e+59,    Step length: 1.
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0., fy: 0.121568, dgy: -1.32893e+08
    2 LS    Function value: 1.46445e+56,    Step length: 0.25
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1., fy: 5.99839e+59, dgy: 3.59903e+60
    3 LS    Function value: 3.57532e+52,    Step length: 0.0625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.25, fy: 1.46445e+56, dgy: 3.51468e+57
    4 LS    Function value: 8.7288e+48,    Step length: 0.015625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.0625, fy: 3.57532e+52, dgy: 3.4323e+54
    5 LS    Function value: 2.13105e+45,    Step length: 0.00390625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.015625, fy: 8.7288e+48, dgy: 3.35186e+51
    6 LS    Function value: 5.20277e+41,    Step length: 0.000976562
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.00390625, fy: 2.13105e+45, dgy: 3.2733e+48
    7 LS    Function value: 1.27021e+38,    Step length: 0.000244141
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.000976562, fy: 5.20277e+41, dgy: 3.19658e+45
    8 LS    Function value: 3.1011e+34,    Step length: 6.10352e-05
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.000244141, fy: 1.27021e+38, dgy: 3.12166e+42
    9 LS    Function value: 7.57106e+30,    Step length: 1.52588e-05
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 6.10352e-05, fy: 3.1011e+34, dgy: 3.0485e+39
   10 LS    Function value: 1.84842e+27,    Step length: 3.8147e-06
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1.52588e-05, fy: 7.57106e+30, dgy: 2.97706e+36
   11 LS    Function value: 4.51295e+23,    Step length: 9.53672e-07
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 3.8147e-06, fy: 1.84842e+27, dgy: 2.90731e+33
   12 LS    Function value: 1.10199e+20,    Step length: 2.38417e-07
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 9.53672e-07, fy: 4.51295e+23, dgy: 2.83927e+30
   13 LS    Function value: 2.69231e+16,    Step length: 5.96028e-08
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 2.38417e-07, fy: 1.10199e+20, dgy: 2.77314e+27
   14 LS    Function value: 6.59192e+12,    Step length: 1.48993e-08
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 5.96028e-08, fy: 2.69231e+16, dgy: 2.70974e+24
   15 LS    Function value: 1.62897e+09,    Step length: 3.72338e-09
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1.48993e-08, fy: 6.59192e+12, dgy: 2.65254e+21
   16 LS    Function value: 421305.,    Step length: 9.29291e-10
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 3.72338e-09, fy: 1.62897e+09, dgy: 2.61626e+18
   17 LS    Function value: 141.939,    Step length: 2.30343e-10
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 9.29291e-10, fy: 421305., dgy: 2.67496e+15
   18 LS    Function value: 0.213621,    Step length: 5.45712e-11
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 2.30343e-10, fy: 141.939, dgy: 3.35874e+12
   19 LS    Function value: 0.120058,    Step length: 1.32286e-11
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 5.45712e-11, fy: 0.213621, dgy: 8.60977e+09
  1 TAO,  Function value: 0.120058,  Residual: 0.0484495

When the KSP does not encounter negative curvature, the linesearch performs 
well:
  3 TAO,  Function value: 0.118376,  Residual: 0.0545446
    Residual norms for tao_nls_ solve.
    0 KSP Residual norm 5.454463134947e-02
    1 KSP Residual norm 2.394277461960e-02
    2 KSP Residual norm 6.674182971627e-03
    3 KSP Residual norm 1.235116278289e-03
    4 KSP Residual norm 1.714792759324e-04
    5 KSP Residual norm 3.928769927518e-05
    6 KSP Residual norm 8.464174666739e-06
    7 KSP Residual norm 1.583763581407e-06
    8 KSP Residual norm 3.251685842746e-07
  Linear tao_nls_ solve converged due to CONVERGED_RTOL iterations 8
    0 LS    Function value: 0.118376,    Step length: 0.
    1 LS    Function value: 0.117884,    Step length: 1.
        stx: 0., fx: 0.118376, dgx: -0.000530697
        sty: 0., fy: 0.118376, dgy: -0.000530697

I have attached the full log for this particular solve. The points of negative 
curvature do not surprise me - this is a difficult problem, with many 
singularities in the Jacobian/Hessian. In fact, I am very happy with how well 
STCG is performing, globally. My question is what to make of the poor 
performance of the linesearch after encountering these points. As I understand 
it, most of the flags for adjusting the solver after encountering an indefinite 
matrix involve altering the magnitude of the perturbation by which the Hessian 
is offset for calculating the update direction. I am not clear on how to adjust 
the routine for determining the initial step size. More generally, should I 
expect these points of negative curvature to cause difficulties for the solver, 
and be satisfied with a large number of iterations before finding an 
improvement at tiny step size?

I have a loosely related question about the status of the Levenberg Marquardt 
solver within Tao? I remember hearing on the list that this was being worked 
on, but I am not sure from the branch description whether it is working or 
intended for general use. I would love to hear any updates on that, because I 
think it could be useful for my problem.

Thanks!
<Tao_log.txt>

Reply via email to