Hi Alp, Thanks for the help. Quasi-Newton seems promising - the Tao solver eventually converges, sometimes after hundreds or even thousands of iterations, with each iterate proceeding very quickly thanks to not evaluating the Hessian. I have only tried this with the problem set up as a general optimization, i.e., not a least-squares problem per se.
Your help got me started with testing BRGN, which I'd like to explore more. Can you give me some advice on preconditioners for the subsolver? I guess that the subsolver uses a shell matrix to calculate elements of J^T J from the Jacobian J. I have good preconditioners for J, but I'm not sure what to do in order to apply them to the shell matrix. Thanks, Zak On Wed, Jun 10, 2020 at 3:10 PM Dener, Alp <ade...@anl.gov> wrote: > Hi Zak, > > You got it right with the TaoBRGNGetSubsolver -> TaoGetKSP workflow. This > will get you the KSP object correctly. > > BRGN is not a stand-alone solver. It’s a wrapper that combines the > user-provided residual and Jacobian callbacks to assemble the gradient and > Hessian under the Gauss-Newton formulation, and feed that information into > a standard TAO solver like NLS. It also incorporates some regularizations > to the problem too. Technically any user could also do this themselves and > directly use NLS to solve a least squares problem, but BRGN reduces the > workload a little bit. > > To use it for your least-squares problem, you would need to set TaoType() > to TAOBRGN and then provide the Tao solver two callbacks: one for the > residual using TaoSetResidualRoutine() and another for the Jacobian using > TaoSetJacobianResidualRoutine(). The regularization can be changed using > -tao_brgn_regularization_type <none,l2pure,l2prox,l1dict> command line > options. > > Note that the BRGN sub solver can also be changed. It’s set to BNLS by > default — this is bound constrained Newton line search but it also solves > unconstrained problems when no bounds are defined. You could always try to > use BNTR for a trust-region method instead of line search, or switch to > BNQLS to do a quasi-Gauss-Newton solution. The sub solver’s settings can be > accessed via the -tao_brgn_subsolver_ prefix. So for instance if you wanted > to change type, you could do "-tao_brgn_subsolver_tao_type bntr" for Newton > trust-region. > > Alp > > On Jun 10, 2020, at 1:55 PM, zakaryah . <zakar...@gmail.com> wrote: > > Oh, maybe I can answer my own question - for BRGN, is it the subsolver > that has a KSP? Then I would call > > TaoBRGNGetSubsolver(tao, &subsolver); > TaoGetKSP(subsolver,&ksp); > > ? > > On Wed, Jun 10, 2020 at 2:52 PM zakaryah . <zakar...@gmail.com> wrote: > >> Hi Alp, >> >> Thanks very much for this thorough answer. I understand the situation >> much better now. >> >> I will experiment with quasi-Newton methods - this should be >> straightforward to test and evaluate. I agree that the regularized >> Gauss-Newton may be useful in place of the Levenberg Marquardt per se. I am >> having some trouble understanding how it is set up and how to use it for my >> purposes. >> >> Using BRGN, does the Tao solver itself not have a KSP? After setting up >> the solver, running TaoGetKSP returns a NULL KSP. It seems I am doing >> something wrong, but with other Tao types, like nls, everything seems to >> work fine. >> >> Thanks for your help! >> >> Cheers, Zak >> >> >> On Wed, Jun 10, 2020 at 11:29 AM Dener, Alp <ade...@anl.gov> wrote: >> >>> Hello, >>> >>> STCG is being used to compute a search direction by inverting the >>> Hessian of the objective onto the gradient. The Hessian has to be positive >>> definitive for this search direction to be a valid descent direction. To >>> enforce this, STCG terminates the KSP solution when it encounters negative >>> curvature (i.e. “uphill” directions). The resulting search direction from >>> this early termination is still a descent direction but it’s a pretty bad >>> one, meaning that the line search is forced to do extra work to find a >>> valid (and sometimes pretty small) step length. This is normal. >>> >>> Once a negative curvature is detected, the NLS algorithm imposes a shift >>> to the diagonal of the Hessian in order to guarantee that it will be >>> positive-definite on the next optimization iteration. This diagonal shift >>> is increased in magnitude if you keep hitting the negative curvature KSP >>> termination, and it’s gradually decreased on iterations where the KSP >>> solution succeeds. However this does not prevent the line search from doing >>> all that extra work on iterations where the search direction solution never >>> converged fully. It simply helps generate better search directions in >>> subsequent iterations and helps the overall optimization solution >>> eventually converge to a valid minimum. >>> >>> As a side note, if your function, gradient and/or Hessian evaluations >>> are expensive, it might actually make more sense to use the quasi-Newton >>> method (BQNLS) instead of Newton line search. The optimization solution is >>> likely to take more overall iterations with BQNLS(-tao_type bqnls), but the >>> BFGS approximation used by the algorithm is inherently guaranteed to be >>> positive-definite and tends to generate very well scaled search directions >>> so it might help the line search do less work. Combined with the >>> elimination of Hessian evaluations, it may very well solve your problem >>> faster in CPU time even if it takes more nonlinear iterations overall. This >>> is technically a bound constrained method but it solves unconstrained >>> problems too when you don’t define any bounds. >>> >>> About Levenberg-Marquardt: a user started the branch to eventually >>> contribute an LM solver, but I have not heard any updates on it since end >>> of April. For least-squares type problems, you can try using the >>> regularized Gauss-Newton solver (-tao_type brgn). The problem definition >>> interface is a bit different. BRGN requires the problem to be defined as a >>> residual and its Jacobian, and it will assemble the gradient and the >>> Hessian on its own and feed it into the standard Newton line search solver >>> underneath. There are also a few different regularization options >>> available, like proximal point Tikhonov (l2prox) and a sparsity term >>> (l1dict) that can be set with the (-tao_brgn_regularization_type) option. >>> These get automatically cooked into the gradient and Hessian. Combined with >>> the automatic diagonal shifts for the Hessian, BRGN really is almost >>> equivalent to an LM solver so it might do what you need. >>> >>> Hope this helps! >>> — >>> Alp >>> >>> On Jun 9, 2020, at 10:55 PM, zakaryah . <zakar...@gmail.com> wrote: >>> >>> I am using TAO to minimize the elastic strain energy for somewhat >>> complicated applied forces. With Newton linesearch (NLS), the STCG KSP, and >>> several preconditioners (none, Jacobi, lmvm), the solver finds a minimum >>> within an acceptable number of iterations (<50). Still, in the interest of >>> performance, I am wondering about what happens after the KSP encounters an >>> indefinite matrix (KSP_CONVERGED_CG_NEG_CURVE). After this happens, the >>> line search performs extremely poorly, with function values at step length >>> 1 reaching absurdly large values. As the line search tries smaller step >>> lengths, the function values fall, then typically reach values representing >>> a decline from the previous iteration, but only with tiny step lengths >>> (~1e-11). The line search takes many iterations to arrive at such an >>> improvement. Here is an example, run with -tao_type nls -tao_nls_ksp_type >>> stcg -tao_nls_pc_type none -tao_nls_ksp_norm_type unpreconditioned: >>> >>> 0 TAO, Function value: 0.160612, Residual: 0.0736074 >>> >>> >>> >>> 0 TAO, Function value: 0.121568, Residual: 0.0424117 >>> >>> >>> >>> Residual norms for tao_nls_ solve. >>> >>> >>> >>> 0 KSP Residual norm 4.241174778983e-02 >>> >>> >>> >>> 1 KSP Residual norm 5.806891169509e-02 >>> >>> >>> >>> 2 KSP Residual norm 6.492953448014e-02 >>> >>> >>> >>> 3 KSP Residual norm 5.856559430984e-02 >>> >>> >>> >>> 4 KSP Residual norm 5.262548559710e-02 >>> >>> >>> >>> 5 KSP Residual norm 4.863633473400e-02 >>> >>> >>> >>> 6 KSP Residual norm 4.725156347026e-02 >>> >>> >>> >>> 7 KSP Residual norm 4.748458009319e-02 >>> >>> >>> >>> 8 KSP Residual norm 4.885339641711e-02 >>> >>> >>> >>> 9 KSP Residual norm 5.065071226354e-02 >>> >>> >>> >>> 10 KSP Residual norm 5.085544070851e-02 >>> >>> >>> >>> 11 KSP Residual norm 5.127093547976e-02 >>> >>> >>> >>> 12 KSP Residual norm 5.155225884843e-02 >>> >>> >>> >>> 13 KSP Residual norm 5.219895021408e-02 >>> >>> >>> >>> 14 KSP Residual norm 6.480520610077e-02 >>> >>> >>> >>> 15 KSP Residual norm 1.433515456621e-01 >>> >>> >>> >>> Linear tao_nls_ solve converged due to CONVERGED_CG_NEG_CURVE >>> iterations 16 >>> >>> >>> 0 LS Function value: 0.121568, Step length: 0. >>> >>> >>> >>> 1 LS Function value: 5.99839e+59, Step length: 1. >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0., fy: 0.121568, dgy: -1.32893e+08 >>> 2 LS Function value: 1.46445e+56, Step length: 0.25 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 1., fy: 5.99839e+59, dgy: 3.59903e+60 >>> >>> >>> >>> 3 LS Function value: 3.57532e+52, Step length: 0.0625 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0.25, fy: 1.46445e+56, dgy: 3.51468e+57 >>> >>> >>> >>> 4 LS Function value: 8.7288e+48, Step length: 0.015625 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0.0625, fy: 3.57532e+52, dgy: 3.4323e+54 >>> >>> >>> >>> 5 LS Function value: 2.13105e+45, Step length: 0.00390625 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0.015625, fy: 8.7288e+48, dgy: 3.35186e+51 >>> >>> >>> >>> 6 LS Function value: 5.20277e+41, Step length: 0.000976562 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0.00390625, fy: 2.13105e+45, dgy: 3.2733e+48 >>> >>> >>> >>> 7 LS Function value: 1.27021e+38, Step length: 0.000244141 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0.000976562, fy: 5.20277e+41, dgy: 3.19658e+45 >>> >>> >>> >>> 8 LS Function value: 3.1011e+34, Step length: 6.10352e-05 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 0.000244141, fy: 1.27021e+38, dgy: 3.12166e+42 >>> >>> >>> >>> 9 LS Function value: 7.57106e+30, Step length: 1.52588e-05 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 6.10352e-05, fy: 3.1011e+34, dgy: 3.0485e+39 >>> >>> >>> >>> 10 LS Function value: 1.84842e+27, Step length: 3.8147e-06 >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 1.52588e-05, fy: 7.57106e+30, dgy: 2.97706e+36 >>> >>> >>> >>> 11 LS Function value: 4.51295e+23, Step length: 9.53672e-07 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 3.8147e-06, fy: 1.84842e+27, dgy: 2.90731e+33 >>> >>> >>> >>> 12 LS Function value: 1.10199e+20, Step length: 2.38417e-07 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 9.53672e-07, fy: 4.51295e+23, dgy: 2.83927e+30 >>> >>> >>> >>> 13 LS Function value: 2.69231e+16, Step length: 5.96028e-08 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 2.38417e-07, fy: 1.10199e+20, dgy: 2.77314e+27 >>> >>> >>> >>> 14 LS Function value: 6.59192e+12, Step length: 1.48993e-08 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 5.96028e-08, fy: 2.69231e+16, dgy: 2.70974e+24 >>> >>> >>> >>> 15 LS Function value: 1.62897e+09, Step length: 3.72338e-09 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 1.48993e-08, fy: 6.59192e+12, dgy: 2.65254e+21 >>> >>> >>> >>> 16 LS Function value: 421305., Step length: 9.29291e-10 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 3.72338e-09, fy: 1.62897e+09, dgy: 2.61626e+18 >>> >>> >>> >>> 17 LS Function value: 141.939, Step length: 2.30343e-10 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 9.29291e-10, fy: 421305., dgy: 2.67496e+15 >>> >>> >>> >>> 18 LS Function value: 0.213621, Step length: 5.45712e-11 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> >>> >>> >>> sty: 2.30343e-10, fy: 141.939, dgy: 3.35874e+12 >>> >>> >>> >>> 19 LS Function value: 0.120058, Step length: 1.32286e-11 >>> >>> >>> >>> stx: 0., fx: 0.121568, dgx: -1.32893e+08 >>> sty: 5.45712e-11, fy: 0.213621, dgy: 8.60977e+09 >>> 1 TAO, Function value: 0.120058, Residual: 0.0484495 >>> >>> When the KSP does not encounter negative curvature, the linesearch >>> performs well: >>> 3 TAO, Function value: 0.118376, Residual: 0.0545446 >>> >>> >>> >>> Residual norms for tao_nls_ solve. >>> >>> >>> >>> 0 KSP Residual norm 5.454463134947e-02 >>> >>> >>> >>> 1 KSP Residual norm 2.394277461960e-02 >>> >>> >>> >>> 2 KSP Residual norm 6.674182971627e-03 >>> >>> >>> >>> 3 KSP Residual norm 1.235116278289e-03 >>> >>> >>> >>> 4 KSP Residual norm 1.714792759324e-04 >>> >>> >>> >>> 5 KSP Residual norm 3.928769927518e-05 >>> >>> >>> >>> 6 KSP Residual norm 8.464174666739e-06 >>> >>> >>> >>> 7 KSP Residual norm 1.583763581407e-06 >>> >>> >>> >>> 8 KSP Residual norm 3.251685842746e-07 >>> >>> >>> >>> Linear tao_nls_ solve converged due to CONVERGED_RTOL iterations 8 >>> >>> >>> >>> 0 LS Function value: 0.118376, Step length: 0. >>> >>> >>> >>> 1 LS Function value: 0.117884, Step length: 1. >>> >>> >>> >>> stx: 0., fx: 0.118376, dgx: -0.000530697 >>> >>> >>> >>> sty: 0., fy: 0.118376, dgy: -0.000530697 >>> >>> I have attached the full log for this particular solve. The points of >>> negative curvature do not surprise me - this is a difficult problem, with >>> many singularities in the Jacobian/Hessian. In fact, I am very happy with >>> how well STCG is performing, globally. My question is what to make of the >>> poor performance of the linesearch after encountering these points. As I >>> understand it, most of the flags for adjusting the solver after >>> encountering an indefinite matrix involve altering the magnitude of the >>> perturbation by which the Hessian is offset for calculating the update >>> direction. I am not clear on how to adjust the routine for determining the >>> initial step size. More generally, should I expect these points of negative >>> curvature to cause difficulties for the solver, and be satisfied with a >>> large number of iterations before finding an improvement at tiny step size? >>> >>> I have a loosely related question about the status of the Levenberg >>> Marquardt solver within Tao? I remember hearing on the list that this was >>> being worked on, but I am not sure from the branch description whether it >>> is working or intended for general use. I would love to hear any updates on >>> that, because I think it could be useful for my problem. >>> >>> Thanks! >>> <Tao_log.txt> >>> >>> >>> >