Re: [petsc-users] Report Bug TaoALMM class

Barry Smith Fri, 11 Nov 2022 20:44:43 -0800

  I am still working to understand. I have a PETSc branch 
barry/2022-11-11/fixes-for-tao/release where I have made a few fixes/improvements 
to help me run and debug with your code.


I made a tiny change to your code, passing Hessian twice, and ran with 

./test_tao_neohooke -tao_monitor -tao_view -tao_max_it 500 -tao_converged_reason -tao_lmvm_recycle -tao_type nls -tao_ls_monitor

 and got 

18 TAO,  Function value: -0.0383888,  Residual: 7.46748e-11 
  TAO  solve converged due to CONVERGED_GATOL iterations 18

Is this what you expect? It also works with -tao_type ntr.

If I run with 

./test_tao_neohooke -tao_monitor -tao_view -tao_max_it 10000 -tao_converged_reason -tao_type lmvm -tao_ls_monitor

 I get 

2753 TAO,  Function value: -0.0161685,  Residual: 0.120782 
    0 LS    Function value: -0.0161685,    Step length: 0.
    1 LS    Function value: 4.49423e+307,    Step length: 1.
        stx: 0., fx: -0.0161685, dgx: -0.0145883
        sty: 0., fy: -0.0161685, dgy: -0.0145883
    2 LS    Function value: -0.0161685,    Step length: 0.
        stx: 0., fx: -0.0161685, dgx: -0.0145883
        sty: 1., fy: 4.49423e+307, dgy: 5.68594e+307
2754 TAO,  Function value: -0.0161685,  Residual: 0.120782 
  TAO  solve did not converge due to DIVERGED_LS_FAILURE iteration 2754

Note the insane fy value that pops up at the end.

The next one

./test_tao_neohooke -tao_monitor -tao_view -tao_max_it 500 -tao_converged_reason -tao_lmvm_recycle -tao_type owlqn -tao_ls_monitor
  0 TAO,  Function value: 0.,  Residual: 0. 
  TAO  solve converged due to CONVERGED_GATOL iterations 0

fails right off the bat; somehow the initial residual norm is 0, which should 
not depend on the solver (maybe a bug in Tao?)

bmrm gets stuck far from the minimum found by the Newton methods.

1719 TAO,  Function value: -2.36706e-06,  Residual: 1.94494e-09

I realize this is still far from the problem you reported (which I agree is a 
bug); I am working to understand enough to provide a proper fix to the bug 
instead of just doing something ad hoc.

 Barry



> On Nov 4, 2022, at 7:43 AM, Stephan Köhler 
> <[email protected]> wrote:
> 
> Barry,
> 
> this is non-artificial code.  This is a problem in the ALMM subsolver.  I 
> want to solve a problem with a TaoALMM solver; what then happens is:
> 
> TaoSolve(tao)    /* TaoALMM solver */
>    |
>    |
>    |-------->   This calls the TaoALMM subsolver routine
>                   
>                  TaoSolve(subsolver)
>                        |
>                        |
>                        |----------->   The subsolver does not work correctly,
>                                        at least with an Armijo line search,
>                                        since the solution is overwritten
>                                        within the line search.
>                                        In my case, the subsolver does not
>                                        make any progress although progress
>                                        is possible.
> 
> To get to my real problem you can simply change line 268 to if(0)  (from 
> if(1) -----> if(0)) and line 317 from // ierr = TaoSolve(tao); CHKERRQ(ierr); 
>  -------> ierr = TaoSolve(tao); CHKERRQ(ierr);
> What you can see is that the solver does not make any progress, but it should 
> make progress.
> 
> To be honest, I do not really know why the option 
> -tao_almm_subsolver_tao_ls_monitor has no effect if the ALMM solver is 
> called and not the subsolver. I also do not know why 
> -tao_almm_subsolver_tao_view prints as termination reason for the subsolver 
> 
>      Solution converged:    ||g(X)|| <= gatol
> 
> This is obviously not the case.  I set the tolerances
> -tao_almm_subsolver_tao_gatol 1e-8 \
> -tao_almm_subsolver_tao_grtol 1e-8 \
>  
> I encountered this and then I looked into the ALMM class and therefore I 
> tried to call the subsolver (previous example).
> 
> I attach the updated program and also the options.
> 
> Stephan
> 
> 
> 
> 
> 
> On 03.11.22 22:15, Barry Smith wrote:
>> 
>>   Thanks for your response and the code. I understand the potential problem 
>> and how your code demonstrates a bug if the TaoALMMSubsolverObjective() is 
>> used in the manner you use in the example where you directly call 
>> TaoComputeObjective() multiple times like a line search code might.
>> 
>>   What I don't have or understand is how to reproduce the problem in a real 
>> code that uses Tao. That is where the Tao Armijo line search code has a 
>> problem when it is used (somehow) in a Tao solver with ALMM. You suggest "If 
>> you have an example for your own, you can switch the Armijo line search by 
>> the option -tao_ls_type armijo.  The thing is that it will cause no problems 
>> if the line search accepts the steps with step length one."  I don't see how 
>> to do this if I use -tao_type almm I cannot use -tao_ls_type armijo; that is 
>> the option -tao_ls_type doesn't seem to me to be usable in the context of 
>> almm (since almm internally does directly its own trust region approach for 
>> globalization). If we remove the if (1) code from your example, are there 
>> some Tao options I can use to get the bug to appear inside the Tao solve?
>> 
>> I'll try to explain again: I agree that the fact that the Tao solution is 
>> aliased (within the ALMM solver) is a problem with repeated calls to 
>> TaoComputeObjective(), but I cannot see how these repeated calls could ever 
>> happen in the use of TaoSolve() with the ALMM solver. That is, when is this 
>> "design problem" a true problem as opposed to just a potential problem that 
>> can be demonstrated in artificial code?
>> 
>> The reason I need to understand the non-artificial situation in which it 
>> breaks things is to come up with an appropriate correction for the current code.
>> 
>>   Barry
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> On Nov 3, 2022, at 12:46 PM, Stephan Köhler 
>>> <[email protected]> wrote:
>>> 
>>> Barry,
>>> 
>>> so far, I have not experimented with trust-region methods, but I can 
>>> imagine that this "design feature" causes no problem for trust-region 
>>> methods, if the old point is saved and, after the trust-region check fails, 
>>> the old point is copied back to the current point.  But the implementation 
>>> of the Armijo line search method does not work that way.  Here, the current 
>>> point will always be overwritten.  Only if the line search fails is the old 
>>> point restored, but then the TaoSolve method ends with a line search 
>>> failure. 
>>> 
>>> If you have an example for your own, you can switch the Armijo line search 
>>> by the option -tao_ls_type armijo.  The thing is that it will cause no 
>>> problems if the line search accepts the steps with step length one.  
>>> It is also possible that, by luck, it will cause no problems, if the 
>>> "excessive" step brings a reduction of the objective
>>> 
>>> Otherwise, I attach my example, which is not minimal, but here you can see 
>>> that it causes problems.  You need to set the paths to the PETSc library in 
>>> the makefile.  You find the options for this problem in the 
>>> run_test_tao_neohooke.sh script.
>>> The important part begins at line 292 in test_tao_neohooke.cpp
>>> 
>>> Stephan
>>> 
>>> On 02.11.22 19:04, Barry Smith wrote:
>>>>   Stephan,
>>>> 
>>>>     I have located the troublesome line in TaoSetUp_ALMM(); it has the line
>>>> 
>>>>   auglag->Px = tao->solution;
>>>> 
>>>> and in almm.h it has 
>>>> 
>>>> Vec  Px, LgradX, Ce, Ci, G;         /* aliased vectors (do not destroy!) */
>>>> 
>>>> Now auglag->P in some situations aliases auglag->Px, and in some cases 
>>>> auglag->Px serves to hold a portion of auglag->P. So then in 
>>>> TaoALMMSubsolverObjective_Private()
>>>> the lines
>>>> 
>>>>   PetscCall(VecCopy(P, auglag->P));
>>>>   PetscCall((*auglag->sub_obj)(auglag->parent));
>>>> 
>>>> cause, just as you said, tao->solution to be overwritten by the P at 
>>>> which the objective function is being computed. In other words, the 
>>>> solution of the outer Tao is aliased with the solution of the inner Tao, 
>>>> by design. 
>>>> 
>>>> You are definitely correct, the use of TaoALMMSubsolverObjective_Private 
>>>> and TaoALMMSubsolverObjectiveAndGradient_Private  in a line search would 
>>>> be problematic. 
>>>> 
>>>> I am not an expert at these methods or their implementations. Could you 
>>>> point to an actual use case within Tao that triggers the problem? Is there 
>>>> a set of command line options or code calls to Tao that fail due to this 
>>>> "design feature"? Within the standard use of ALMM I do not see how the 
>>>> objective function would be used within a line search. The TaoSolve_ALMM() 
>>>> code is self-correcting in that if a trust region check fails it 
>>>> automatically rolls back the solution.
>>>> 
>>>>   Barry
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Oct 28, 2022, at 4:27 AM, Stephan Köhler 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Dear PETSc/Tao team,
>>>>> 
>>>>> it seems that there is a bug in the TaoALMM class:
>>>>> 
>>>>> In the methods TaoALMMSubsolverObjective_Private and 
>>>>> TaoALMMSubsolverObjectiveAndGradient_Private the vector at which the 
>>>>> function value of the augmented Lagrangian is evaluated 
>>>>> is copied into the current solution, see, e.g., 
>>>>> https://petsc.org/release/src/tao/constrained/impls/almm/almm.c.html line 
>>>>> 672 or 682.  This causes the subsolver routine to not converge if the line 
>>>>> search for the subsolver rejects the step length 1 for some 
>>>>> update.  In detail:
>>>>> 
>>>>> Suppose the current iterate is xk and the current update is dxk. The line 
>>>>> search first evaluates the augmented Lagrangian at (xk + dxk).  As a 
>>>>> result, the value (xk + dxk) is copied into the current solution.  If the 
>>>>> point (xk + dxk) is rejected, the line search should 
>>>>> try the point (xk + alpha * dxk), where alpha < 1.  But due to the 
>>>>> copying, what happens is that the point ((xk + dxk) + alpha * dxk) is 
>>>>> evaluated, see, e.g., 
>>>>> https://petsc.org/release/src/tao/linesearch/impls/armijo/armijo.c.html 
>>>>> line 191.
>>>>> 
>>>>> Best regards
>>>>> Stephan Köhler
>>>>> 
>>>>> -- 
>>>>> Stephan Köhler
>>>>> TU Bergakademie Freiberg
>>>>> Institut für numerische Mathematik und Optimierung
>>>>> 
>>>>> Akademiestraße 6
>>>>> 09599 Freiberg
>>>>> Gebäudeteil Mittelbau, Zimmer 2.07
>>>>> 
>>>>> Telefon: +49 (0)3731 39-3173 (Büro)
>>>>> 
>>>>> <OpenPGP_0xC9BF2C20DFE9F713.asc>
>>> 
>>> -- 
>>> Stephan Köhler
>>> TU Bergakademie Freiberg
>>> Institut für numerische Mathematik und Optimierung
>>> 
>>> Akademiestraße 6
>>> 09599 Freiberg
>>> Gebäudeteil Mittelbau, Zimmer 2.07
>>> 
>>> Telefon: +49 (0)3731 39-3173 (Büro)
>>> <Minimal_example_without_vtk_2.tar.gz><OpenPGP_0xC9BF2C20DFE9F713.asc>
>> 
> 
> -- 
> Stephan Köhler
> TU Bergakademie Freiberg
> Institut für numerische Mathematik und Optimierung
> 
> Akademiestraße 6
> 09599 Freiberg
> Gebäudeteil Mittelbau, Zimmer 2.07
> 
> Telefon: +49 (0)3731 39-3173 (Büro)
> <run_test_tao_neohooke.sh><test_tao_neohooke.cpp><OpenPGP_0xC9BF2C20DFE9F713.asc>
