Re: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues
It was indeed a mass scaling issue. We have to project the CADJ-derived gradient back onto the corresponding FE space. The gradient test output is now consistent:

Testing hand-coded gradient (hc) against finite difference gradient (fd),
if the ratio ||fd - hc|| / ||hc|| is O(1.e-8), the hand-coded gradient is probably correct.
Run with -tao_test_display to show difference between hand-coded and finite difference gradient.
||fd|| 0.000150841, ||hc|| = 0.000150841, angle cosine = (fd'hc)/||fdhc|| = 1.
2-norm ||fd-hc||/max(||hc||,||fd||) = 4.48554e-06, difference ||fd-hc|| = 6.76604e-10
max-norm ||fd-hc||/max(||hc||,||fd||) = 4.99792e-06, difference ||fd-hc|| = 1.88044e-10
||fd|| 0.000386312, ||hc|| = 0.000386312, angle cosine = (fd'hc)/||fdhc|| = 1.
2-norm ||fd-hc||/max(||hc||,||fd||) = 1.14682e-05, difference ||fd-hc|| = 4.4303e-09
max-norm ||fd-hc||/max(||hc||,||fd||) = 1.56645e-05, difference ||fd-hc|| = 1.49275e-09
||fd|| 8.46797e-05, ||hc|| = 8.46797e-05, angle cosine = (fd'hc)/||fdhc|| = 1.
2-norm ||fd-hc||/max(||hc||,||fd||) = 2.63488e-06, difference ||fd-hc|| = 2.2312e-10
max-norm ||fd-hc||/max(||hc||,||fd||) = 2.7873e-06, difference ||fd-hc|| = 5.58718e-11

Thank you all for the quick responses and input again!

On 2017-11-23 09:29, Julian Andrej wrote:
> On 2017-11-22 16:27, Emil Constantinescu wrote:
>> Both give similar (assumed descent) directions, but they seem to be
>> scaled differently. It could be a bad scaling by the mass matrix
>> somewhere in the continuous adjoint. This could be seen if you plot them
>> side by side as a quick diagnostic.
>>
>> Emil
>
> I visualized and attached the two gradients. The CADJ gradient is hand
> coded and the DADJ gradient is from pyadjoint, which matches the finite
> difference gradient from TAO. If the attachment gets lost in the mailing
> list, here is a direct link [1].
>
> [1] https://cloud.tf.uni-kiel.de/index.php/s/nmiNOoI213dx1L1
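To illustrate the fix described above: in a finite element setting, a continuous adjoint naturally yields the action of the derivative on the nodal basis functions, i.e. a mass-scaled vector, while a finite difference gradient of the coefficients is Euclidean; recovering one from the other means solving with (or applying) the mass matrix. Below is a minimal NumPy sketch of this projection using a hypothetical 1D P1 mass matrix, not the Firedrake code from this thread:

```python
import numpy as np

def p1_mass_matrix(n, h):
    """Assemble the 1D P1 mass matrix on a uniform mesh with n nodes, spacing h."""
    M = np.zeros((n, n))
    for e in range(n - 1):  # loop over elements, add the local mass matrix
        M[e:e + 2, e:e + 2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    return M

def riesz_project(M, dJ):
    """Given dJ, the vector of derivatives paired against the nodal basis
    functions (what a continuous adjoint produces), return the coefficient
    vector g of the L2 gradient by solving M g = dJ."""
    return np.linalg.solve(M, dJ)

n, h = 11, 0.1
M = p1_mass_matrix(n, h)
g = np.ones(n)          # pretend the true gradient function is g(x) = 1
dJ = M @ g              # the raw adjoint output is mass-scaled
print(np.allclose(riesz_project(M, dJ), g))  # recovers the unscaled gradient
```

Whether you solve with M or multiply by M depends on which representation your optimizer compares against; getting it backwards can produce exactly the cosine-near-one, wrong-length symptom seen earlier in this thread.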
Re: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues
Hi Julian,

If I remember correctly, you have a code that worked fine with the discrete
adjoint (TSAdjoint). Was it for the same example? If so, how do the
differences in the validation output compare between the continuous adjoint
and the discrete adjoint?

Hong (Mr.)

> On Nov 22, 2017, at 3:48 AM, Julian Andrej wrote:
>
> Hello,
>
> we prepared a small example which computes the gradient via the continuous
> adjoint method of a heating problem with a cost functional.
>
> We implemented the text book example and tested the gradient via a Taylor
> remainder test (which works fine). Now we wanted to solve the optimization
> problem with TAO, checked the gradient vs. the finite difference gradient,
> and ran into problems.
Re: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues
Just to add to Emil's answer: since the adjoint ODE is linear, you may be
improperly scaling either the initial condition (if your objective is a
final-value one) or the adjoint forcing (i.e., the gradient of the objective
function with respect to the state, if you have a cost gradient).

2017-11-22 18:34 GMT+03:00 Smith, Barry F.:
>    Just saw Emil's email, yes there could easily be a scaling issue with
> your continuous adjoint.
>
>    Barry
Re: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues
> On Nov 22, 2017, at 3:48 AM, Julian Andrej wrote:
>
> Hello,
>
> we prepared a small example which computes the gradient via the continuous
> adjoint method of a heating problem with a cost functional.

Julian,

The first thing to note is that the continuous adjoint is not exactly the
same as the adjoint of the actual algebraic system you are solving. (As I
understand it, they possibly coincide only in the limit of a very fine mesh
and time step.) Thus you would not actually expect these to match the PETSc
finite difference gradient. As you refine in space/time, do the numbers get
closer to each other?

Note that the angle cosine is very close to one, which means the two produce
the same search direction, just with different lengths.

How is the convergence of the solver if you use -tao_fd_gradient; do you
still need the unit line search?

> but have to use -tao_ls_type unit.

This is slightly odd, because this line search always just takes the full
step; the other ones would normally be better since they are more
sophisticated in picking the step size. Please run without -tao_ls_type unit
and send the output.

Also, does your problem have bound constraints? If not, use -tao_type lmvm
and send the output.

Just saw Emil's email; yes, there could easily be a scaling issue with your
continuous adjoint.

   Barry
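For debugging outside TAO, the diagnostics that appear in the test output in this thread are easy to recompute by hand. Here is a small sketch (the function name is mine, not a TAO API) that reproduces the angle cosine and relative differences, and shows how a purely mis-scaled gradient gives cosine 1 together with a large relative difference:

```python
import numpy as np

def compare_gradients(fd, hc):
    """Reproduce the diagnostics TAO prints: the angle cosine and the
    relative 2-norm / max-norm differences between a finite difference
    gradient fd and a hand-coded gradient hc."""
    denom = max(np.linalg.norm(hc), np.linalg.norm(fd))
    return {
        "cos": float(fd @ hc / (np.linalg.norm(fd) * np.linalg.norm(hc))),
        "rel2": float(np.linalg.norm(fd - hc) / denom),
        "relmax": float(np.max(np.abs(fd - hc)) / denom),
    }

# A gradient that is a positive rescaling of another has angle cosine 1 but
# a large relative difference -- the signature seen in this thread, where
# ||hc|| / ||fd|| was roughly 67.
hc = np.array([1.0, 2.0, 3.0])
d = compare_gradients(hc / 67.0, hc)   # fd is hc scaled down by 67
print(d["cos"], d["rel2"])             # cosine ~ 1, relative difference ~ 0.985
```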
Re: [petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues
On 11/22/17 3:48 AM, Julian Andrej wrote:
> Hello,
>
> we prepared a small example which computes the gradient via the continuous
> adjoint method of a heating problem with a cost functional.
>
> We implemented the text book example and tested the gradient via a Taylor
> remainder test (which works fine). Now we wanted to solve the optimization
> problem with TAO, checked the gradient vs. the finite difference gradient,
> and ran into problems.
>
> Testing hand-coded gradient (hc) against finite difference gradient (fd),
> if the ratio ||fd - hc|| / ||hc|| is O(1.e-8), the hand-coded gradient is
> probably correct.
> Run with -tao_test_display to show difference between hand-coded and
> finite difference gradient.
> ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = (fd'hc)/||fdhc|| = 0.99768
> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.00973464
> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference ||fd-hc|| = 0.00243363
> ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = (fd'hc)/||fdhc|| = 0.997609
> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| = 0.0253185
> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference ||fd-hc|| = 0.00624562
> ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = (fd'hc)/||fdhc|| = 0.997338
> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| = 0.00585376
> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference ||fd-hc|| = 0.00137836
>
> Despite these differences we achieve convergence with our hand coded
> gradient, but have to use -tao_ls_type unit.

Both give similar (assumed descent) directions, but they seem to be scaled
differently. It could be a bad scaling by the mass matrix somewhere in the
continuous adjoint. This could be seen if you plot them side by side as a
quick diagnostic.
Emil

> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_gatol 1e-7 -tao_ls_type unit
> iter = 0, Function value: 0.000316722, Residual: 0.00126285
> iter = 1, Function value: 3.82272e-05, Residual: 0.000438094
> iter = 2, Function value: 1.26011e-07, Residual: 8.4194e-08
> Tao Object: 1 MPI processes
>   type: blmvm
>   Gradient steps: 0
>   TaoLineSearch Object: 1 MPI processes
>     type: unit
>   Active Set subset type: subvec
>   convergence tolerances: gatol=1e-07, steptol=0., gttol=0.
>   Residual in Function/Gradient:=8.4194e-08
>   Objective value=1.26011e-07
>   total number of iterations=2, (max: 2000)
>   total number of function/gradient evaluations=3, (max: 4000)
>   Solution converged: ||g(X)|| <= gatol
>
> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor -tao_fd_gradient
> iter = 0, Function value: 0.000316722, Residual: 4.87343e-06
> iter = 1, Function value: 0.000195676, Residual: 3.83011e-06
> iter = 2, Function value: 1.26394e-07, Residual: 1.60262e-09
> Tao Object: 1 MPI processes
>   type: blmvm
>   Gradient steps: 0
>   TaoLineSearch Object: 1 MPI processes
>     type: more-thuente
>   Active Set subset type: subvec
>   convergence tolerances: gatol=1e-08, steptol=0., gttol=0.
>   Residual in Function/Gradient:=1.60262e-09
>   Objective value=1.26394e-07
>   total number of iterations=2, (max: 2000)
>   total number of function/gradient evaluations=3474, (max: 4000)
>   Solution converged: ||g(X)|| <= gatol
>
> We think that the finite difference gradient should be in line with our
> hand coded gradient for such a simple example.
>
> We appreciate any hints on debugging this issue. It is implemented in
> Python (Firedrake) and I can provide the code if needed.
>
> Regards
> Julian
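The Taylor remainder test mentioned at the start of the thread is a useful complement to TAO's finite difference check. Below is a generic sketch with a hypothetical smooth objective (not the heating problem from this thread): if grad is consistent with J, the remainder |J(m + h dm) - J(m) - h <grad(m), dm>| shrinks like O(h^2), so the observed convergence rates should be close to 2.

```python
import numpy as np

def taylor_test(J, grad, m, dm, hs=(1e-2, 5e-3, 2.5e-3)):
    """Return the observed convergence rates of the Taylor remainder
    |J(m + h dm) - J(m) - h <grad(m), dm>| over the step sizes hs;
    rates near 2 indicate a consistent gradient."""
    r = [abs(J(m + h * dm) - J(m) - h * grad(m) @ dm) for h in hs]
    return [np.log(r[i] / r[i + 1]) / np.log(hs[i] / hs[i + 1])
            for i in range(len(r) - 1)]

# Hypothetical smooth objective with a known gradient.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
J = lambda m: 0.5 * m @ A @ m + np.sin(m[0])
grad = lambda m: A @ m + np.array([np.cos(m[0]), 0.0])

rates = taylor_test(J, grad, np.array([0.3, -0.7]), np.array([1.0, 0.5]))
print([round(r, 2) for r in rates])  # rates close to 2
```

Note that a passing Taylor test does not rule out the mass scaling issue resolved above: it verifies the pairing you used inside the test, while TAO's finite difference check verifies the Euclidean gradient of the coefficient vector, and the two can disagree by exactly a mass matrix.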