We are trying to write an interface, based on PETSc, to general-purpose
optimization solvers for PDE-constrained optimal control problems in Hilbert
spaces. Ipopt and the scipy implementation of L-BFGS work with the Euclidean
inner product, whereas for discretizations of PDE-constrained optimization
problems in Hilbert spaces one typically wants to work with inner products of
the form x^T A x (A s.p.d., and typically not equal to the identity matrix).
Not working with the correct inner product typically leads to mesh-dependent
behavior of the optimization algorithm. This can be motivated by considering
the optimization problem
    min_{x in X} 1/2 ||x||_X^2 =: f(x),
where X denotes a Hilbert space. We choose x_0 != 0 and apply gradient descent
with step size 1.
Case 1: X = R^d, d in N, ||x||_X^2 = x^T x:
        We have nabla f(x) = x.
        Applying gradient descent with step size 1 gives x_1 = x_0 - x_0 = 0.
        Hence, we have convergence to the optimal solution in 1 iteration.
Case 2: X = H^1(Omega), Omega subset R^d, d in {2, 3}
        (as a specific example of X being an infinite-dimensional Hilbert space),
        ||x||_X^2 = int x(xi) x(xi) dxi + int nabla x(xi) . nabla x(xi) dxi.
        The derivative of f(x) is an element of the dual space of X and is
        given by
        Df(x)(delta_x) = int x(xi) delta_x(xi) dxi
                         + int nabla x(xi) . nabla delta_x(xi) dxi
        for delta_x in X.
        The gradient (Riesz representation of the derivative) is given by
        nabla f(x) = x.
        Hence, gradient descent with step size 1 gives x_1 = x_0 - x_0 = 0.
Case 3: Discretization of Case 2 using finite elements:
        f(x) := 1/2 x^T (M + S) x, where M denotes the mass matrix, S the
        stiffness matrix, and x the degrees of freedom of the discretization.
        Neglecting that x encodes an H^1-function (i.e., working with the
        Euclidean inner product) and just solving the discretized optimization
        problem with gradient descent with step size 1 yields
        nabla f(x) = (M + S) x
        and x_1 = x_0 - (M + S) x_0,
        where the condition number of (M + S) depends on the mesh size, i.e.,
        convergence to the optimal solution is mesh dependent (and not in
        1 iteration); see the small numerical sketch below.
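To make the mesh dependence in Case 3 concrete, here is a minimal numpy sketch
(the 1D piecewise-linear mass/stiffness matrices, the mesh sizes, and the
tolerance are illustrative assumptions, not part of our actual code). With a
fixed step size of 1 the Euclidean iteration would even diverge once the
largest eigenvalue of M + S exceeds 2, so the sketch uses an exact line search;
the point is that the iteration count grows with the condition number of M + S,
i.e. under mesh refinement, whereas the Riesz gradient with respect to the
(M + S)-inner product still gives convergence in one step:

import numpy as np

def mass_stiffness(n):
    # 1D P1 mass matrix M and stiffness matrix S on n interior nodes of (0, 1)
    h = 1.0 / (n + 1)
    M = h / 6.0 * (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
    S = 1.0 / h * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    return M, S

def steepest_descent_iterations(A, x0, tol=1e-6, maxit=200000):
    # Steepest descent with exact line search for f(x) = 1/2 x^T A x,
    # using the Euclidean gradient g = A x
    x = x0.copy()
    for k in range(maxit):
        g = A @ x
        if np.linalg.norm(g) < tol:
            return k
        alpha = (g @ g) / (g @ (A @ g))   # exact line search
        x = x - alpha * g
    return maxit

for n in (10, 40, 160):
    M, S = mass_stiffness(n)
    A = M + S
    its = steepest_descent_iterations(A, np.ones(n))
    # Riesz gradient w.r.t. x^T (M + S) x: solve (M + S) g = (M + S) x, so g = x
    # and x_1 = x_0 - g = 0, independent of the mesh.
    print(f"n = {n:4d}  cond(M+S) = {np.linalg.cond(A):9.2e}  "
          f"Euclidean iterations = {its}")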
One can either respect the correct inner product (in the example, x^T (M + S) x)
in the implementation of the optimization algorithm (see e.g.
https://github.com/funsim/moola) or do a "discrete hack". The latter consists
of introducing an auxiliary variable y, applying the linear transformation
y --> (L^T)^{-1} y =: x, where (M + S) = L L^T is the Cholesky decomposition,
and optimizing for y instead of x. This is why we need the action of the
inverses of the Cholesky factors: L^{-T} maps y to x, and L^{-1} is needed when
applying the chain rule to compute the gradient with respect to y.
For the above example, this trick gives
    tilde f(y) := f(L^{-T} y) = 1/2 y^T L^{-1} (M + S) L^{-T} y = 1/2 y^T y,
and gradient descent in y with step size 1 yields y_1 = y_0 - y_0 = 0.
Therefore, also x_1 = L^{-T} y_1 = 0 (i.e., convergence in 1 iteration).
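In dense serial linear algebra the hack looks as follows (a minimal numpy/scipy
sketch with the same illustrative 1D matrices as above; the two triangular
solves stand in for the forward/backward substitutions we would like to hand to
a parallel Cholesky factorization on the PETSc side):

import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Illustrative 1D P1 mass + stiffness matrices on a uniform mesh of (0, 1)
n = 40
h = 1.0 / (n + 1)
M = h / 6.0 * (4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
S = 1.0 / h * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
A = M + S

L = cholesky(A, lower=True)                    # A = L L^T

x0 = np.ones(n)
y0 = L.T @ x0                                  # change of variables: x = L^{-T} y

# Chain rule: the gradient of tilde f(y) = f(L^{-T} y) is L^{-1} (M + S) L^{-T} y,
# i.e. one backward solve with L^T and one forward solve with L per gradient.
x = solve_triangular(L.T, y0, lower=False)     # backward solve: x = L^{-T} y0
g = solve_triangular(L, A @ x, lower=True)     # forward solve:  g = L^{-1} (A x)

y1 = y0 - g                                    # one gradient step, step size 1
x1 = solve_triangular(L.T, y1, lower=False)
print(np.linalg.norm(y1), np.linalg.norm(x1))  # both ~ 0: convergence in 1 step

In our actual setting M + S is a distributed PETSc matrix, which is why we are
asking about solveForward/solveBackward for parallel factorizations.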
Johannes
> On 2. Dec 2022, at 13:48, Jose E. Roman <[email protected]> wrote:
>
> As far as I know, the solveForward/Backward operations are not implemented
> for external packages.
>
> What are you trying to do? One may be tempted to use the Cholesky
> decomposition to transform a symmetric-definite generalized eigenvalue
> problem into a standard eigenvalue problem, as is done e.g. in LAPACK
> https://netlib.org/lapack/lug/node54.html - But note that in SLEPc you do not
> need to do this, as EPSSolve() will take care of symmetry using Cholesky
> without having to apply solveForward/Backward separately.
>
> Jose
>
>
>
>
>> On 2 Dec 2022, at 13:32, Johannes Haubner <[email protected]> wrote:
>>
>> Given an s.p.d. matrix A and a right-hand side b, we want to compute
>> (L^T)^{-1} b and L^{-1} b, where A = L L^T is the Cholesky decomposition of A.
>> In serial, we are able to do this using pc_factor_mat_solver_type petsc and
>> using "solveForward" and "solveBackward" (see
>> https://github.com/JohannesHaubner/petsc-cholesky/blob/main/main.py). In
>> parallel, we are not able to do this yet. According to p. 52
>> https://slepc.upv.es/documentation/slepc.pdf , PETSc's Cholesky
>> factorization is not parallel. However, using "mumps" or "superlu_dist"
>> prevents us from using "solveForward" or "solveBackward". Is there a way of
>> using solveBackward in parallel with a distributed matrix (MPI)?
>