Hello.
Sorry for the long problem description.

I implemented a radial basis function network for non-linear regression with adaptive centers and adaptive basis shapes (diagonal covariance matrix), using the Levenberg-Marquardt solver (org.apache.commons.math3.optim.nonlinear.vector.jacobian.LevenbergMarquardtOptimizer) and the ModelFunction's DerivativeStructure[] value(DerivativeStructure[] x) function, so that the derivatives are computed analytically via the DerivativeStructure API. For a reasonably sized network with 200 radial bases, the number of parameters is 200 /* # bases */ + 1 /* bias */ + (dim /* center of 1 basis */ + dim /* shape parameters of 1 basis */) * 200, where "dim" is the dimension of the input vectors. This results in a few hundred free parameters.

For small amounts of data everything works fine, but for problems with high-dimensional input I sometimes use tens of thousands (or even hundreds of thousands) of training samples. Unfortunately, with this much training data I receive either a Java heap error or a garbage collection error (in the middle of differentiation). The main problem seems to be that the optimizer expects the ModelFunction to return a vector evaluating all of the training samples, to be compared with the Target instance passed in as OptimizationData. For regular evaluation this isn't too much of a problem, but the memory used by the DerivativeStructure instances (spread out over a few hundred parameters times 10,000 evaluations) is massive.
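For illustration, here is a minimal sketch of one such Gaussian basis written against the DerivativeStructure API (the names and the flat parameter layout are assumptions for this sketch, not the poster's actual code):

    import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

    public class RbfBasisSketch {
        /**
         * One Gaussian basis phi(x) = exp(-sum_d s_d * (x_d - c_d)^2), with the
         * center c and the diagonal shape parameters s declared as free
         * variables so that all partial derivatives are carried along.
         *
         * @param x       one training input (fixed data, not a free parameter)
         * @param c       current center of this basis
         * @param s       current diagonal shape parameters of this basis
         * @param nParams total number of free parameters of the whole model
         * @param offset  assumed index of this basis' first parameter in the
         *                flat parameter vector (center first, then shape)
         */
        static DerivativeStructure basis(double[] x, double[] c, double[] s,
                                         int nParams, int offset) {
            final int dim = x.length;
            DerivativeStructure sum = new DerivativeStructure(nParams, 1, 0.0); // constant 0
            for (int d = 0; d < dim; d++) {
                // order-1 free variables: each carries nParams partial derivatives
                DerivativeStructure cd = new DerivativeStructure(nParams, 1, offset + d, c[d]);
                DerivativeStructure sd = new DerivativeStructure(nParams, 1, offset + dim + d, s[d]);
                DerivativeStructure diff = cd.subtract(x[d]); // squared below, sign irrelevant
                sum = sum.add(diff.multiply(diff).multiply(sd));
            }
            return sum.negate().exp();
        }
    }

Note that each order-1 DerivativeStructure above stores 1 + nParams doubles, and every arithmetic operation allocates a new one; doing this for every basis and every training sample is where the memory goes.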
I am not sure I understand what you mean by "times 10000 evaluations":
Only a few evaluations (2, I think, for the LM algorithm) are kept in
memory at each iteration (then discarded at the next iteration).
I think that the "DerivativeStructure" is pretty much optimized (if you
store only what you really need).
The problem, as you indicate, probably comes from the large number of
observations ("target"), which are obviously required by the large
number of parameters.
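To put rough, purely illustrative numbers on it: with p free parameters, each order-1 DerivativeStructure stores 1 + p doubles. Taking p ~ 1000 and 50,000 samples, the returned DerivativeStructure[] alone occupies about 50,000 * 1001 * 8 bytes ~ 0.4 GB, before counting the temporaries allocated during evaluation; the Jacobian extracted from it (50,000 * 1000 entries) is of the same order.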
Is there any way to get the solver to evaluate the residuals/gradient incrementally?
The LM algorithm uses the Jacobian matrix, whose number of entries is the product of the number of elements in "target" and the number of parameters. IIUC, what you suggest amounts to changing the algorithm (so that it would use only part of the observations). Could you perhaps try the "NonLinearConjugateGradientOptimizer"?

Regards,
Gilles
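For illustration, a minimal sketch of what that could look like (the Model interface and its residual helpers below are hypothetical placeholders standing in for the RBF network, not Commons Math API): the objective becomes the scalar sum of squared residuals, and its gradient can be accumulated one sample at a time, so no target-by-parameters Jacobian is ever materialized.

    import org.apache.commons.math3.analysis.MultivariateFunction;
    import org.apache.commons.math3.analysis.MultivariateVectorFunction;
    import org.apache.commons.math3.optim.InitialGuess;
    import org.apache.commons.math3.optim.MaxEval;
    import org.apache.commons.math3.optim.PointValuePair;
    import org.apache.commons.math3.optim.SimpleValueChecker;
    import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;
    import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunction;
    import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunctionGradient;
    import org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer;

    public class CgRegressionSketch {

        // Hypothetical helpers standing in for the RBF model:
        //   residual(p, i)     = model(x_i; p) - target_i
        //   residualGrad(p, i) = gradient of residual(p, i) w.r.t. p
        interface Model {
            double residual(double[] p, int i);
            double[] residualGrad(double[] p, int i);
        }

        static PointValuePair fit(final Model model, final int nSamples, double[] start) {
            // Scalar objective: sum of squared residuals over all samples.
            MultivariateFunction sse = new MultivariateFunction() {
                public double value(double[] p) {
                    double sum = 0;
                    for (int i = 0; i < nSamples; i++) {
                        double r = model.residual(p, i);
                        sum += r * r;
                    }
                    return sum;
                }
            };
            // Gradient accumulated sample by sample: only one sample's
            // gradient is alive at a time, never a full Jacobian.
            MultivariateVectorFunction grad = new MultivariateVectorFunction() {
                public double[] value(double[] p) {
                    double[] g = new double[p.length];
                    for (int i = 0; i < nSamples; i++) {
                        double r = model.residual(p, i);
                        double[] gi = model.residualGrad(p, i);
                        for (int k = 0; k < p.length; k++) {
                            g[k] += 2 * r * gi[k];
                        }
                    }
                    return g;
                }
            };
            NonLinearConjugateGradientOptimizer optimizer =
                new NonLinearConjugateGradientOptimizer(
                    NonLinearConjugateGradientOptimizer.Formula.POLAK_RIBIERE,
                    new SimpleValueChecker(1e-8, 1e-8));
            return optimizer.optimize(new MaxEval(100000),
                                      new ObjectiveFunction(sse),
                                      new ObjectiveFunctionGradient(grad),
                                      GoalType.MINIMIZE,
                                      new InitialGuess(start));
        }
    }

The trade-off is that CG typically needs many more iterations than LM on least-squares problems, but each iteration's memory footprint stays O(number of parameters) instead of O(samples * parameters).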
