Hello.
Sorry for the long problem description.

I implemented a radial basis function network for non-linear regression with adaptive centers and adaptive basis shapes (diagonal covariance matrix), using the Levenberg-Marquardt solver (org.apache.commons.math3.optim.nonlinear.vector.jacobian.LevenbergMarquardtOptimizer) and the ModelFunction's DerivativeStructure[] value(DerivativeStructure[] x) function, so that the derivatives are computed analytically via the DerivativeStructure API. For a reasonably sized network with 200 radial bases, the number of parameters is 200 /* # bases */ + 1 /* bias */ + (dim /* center of 1 basis */ + dim /* shape parameters of 1 basis */) * 200, where "dim" is the dimension of the input vectors. This results in a few hundred free parameters.

For small amounts of data everything works fine, but for problems with high-dimensional input I sometimes use tens of thousands (or even hundreds of thousands) of training samples. Unfortunately, with this much training data I receive either a Java heap error or a garbage collection error (in the middle of differentiation). The main problem seems to be that the optimizer expects the ModelFunction to return a vector evaluating all of the training samples, to be compared with the Target instance passed in as OptimizationData. For regular evaluation this isn't too much of a problem, but the memory used by the DerivativeStructure instances (spread out over a few hundred parameters times 10,000 evaluations) is massive.
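For illustration, here is a minimal sketch of one such Gaussian basis written against the DerivativeStructure API (the names and the flat parameter layout are assumptions for this sketch, not the poster's actual code):

    import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

    public class RbfBasisSketch {
        /**
         * One Gaussian basis phi(x) = exp(-sum_d s_d * (x_d - c_d)^2), with the
         * center c and the diagonal shape parameters s declared as free
         * variables so that all partial derivatives are carried along.
         *
         * @param x       one training input (fixed data, not a free parameter)
         * @param c       current center of this basis
         * @param s       current diagonal shape parameters of this basis
         * @param nParams total number of free parameters of the whole model
         * @param offset  assumed index of this basis' first parameter in the
         *                flat parameter vector (center first, then shape)
         */
        static DerivativeStructure basis(double[] x, double[] c, double[] s,
                                         int nParams, int offset) {
            final int dim = x.length;
            DerivativeStructure sum = new DerivativeStructure(nParams, 1, 0.0); // constant 0
            for (int d = 0; d < dim; d++) {
                // order-1 free variables: each carries nParams partial derivatives
                DerivativeStructure cd = new DerivativeStructure(nParams, 1, offset + d, c[d]);
                DerivativeStructure sd = new DerivativeStructure(nParams, 1, offset + dim + d, s[d]);
                DerivativeStructure diff = cd.subtract(x[d]); // squared below, sign irrelevant
                sum = sum.add(diff.multiply(diff).multiply(sd));
            }
            return sum.negate().exp();
        }
    }

Note that each order-1 DerivativeStructure above stores 1 + nParams doubles, and every arithmetic operation allocates a new one; doing this for every basis and every training sample is where the memory goes.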
I am not sure I understand what you mean by "times 10000 evaluations":
Only a few evaluations (2, I think, for the LM algorithm) are kept in
memory at each iteration (then discarded at the next iteration).
I think that the "DerivativeStructure" is pretty much optimized (if you
store only what you really need).
The problem, as you indicate, probably comes from the large number of
observations ("target"), which are obviously required by the large
number of parameters.
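To put rough, purely illustrative numbers on it: with p free parameters, each order-1 DerivativeStructure stores 1 + p doubles. Taking p ~ 1000 and 50,000 samples, the returned DerivativeStructure[] alone occupies about 50,000 * 1001 * 8 bytes ~ 0.4 GB, before counting the temporaries allocated during evaluation; the Jacobian extracted from it (50,000 * 1000 entries) is of the same order.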
Is there any way to get the solver to evaluate the residuals/gradient incrementally?
The LM algorithm uses the Jacobian matrix, whose number of entries is the product of the number of elements in "target" and the number of parameters. IIUC, what you suggest amounts to changing the algorithm (so that it would use only part of the observations). Could you perhaps try the "NonLinearConjugateGradientOptimizer"?

Regards,
Gilles
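For illustration, a minimal sketch of what that could look like (the Model interface and its residual helpers below are hypothetical placeholders standing in for the RBF network, not Commons Math API): the objective becomes the scalar sum of squared residuals, and its gradient can be accumulated one sample at a time, so no target-by-parameters Jacobian is ever materialized.

    import org.apache.commons.math3.analysis.MultivariateFunction;
    import org.apache.commons.math3.analysis.MultivariateVectorFunction;
    import org.apache.commons.math3.optim.InitialGuess;
    import org.apache.commons.math3.optim.MaxEval;
    import org.apache.commons.math3.optim.PointValuePair;
    import org.apache.commons.math3.optim.SimpleValueChecker;
    import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;
    import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunction;
    import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunctionGradient;
    import org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer;

    public class CgRegressionSketch {

        // Hypothetical helpers standing in for the RBF model:
        //   residual(p, i)     = model(x_i; p) - target_i
        //   residualGrad(p, i) = gradient of residual(p, i) w.r.t. p
        interface Model {
            double residual(double[] p, int i);
            double[] residualGrad(double[] p, int i);
        }

        static PointValuePair fit(final Model model, final int nSamples, double[] start) {
            // Scalar objective: sum of squared residuals over all samples.
            MultivariateFunction sse = new MultivariateFunction() {
                public double value(double[] p) {
                    double sum = 0;
                    for (int i = 0; i < nSamples; i++) {
                        double r = model.residual(p, i);
                        sum += r * r;
                    }
                    return sum;
                }
            };
            // Gradient accumulated sample by sample: only one sample's
            // gradient is alive at a time, never a full Jacobian.
            MultivariateVectorFunction grad = new MultivariateVectorFunction() {
                public double[] value(double[] p) {
                    double[] g = new double[p.length];
                    for (int i = 0; i < nSamples; i++) {
                        double r = model.residual(p, i);
                        double[] gi = model.residualGrad(p, i);
                        for (int k = 0; k < p.length; k++) {
                            g[k] += 2 * r * gi[k];
                        }
                    }
                    return g;
                }
            };
            NonLinearConjugateGradientOptimizer optimizer =
                new NonLinearConjugateGradientOptimizer(
                    NonLinearConjugateGradientOptimizer.Formula.POLAK_RIBIERE,
                    new SimpleValueChecker(1e-8, 1e-8));
            return optimizer.optimize(new MaxEval(100000),
                                      new ObjectiveFunction(sse),
                                      new ObjectiveFunctionGradient(grad),
                                      GoalType.MINIMIZE,
                                      new InitialGuess(start));
        }
    }

The trade-off is that CG typically needs many more iterations than LM on least-squares problems, but each iteration's memory footprint stays O(number of parameters) instead of O(samples * parameters).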
