Re: [Jprogramming] lbfgs minimization routine

Robert O'Boyle Tue, 13 Apr 2010 08:42:49 -0700

Hi Piet

Glad to see that you got the lbfgs to work. I had sent an email to the J
Forum a while back but I see that it might not have gotten through. It is
provided below. Like you, I used D: to estimate the partial derivatives. As
I state, I found the function quite slow and the NLLS much faster. I haven't
tried lbfgs on bigger problems but my suspicion is that the calls slow done
the function considerably.


Bob

----------------------
Quite a while back, I got this working with the help of Philip Viton. A
fully worked example is below. For comparison, I did the optimization in
Excel and using the NLLS Levenberg-Marquardt routine that I have developed
(that's in the scripts on the J website). I found the LBFGS very slow
compared to the NLLS. It would be great to have a fast maximum likelihood
optimizer that is fully with J.

Hope that these help.


Bob

load '~addons/math/lbfgs/lbfgs.ijs'
NB.===========================================================
NB.
NB. Von Bertalanfy fit using LBFGS optimizatin
NB. outputs stats matrix & parm vector
NB. solution with EXCEL's SOLVER is linf = 55.9778, k = 0.385595, t0 =
0.171371
NB.
NB.========================================================================
NB. INPUTS
years  =: 1.0 2.0 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3
length  =: 15.4 26.93 42.23 44.59 47.63 49.67 50.87 52.3 54.77 56.43 55.88
initial  =: 30.0 0.20 0.01
NB.=======================================================================
VON_B =: 3 : 0
'a b c' =. y
l_pred =. a * (1 -^ - (b * years - c))
rss  =. e +/ .* e =. length - l_pred
n2  =: ($years)%2
NB. sd  =: %: var =: rss%$e
NB. (($years)%2)*((^.(o.2))+(2*^.sd) +1)
NB. (($years)%2)*((^.(o.2))+(^.var) +1)
NB. (n2*(^.rss))-(n2*^.$years)-(n2*(^.o.2))+n2
NB. (n2*(^.var))+(n2*(^.o.2))+n2
n2*(^.rss)
)
NB.========================================================================
DERIVLOGL =: 3 : 0
epsd   =.  1e_5
f0    =. OBJ_FN y
grad   =. 0$0
for_i. i.#y do.
  yy   =. (epsd+i{y) i } y
  f1   =. OBJ_FN yy
  yyy  =.((-epsd)+i{y) i } y
  f2   =. OBJ_FN yyy
  grad   =. grad,(f1-f2)%+:epsd
end.
)
LBFGS  =: 4 : 0
OBJ_FN  =: (<x)`:6
n   =. $parm =. y       NB. length n; number of parameters
m   =. 5         NB. length 1; number of corrections used in BFGS update.
Should be between 3 and 7
diagco  =.  2-2         NB. length 1; Logical variable; set at 0; if 1.0,
need to provide diagonal matrix
diag  =. n$2.2-2.2       NB. length n$0; provide double array if diagco 1
iprint  =.  2 2-2 2        NB. length 2; only 0 0 implemented 
eps   =.  1.0e_5        NB. Accuracy of solution: grad < eps (max of 1,
fun_parm)
machine  =.  1.0e_16        NB. estimate of machine presision
w   =.  (n*(2*m+1)+2*m) $ 2.2-2.2   NB. workspace size for LBFGS
iflag  =.  2-2         NB. initially set to 0; codes which indicate
performance of routine
niter  =. 0
while. niter <: 1000 do.
fun_parm  =.  OBJ_FN parm       NB. Value of objective function at parm
grad =. ((0.00001*parm) + (0.00000001*parm = 0)) OBJ_FN D: 1 parm  NB.
length n; gradient of objective function at parm
NB. grad  =.  DERIVLOGL parm 
'n m parm fun_parm grad diagco diag iprint eps machine w iflag'=.r=.'lbfgs_'
call (n;m;parm;fun_parm;grad;diagco;diag;iprint;eps;machine;w;iflag)
 if. iflag <: 0 do. break. end.
 output  =: niter, ; 3 2 4{r NB. iter, value, point, grad
niter  =. >: niter
end.
)
'VON_B' LBFGS initial
--------------------------------------------




-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Piet de Jong
Sent: April 11, 2010 11:08 PM
To: Programming forum
Subject: Re: [Jprogramming] lbfgs minimization routine

I worked more on this and the code below seems to do things
Instead of my own numerical derivatives I let J itself provide
the derivatives (this seems to work well on situations I have tried),
Anyway below "minl" is an adverb which when applied to
function F and given a starting value for the function, X, returns
the the minimum, minimizer and partial derivatives at the minimum, as
well as the number of iterations.

minl=: 1 : 0   NB.  lbfgs minimization
  f=.u
  N=.#X=:2.2+(,y)-2.2  NB. ensures X is real
  M=.5                 NB. number of corrections
  'DIAGCO IFLAG'=.2-2  NB. logical set to 0=false
  DIAG=.N$2.2-2.2      NB. real set to 0
  IPRINT=.2 2-2 2      NB. integer: only 0 0 permitted
  'EPS XTOL'=.1.0e_5 1.0e_16    NB. real epsilon and tolerance
  W=.((N*(>:+:M))++:M)$2.2-2.2  NB. real work vector
  iter=.0
  whilst. (IFLAG>0)*.(iter<:2000) do. iter=.>:iter
    'F G'=. (f;f D. 1) X   NB. function and derivative value
    'N M X F G DIAGCO DIAG IPRINT EPS XTOL W IFLAG'=.'lbfgs_' call
(N;M;X;F;G;DIAGCO;DIAG;IPRINT;EPS;XTOL;W;IFLAG)
  end.
  F;X;G;iter
)

The "call" is as is in the lbfgs addon package.



On Thu, Apr 8, 2010 at 6:17 PM, Piet de Jong <[email protected]> wrote:
> Thanks for looking at this.
> What confused me was the "larger" dimension on X etc
> For example the code working off
> NDIM=. 2000
>
> Anyway after a lot of fiddling I seem to have it working at least with
> simple examples,
> including with the use of  numerical rather than exact derivatives.
> Here is my "test" case
> using N=4 with the sum of squares function centered on an (arbitrary)
origin.
>
> NB.  Df is exact derivative, while Dfh is numerical derivative using h
>
> mtr=: 3 : 0   NB.  lbfgs test
>  'N M'=.4 5           NB. dimension of X, number of corrections
>  f=.+/@:*:@:-&3 _2002 800 1 NB. example sum of squares  function
>
>  Df=.f D. 1           NB. exact derivative
>  h=.1.0e_8            NB. h for numerical derivative
>  H=.(h&*)@=...@i.@#      NB. diag h
>  Dfh=.-:@(%&h)@:((+(-&f)-)H) NB. numerical derivative
>
>  'DIAGCO IFLAG'=.2-2  NB. logical set to 0=false
>  DIAG=.N$2.2-2.2      NB. real set to 0
>  IPRINT=.2 2-2 2      NB. integer: only 0 0 permitted
>  'EPS XTOL'=.1.0e_5 1.0e_16    NB. real epsilon and tolerance
>  W=.((N*(>:+:M))++:M)$2.2-2.2  NB. real work vector
>
>  X=.N$400.2-2.2      NB. initialize X
>  iter=.0             NB. initialize iter
>  whilst. (IFLAG>0)*.(iter<:2000) do. iter=.>:iter
>    'F G'=. (f;Dfh) X   NB. funtion and derivative value -- use Dfh
> for numerical derivative
>    'N M X F G DIAGCO DIAG IPRINT EPS XTOL W IFLAG'=.'lbfgs_' call
> (N;M;X;F;G;DIAGCO;DIAG;IPRINT;EPS;XTOL;W;IFLAG)
>  end.
>  iter;(f;])X
> )
>
>
>
>
>
> On Thu, Apr 8, 2010 at 7:08 AM, Raul Miller <[email protected]> wrote:
>> On Wed, Apr 7, 2010 at 4:38 PM, Piet de Jong <[email protected]> wrote:
>>> Thanks for the response.   I downloaded the routine using the package
manager
>>> and J6.02.
>>>
>>> What I don't get is that the in the example lbfgs_test the routine is
>>> called with parameter
>>> values that do not correspond to what is stated in the provided
>>> FORTRAN description of the calling sequence.
>>
>> So... looking at math/lbfgs, I see:
>> path=: jpath '~addons\math\lbfgs\'
>> dll=:'"',path,'lbfgs.dll" '
>> ...
>> call=: 4 : 0
>> xx=. dll,x,' + i ',(+:#y)$' *'
>> r=. xx cd LASTIN=: , each y
>> ...
>>
>> and, I see:
>>
>> 'N M X F G DIAGCO DIAG IPRINT EPS XTOL W IFLAG'=.r=.'lbfgs_' call
>> (N;M;X;F;G;DIAGCO;DIAG;IPRINT;EPS;XTOL;W;IFLAG)
>>
>> So that means that the calling signature will be
>>
>>  "...lbfgs.dll" lbfgs_ + i * * * * * * * * * * * *
>>
>> According to http://www.jsoftware.com/help/user/call_procedure.htm
>> this means that under windows this routine uses the __cdecl calling
>> convention, returns a 4 byte integer result and takes 12 pointers as
>> arguments.
>>
>> I modified lbfgs_test to capture the data being passed to "call" and
>> when I interrupted the program the values were all one dimension
>> arrays, with dimensions:
>> N: 1
>> M: 1
>> X: 2000
>> F: 1
>> G: 2000
>> DIAGCO: 1
>> DIAG: 2000
>> IPRINT: 2
>> EPS: 1
>> XTOL: 1
>> W: 60000
>> IFLAG: 1
>>
>> Now... I do not know how lbfgs works, and I do not know
>> if there is some windows magic that happens to pass the
>> size of these arrays in a __cdecl call, or if the pointers to
>> these values are treated as pointers regions of memory where
>> extra values off the end would be ignored.
>>
>> But it very much looks like this code is directly calling
>> lbfgs.dll and not doing much of any thing special to
>> the data on its way out.
>>
>>> In particular and as an example, the dimension on X does not appear to
>>> be N but something much bigger.   Hence the example is hard, if not
>>> impossible to follow.   It might be useful to have a much simpler
>>> example.   I still can't get a simple example to work since I don't
>>> appear to understand the calling parameters.
>>
>> My current best guess is that (if N is supposed to be the
>> number of elements provided by X), values after the first 100 (the
>> value for N) are ignored by lbfgs.dll.
>>
>> Do you see anything which would contradict this point of view?
>>
>> Thanks,
>>
>> --
>> Raul
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
>
>
> --
> Piet de Jong
> --------------------------------------------------
> View my current research at
> http://ssrn.com/author=619154
> --------------------------------------------------
>



-- 
Piet de Jong
--------------------------------------------------
View my current research at
http://ssrn.com/author=619154
--------------------------------------------------
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] lbfgs minimization routine

Reply via email to