Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread ProfJCNash
Agreed on the default algorithm issue. That is important for users to
know, and I'm happy to underline it. Also that CG (which is based on one
of my codes) should be deprecated. BFGS (also based on one of my codes
from long ago) does much better than I would ever have expected.

Over the years I've tried different Nelder-Mead implementations. I
cannot say I've found any that is always better than the one in optim()
(also based on an old code of mine), though nmkb() from the dfoptim
package seems to do better much of the time. It has a transformation
method for bounds, which may be useful, but it has the awkwardness that
one cannot start on a bound. For testing a function, I don't think it
makes much difference which variant of NM one uses, provided the trace
is on to catch never-ending runs. For production use, it is a really
good idea to try different methods on a sample of likely cases and
choose one that does well. That is the motivation for the optimx
package, or the opm() function of the newer optimz (on R-forge) that
I'm still polishing. optimz has a function optimr() with the same call
as optim() but incorporating over a dozen optimizers via method =
"(selected method)".

As a gradient-free choice, the Powell codes from minqa or other packages
(there are several implementations) can sometimes have spectacular
performance, but in my experience they also flub rather more regularly
than Nelder-Mead. That is, when they are good they are very, very good,
and when they are not they are horrid. (Plagiarism!)
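
For anyone who wants to try them, a minimal sketch of calling bobyqa()
from minqa follows; the objective and start are made up for
illustration:

library(minqa)
fn <- function(x) sum((x - c(1, 2))^2)
ans <- bobyqa(c(0, 0), fn)
ans$par   # parameters found
ans$fval  # objective value at the solution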

JN

On 15-11-15 12:46 PM, Ravi Varadhan wrote:
> Hi John,
> My main point is not about Nelder-Mead per se.  It is *primarily* about the 
> Nelder-Mead implementation in optim().  
> 
> The users of optim() should be cautioned regarding the default algorithm and 
> that they should consider alternatives such as "BFGS" in optim(), or other 
> implementations of Nelder-Mead.
> 
> Best regards,
> Ravi

Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread Mark Leeds
And just to add to John's comments, since he's too modest: in my
experience, the algorithm in the Rvmmin package (written by John) shows
great improvement over the L-BFGS-B algorithm, so I don't use L-BFGS-B
anymore. L-BFGS-B often has a dangerous convergence issue in that it
can claim to converge when it hasn't, which to me is worse than not
converging. Most likely it has to do with the link below, which was
pointed out to me by John a while back.

http://www.ece.northwestern.edu/~morales/PSfiles/acm-remark.pdf
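
A minimal sketch of such a comparison, using the standard Rosenbrock
function as an assumed test problem (not from the thread):

library(Rvmmin)
rosen <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
rosen.g <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                         200 * (x[2] - x[1]^2))
x0 <- c(-1.2, 1)
a1 <- Rvmmin(x0, rosen, rosen.g)
a2 <- optim(x0, rosen, rosen.g, method = "L-BFGS-B")
## Compare the answers AND the reported convergence codes:
a1$convergence; a2$convergence
a1$par; a2$par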



Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread ProfJCNash
Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is
"bad" per se. Its issues are that it assumes the parameters are all on
the same scale, and that the termination (not convergence) test can't
use gradients, so it tends to get "near" the optimum very quickly --
with, say, only 10% of the computational effort -- then spends an awful
lot of effort deciding it's got there. It often does poorly when the
function has nearly "flat" zones, e.g., a long valley with very low slope.

So my message is still that Nelder-Mead is an unfortunate default -- it
was chosen, I believe, because it is generally robust and doesn't need
gradients. BFGS really should use accurate gradients, preferably
computed analytically, so it would only be a good default in that case
or with very good approximate gradients (which are computationally
costly).

However, if you understand what NM is doing, and use it accordingly, it
is a valuable tool. I generally use it as a first try BUT turn on the
trace to watch what it is doing as a way to learn a bit about the
function I am minimizing. Rarely would I use it as a production minimizer.
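
A minimal sketch of that first-try workflow, with an assumed badly
scaled objective; trace and parscale are standard optim() controls:

## Parameters on very different scales; minimum at (1, 100).
fn <- function(x) (x[1] - 1)^2 + ((x[2] - 100)/100)^2
ans <- optim(c(0, 0), fn, method = "Nelder-Mead",
             control = list(trace = 1, parscale = c(1, 100)))
## trace = 1 prints progress, so never-ending runs are easy to spot;
## parscale gives optim the rough magnitude of each parameter.
ans$par
ans$convergence  # 0 means the termination test was satisfied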

Best, JN



[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread Ravi Varadhan
Hi,



While I agree with the comments about paying attention to parameter scaling, a 
major issue here is that the default optimization algorithm, Nelder-Mead, is 
not very good.  It is unfortunate that the optim implementation chose this as 
the "default" algorithm.  I have several instances where people have come to me 
with poor results from using optim(), because they did not realize that the 
default algorithm is bad.  We (John Nash and I) have pointed this out before, 
but the R core has not addressed this issue due to backward compatibility 
reasons.



There is a better implementation of Nelder-Mead in the "dfoptim" package.



require(dfoptim)
## nmk() is dfoptim's Nelder-Mead variant; min.perc_error, par_ini1 to
## par_ini3, and data come from the original poster's problem.
mm_def1 <- nmk(par = par_ini1, min.perc_error, data = data)
mm_def2 <- nmk(par = par_ini2, min.perc_error, data = data)
mm_def3 <- nmk(par = par_ini3, min.perc_error, data = data)
print(mm_def1$par)
print(mm_def2$par)
print(mm_def3$par)



In general, better implementations of optimization algorithms are available in
packages such as "optimx" and "nloptr".  It is unfortunate that most naïve
users of optimization in R do not recognize this.  Perhaps there should be a
"message" in the optim help file that points this out to the users.



Hope this is helpful,

Ravi




Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread John C Frain
In econometrics it was common to start an optimization with Nelder-Mead
and then switch to one of the other algorithms to finish. As John Nash
states, NM gets one close; switching then speeds the final solution.
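
A minimal sketch of that two-stage strategy, using the Rosenbrock
function as an assumed test problem:

fn <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
## Stage 1: Nelder-Mead gets near the optimum cheaply.
first <- optim(c(-1.2, 1), fn, method = "Nelder-Mead")
## Stage 2: restart a gradient-based method from Nelder-Mead's answer.
second <- optim(first$par, fn, method = "BFGS")
second$par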

John

John C Frain
3 Aranleigh Park
Rathfarnham
Dublin 14
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com


Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread lorenzo.ise...@gmail.com

Thanks a lot, Ravi.
Indeed you best understood the point of my email.
I am perfectly aware that most of the optimization algorithms find
local rather than global minima and therefore the choice of the
initial parameters plays (at least in principle) a role.
Nevertheless, my optimization problem is rather trivial and I did not
bother to look for anything beyond the most basic tool in R for
optimization.
What surprised me is that an algorithm different from the default one
in optim() is extremely robust to a partly deliberate bad choice
of the initial parameters, whereas the standard one is not.
You perfectly answered my question.
Regards

Lorenzo





Re: [R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

2015-11-15 Thread Ravi Varadhan
Hi John,
My main point is not about Nelder-Mead per se.  It is *primarily* about the 
Nelder-Mead implementation in optim().  

The users of optim() should be cautioned regarding the default algorithm and 
that they should consider alternatives such as "BFGS" in optim(), or other 
implementations of Nelder-Mead.

Best regards,
Ravi


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.