Re: [R-sig-phylo] Difference in alpha value estimates between geiger and OUCH - data

Luke Harmon Tue, 20 Apr 2010 14:01:57 -0700

Hi everyone, this is a good discussion. If you look at the "guts" of the geiger 
function, I did something pretty close to what Brian described: the code tries 
both evenly spaced and random starting points. I've run a few datasets through 
(maybe ~200 total) and found that the defaults "mostly" work, though sometimes 
you have to adjust the bounds. The likelihood surface for this problem 
sometimes has a ridge where alpha and the variance are both high but (as Carl 
notes) their ratio stays constant and the lnL is nearly constant.
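To make the ridge concrete, here is a minimal Python sketch (not geiger or OUCH code) for the star-phylogeny case Carl describes. It assumes every lineage starts at theta at the root and evolves independently for a time `height`, so each tip is normally distributed with variance sigma^2 / (2 alpha) * (1 - exp(-2 alpha * height)). The function name, the trait values, and reading "variance" as sigma^2 are all assumptions made for illustration.

```python
import math

def star_ou_loglik(tips, theta, alpha, sigma2, height=1.0):
    """Log-likelihood of tip values on a star tree under an OU process.

    Assumption: each lineage starts at theta and runs for `height` time
    units, so tips are independent Normal(theta, var) with
    var = sigma2 / (2 * alpha) * (1 - exp(-2 * alpha * height)).
    """
    var = sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * height))
    n = len(tips)
    ss = sum((x - theta) ** 2 for x in tips)
    return -0.5 * n * math.log(2.0 * math.pi * var) - ss / (2.0 * var)

tips = [0.8, 1.3, 0.9, 1.6, 1.1, 0.7]  # made-up trait values
theta = sum(tips) / len(tips)
ratio = 0.1                            # fixed sigma2 / (2 * alpha)

# Walk along the ridge: alpha and sigma2 grow together, ratio stays fixed.
ridge = []
for alpha in (5.0, 50.0, 500.0):
    sigma2 = 2.0 * alpha * ratio
    ridge.append(star_ou_loglik(tips, theta, alpha, sigma2))
    print(f"alpha={alpha:7.1f}  sigma2={sigma2:7.1f}  lnL={ridge[-1]:.6f}")

# Step off the ridge: same alpha, ten times the sigma2.
off_ridge = star_ou_loglik(tips, theta, 50.0, 2.0 * 50.0 * 1.0)
print(f"off-ridge lnL={off_ridge:.6f}")
```

Along the ridge the printed lnL values agree to several decimal places even though alpha changes by two orders of magnitude, while changing the ratio moves the lnL noticeably. An optimizer has no way to choose among the ridge points.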


Practical advice for the geiger version at least: try to run the function, and 
if you get either an alpha or a sigma squared close to the upper bounds, try to 
run it again with different bounds, starting point, or both. If the likelihood 
is the same and you always get high values of alpha, then you have a ridge in 
your likelihood surface. 
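The checks above can be sketched as a small helper that takes the results of several runs and applies both tests. This is a hypothetical Python illustration, not part of geiger or OUCH: the dictionary keys, the tolerances, and the example numbers are all made up, and it assumes positive alpha and sigma squared estimates.

```python
def diagnose_fits(fits, rel_tol=0.05, lnl_tol=0.01):
    """Flag boundary estimates and a possible likelihood ridge.

    `fits` is a list of dicts, one per run, with hypothetical keys:
    'alpha', 'sigma2', 'lnL', 'alpha_upper', 'sigma2_upper'.
    """
    # An estimate within rel_tol of its upper bound counts as "at the bound".
    at_bound = [
        f["alpha"] >= (1.0 - rel_tol) * f["alpha_upper"]
        or f["sigma2"] >= (1.0 - rel_tol) * f["sigma2_upper"]
        for f in fits
    ]
    lnls = [f["lnL"] for f in fits]
    alphas = [f["alpha"] for f in fits]
    ratios = [f["sigma2"] / (2.0 * f["alpha"]) for f in fits]

    same_lnl = max(lnls) - min(lnls) <= lnl_tol
    alpha_varies = max(alphas) > (1.0 + rel_tol) * min(alphas)
    same_ratio = max(ratios) <= (1.0 + rel_tol) * min(ratios)

    return {
        "any_at_upper_bound": any(at_bound),
        # Ridge signature: same lnL, different alpha, same sigma2/alpha ratio.
        "likely_ridge": same_lnl and alpha_varies and same_ratio,
    }

# Made-up results from three runs with different upper bounds.
runs = [
    {"alpha": 48.0, "sigma2": 9.6, "lnL": -1.474,
     "alpha_upper": 50.0, "sigma2_upper": 100.0},
    {"alpha": 99.8, "sigma2": 19.9, "lnL": -1.473,
     "alpha_upper": 100.0, "sigma2_upper": 100.0},
    {"alpha": 499.0, "sigma2": 99.9, "lnL": -1.473,
     "alpha_upper": 500.0, "sigma2_upper": 1000.0},
]
report = diagnose_fits(runs)
print(report)
```

If `likely_ridge` comes back true, widening the bounds further will not help; only the ratio of the two parameters is being estimated.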

Hope this is helpful - Luke

On Apr 20, 2010, at 11:57 AM, Brian O'Meara wrote:

> At times when I've seen this sort of issue (not in this particular case, but 
> optimization in general) I've tried many starting points and plotted lines 
> connecting each starting point with each ending point (plotting only two 
> parameters at a time). Ideally, wherever you start in parameter space you get 
> to the same end point, and this allows you to visually inspect for islands 
> sensitive to starting position (if all starts converge to a few end points) 
> or issues of flat surfaces (for example, if you can't estimate alpha and 
> sigma^2 separately, all end points might fall on a line with a fixed 
> alpha/sigma^2 ratio). You could use color or height to look at likelihoods of 
> each solution.
> 
> You might try embedding the hansen function in a loop (or vectorization) 
> starting with random starting points (or points on a grid) and take the 
> replicate with the best likelihood. This won't help you if the issue is 
> non-identifiability of the parameters (the star case Carl described), but it 
> may help if the issue is a problem in the search not finding the true optimum 
> (unless you get into numerical precision issues....).
> 
> Best,
> Brian
> 
> On Apr 20, 2010, at 2:10 PM, Carl Boettiger wrote:
> 
>> Hi Alejandro,
>> 
>> I think you've raised an excellent question and I'd also be interested in
>> the thoughts of others on this.  I believe the short answer is that you
>> can't; your data may lack the information to estimate alpha and sigma^2
>> independently and you should therefore only attempt to estimate their
>> ratio.  For an extreme example, imagine a star phylogeny.  On such a tree the
>> OU model is described entirely by two parameters, mean (theta) and variance
>> (sigma^2 / (2 alpha)), so there's no way to estimate all three parameters.
>> 
>> Geiger and OUCH take different approaches to optimization.  Though both use
>> R's optim() function, Geiger uses the L-BFGS-B method, which takes
>> bounds on parameters.  OUCH uses Nelder-Mead by default, which takes
>> initial conditions rather than bounds.  OUCH can use any of the methods
>> provided by optim(), including L-BFGS-B, so you could specify: out <-
>> hansen(data, tree, regimes, sqrt.alpha, sigma, method="L-BFGS-B").  It wasn't
>> obvious to me how OUCH generates the bounds in that case.  Geiger tries a
>> couple of different bounds, some in the near-Brownian limit and some in the
>> strong-selection limit.  I think the take-home message is to be suspicious of
>> any result that seems sensitive to the numerical method / starting
>> conditions.  Sorry if this didn't make sense; I'm sure the package
>> developers and others can shed more light on this.
>> 
>> Cheers,
>> Carl
>> 
>> 
>> 
>> On Tue, Apr 20, 2010 at 10:40 AM, Alejandro Gonzalez <
>> [email protected]> wrote:
>> 
>>> Hello,
>>> 
>>> Sorry for not sending in any data to replicate the problem I posted in the
>>> list. I now include as attachment a small database and phylogenetic tree in
>>> Nexus format.
>>> Carl Boettiger correctly pointed out that the cases in which I found a
>>> large difference between alpha estimates probably also had very large
>>> sigma^2 estimates. That is what was going on: compared with the sigma^2 of
>>> the cases where the difference between alpha estimates was low, the
>>> "problematic" cases did present high sigma^2 values. As he also suggested,
>>> I've modified the starting values of sqrt.alpha and sigma in the hansen
>>> function. Low starting values do result in estimates that are closer to
>>> those obtained with geiger. Is there any rule of thumb regarding how the
>>> starting values should be modified? How do these modifications affect the
>>> estimates of alpha and sigma^2?
>>> 
>>> Thank you again for the assistance.
>>> 
>>> Best wishes
>>> 
>>> Alejandro
>>> 
>>> __________________________________
>>> Alejandro Gonzalez Voyer
>>> Post-doc
>>> 
>>> NEW ADDRESS
>>> Estación Biológica de Doñana
>>> Consejo Superior de Investigaciones Científicas (CSIC)
>>> Av Américo Vespucio s/n
>>> 41092 Sevilla
>>> Spain
>>> 
>>> Tel: + 34 - 954 466700, ext 1749
>>> E-mail: [email protected]
>>> 
>>> _______________________________________________
>>> R-sig-phylo mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> 
>>> 
>> 
>> 
>> -- 
>> Carl Boettiger
>> Population Biology, UC Davis
>> http://two.ucdavis.edu/~cboettig
>> 
> 
> ------------------------------------------------------
> Brian O'Meara
> http://www.brianomeara.info
> Assistant Prof.
> Dept. Ecology & Evolutionary Biology
> U. of Tennessee, Knoxville
> 

Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
[email protected]
