Just a quick note re: fitContinuous and large sample sizes:

I think trying to speed up these calculations for big data sets is a great 
idea. There are two reasons why fitContinuous is slow - and it's very slow! 

1. fitContinuous uses loops and other non-optimal algorithms. My code is not 
optimized for r. Also it's pure r - would be faster if some was in c.
2. Fitting these models is hard - harder, in general, than what r can handle 
without help. So, for example, when estimating lambda or fitting OU, I've found 
that r's optimization functions - either optim or nlm - are insufficient. 
fitContinuous is a bit paranoid, so it tries optimization with 20 different 
starting points. Even then the algorithm still doesn't always work, and 
sometimes one must mess with the boundaries. So, when someone makes a faster 
way to do these calculations - using the code below, perhaps! - I would make 
sure that the optimization is solid. I'm happy to help in any way I can - I 
know my code is completely uncommented so I can help anyone figure out what 
different bits are doing.

Thanks,
lh
On Nov 15, 2010, at 7:45 AM, Peter E. Midford wrote:

> Enrico,
>             This looks very interesting - mind if I play with it too?
> 
> Peter
> 
> 
> On Nov 15, 2010, at 10:36, Enrico Rezende wrote:
> 
>> Manuela and David,
>> perhaps you want to try the OU-transformation syntax I wrote to translate 
>> the MATLAB code developed by Blomberg, Ives & Garland. It works well with 
>> small phylogenies, I never tried it in large ones and am actually curious to 
>> see what would be the outcome.
>> 
>> You will find the details below.
>> Hope this helps.
>> Cheers,
>> Enrico
>> 
>> 
>> 
>> 
>> **************************************************************************************************************************
>> # This function estimates the parameter d for OU-transformation (Blomberg et 
>> al. 2003).
>> # Results are identical to those obtained from the Matlab module 
>> RegressionV2  (Lavin et al. 2008) when no independent variables are included 
>> in the model.
>> # To obtain the optimal branch lengths after OU-transformation, see 
>> phylo.OU.matrix.
>> # Plot command shows the REML curve as a funtion of d (default plot=T).
>> # IMPORTANT! In some instances d does not converge because some internal 
>> branches may collapse before the optimal transformation is found (this is 
>> very common with trees obtained with the rcoal command). When this occurs, 
>> other initial branch lengths should be employed. Note that the crashes 
>> coincide with those observed in MATLAB, so it is not a problem with the code.
>> 
>> library(ape)
>> 
>> 'phylo.OU' <-
>> function(trait,phy,plot=T)
>> {
>> V1 <- vcv.phylo(phy)
>> nspp1 <- dim(V1)[1]
>> V1 <- V1/det(V1)^(1/nspp1)
>> initV1 <- V1
>> U <- cbind(rep(1, nspp1))
>> tau1 <- matrix(diag(initV1), nspp1, nspp1) + t(matrix(diag(initV1),nspp1, 
>> nspp1)) - 2 * initV1
>>       pgls.ML <- function(d1) {                        V1 <- (d1^tau1) * (1 
>> - d1^(2 * initV1))/(1 - d1^2)
>>                  V1 <- V1/det(V1)^(1/nspp1)
>>                  invV <- solve(V1,diag(nspp1))
>>                  a <- solve((t(U) %*% invV %*% U), ((t(U) %*% invV %*% 
>> trait)))
>>                  E <- (trait - U %*% a)
>>                  s2 <- (t(E) %*% invV %*% E)/(nspp1 - 1)
>>                  ML<- (-0.5*nspp1*log(2*pi)) - 
>> (0.5*(nspp1*log((nspp1-1)*s2/nspp1) + log(det(V1)) + nspp1))
>>                  -ML
>>               }
>> est <- optimize(pgls.ML,lower=0,upper=3)                           d.ML <- 
>> est$minimum
>> ML <- -(est$objective)
>> 
>>          pgls.REML <- function(d1) {                        V1 <- (d1^tau1) 
>> * (1 - d1^(2 * initV1))/(1 - d1^2)
>>                  V1 <- V1/det(V1)^(1/nspp1)
>>                  invV <- solve(V1,diag(nspp1))
>>                  a <- solve((t(U) %*% invV %*% U), ((t(U) %*% invV %*% 
>> trait)))
>>                  E <- (trait - U %*% a)
>>                     s2 <- (t(E) %*% invV %*% E)/(nspp1 - 1)
>>                  REML <- (-0.5*(nspp1-1)*log(2*pi)) + 
>> (0.5*log(det(t(U)%*%U))) - ((0.5*((nspp1 - 1)*log(s2) + 
>> log(det(V1))+log(det((t(U) %*% invV %*% U)))+(nspp1 - 1))))
>>                  -REML
>>               }
>> est1 <- optimize(pgls.REML,lower=0,upper=3)
>> d.REML <- est1$minimum
>> REML <- -(est1$objective)
>> 
>> if (plot){
>> ## Plotting all the data for REML
>> XX <- matrix(1:200/100+0.001,200,2)
>> for (i in 1:200){
>> d1 <- XX[i,1]
>>   V1 <- V1/det(V1)^(1/nspp1)
>>  V1 <- (d1^tau1) * (1 - d1^(2 * initV1))/(1 - d1^2)
>>       invV <- solve(V1,diag(nspp1))
>>   a <- solve((t(U) %*% invV %*% U), ((t(U) %*% invV %*% trait)))
>>   E <- (trait - U %*% a)
>>  s2 <- (t(E) %*% invV %*% E)/(nspp1 - 1)
>> REML.plot <- (-0.5*(nspp1-1)*log(2*pi)) + (0.5*log(det(t(U)%*%U))) - 
>> ((0.5*((nspp1 - 1)*log(s2) + log(det(V1))+log(det((t(U) %*% invV %*% 
>> U)))+(nspp1 - 1))))
>> XX[i,2]<- REML.plot}
>> plot(XX[,1],XX[,2],xlab='OU-transformation parameter (d)',ylab='REML')}
>> else{
>>          }
>> vcv <- (d.REML^tau1) * (1 - d.REML^(2 * initV1))/(1 - d.REML^2)
>> data.frame(d.REML,REML, d.ML,ML)
>> }
>> 
>> **************************************************************************************************************************
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> David Bapst escribió:
>>> Hi Manuela,
>>> I just ran the fitContinuous last night (lambda model) on a dataset of
>>> 1766 graptoloid species and their geologic durations. It took it more
>>> than 12 hours to resolve, but it did come up with an answer, finally.
>>> So, maybe your analysis was not permanently stuck.
>>> 
>>> However, I have seen fitContinuous get permanently 'stuck' on much
>>> smaller trees (103 graptolites), particularly when fitting the OU
>>> model. I once let it sit for two weeks but nothing happened. (I would
>>> be interested in knowing if there is a way to avoid this, as I run my
>>> analyses over large sets of trees...)
>>> -Dave Bapst, UChicago
>>> 
>>> On Mon, Nov 15, 2010 at 3:40 AM, Manuela Gonzalez
>>> <[email protected]> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I am trying to estimate the phylogenetic "inertia" in a variable
>>>> reflecting counts for 3975 mammalian species (with an ultrametric and
>>>> fully resolved phylogeny).
>>>> I can successfully obtain an estimate of K (as in Blomberg et al 2003)
>>>> using the command phylosignal (picante package), but I am having trouble
>>>> getting an estimate of lambda with fitContinuous from Geiger package. I
>>>> am using this code.
>>>> /
>>>> lambdaFit=fitContinuous(prunedtree, m1$total_bino, model="lambda") /
>>>> 
>>>> The command seems to run fine (no error messages) but it just keeps
>>>> running and I am not sure if it is really working. I just stopped R
>>>> after more than 60 hours (still running but no results). Is it possible
>>>> that running this command for nearly 4000 species is just too much? or
>>>> should I be (more) patience?
>>>> 
>>>> Any help will be greatly appreciated.
>>>> Best regards,
>>>> Manuela
>>>> 
>>>> --
>>>> "I have never let my schooling interfere with my education"
>>>> Mark Twain
>>>> 
>>>> Manuela Gonzalez Suarez, Ph.D.
>>>> Department of Conservation Biology
>>>> Estación Biológica de Doñana CSIC
>>>> Calle Américo Vespucio s/n,
>>>> 41092 Sevilla, Spain
>>>> http://www.ebd.csic.es/carnivoros/personal/gonzalez/
>>>> 
>>>> 
>>>>      [[alternative HTML version deleted]]
>>>> 
>>>> 
>>>> _______________________________________________
>>>> R-sig-phylo mailing list
>>>> [email protected]
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>> 
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> R-sig-phylo mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> 
>>> 
>> 
>> 
>> -- 
>> *******************************************************************
>> Enrico L. Rezende
>> 
>> Departament de Genètica i de Microbiologia
>> Facultat de Biociències, Edifici Cn
>> Universitat Autònoma de Barcelona
>> 08193 Bellaterra (Barcelona)
>> SPAIN
>> 
>> Telephone: +34 93 581 4705
>> Fax: +34 93 581 2387
>> E-mail:    [email protected]
>> 
>> _______________________________________________
>> R-sig-phylo mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> 
> _______________________________________________
> R-sig-phylo mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
[email protected]

_______________________________________________
R-sig-phylo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Reply via email to