Re: [R-sig-phylo] Model Selection and PGLS

Simone Blomberg Fri, 02 Jul 2021 17:53:16 -0700

I knew there would be a mistake somewhere. Here is the correction:

## C <- 1+(lambda-1)*(1-mat) ## Lambda correction WRONG!!
C <- (lambda*mat)+(1-lambda)*diag(diag(mat)) ## correct, I think..


I tested it on a toy example of a tree with 5 species and it works fine..

I hope there are no other mistakes! See other comments below.

Cheers,

Simone.

On 3/7/21 8:40 am, Russell Engelman wrote:
> Dear All,
>
> Okay, so I tried the code that Dr. Blomberg provided, and it doesn't 
> seem to work. I rewrote the code slightly in order for it to try and 
> predict the body mass of the taxa used to calculate the line, rather 
> than some new taxon, because I need to do that in order to calculate 
> prediction error and be able to compare it to previous studies. I got 
> a lambda of about 0.9, which is approximately the same as what I got 
> previously. So on the surface it seems like the method should work.
>
> The problem comes when I try to apply this to the data itself to 
> calculate mu. This is the modified code I used if that helps, though I 
> don't think I can send the data or tree to make it replicable because 
> the message wouldn't go through last time.
>
> tree.trimmed <- drop.tip(trees[[1]], which(is.na <http://is.na>(b$bm)))
> b.trimmed <- b[complete.cases(b$bm),]
> fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed, 
> form=~species),
>            data=b.trimmed)
> (lambda <- as.numeric(fit$modelStruct))
> model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b)
> preds <- predict(fit,newdata=b)
> mat <- vcv.phylo(trees[[1]])
> C <- 1+(lambda-1)*(1-mat) ## Lambda correction
> XnoMean <- scale(model.matFULL, scale=FALSE)
> mu <- C %*% solve(C) %*%  XnoMean ## Replaced CCompletedata and cihmat 
> with C, since both seem to be subsets of that data. This is the only 
> place I could think things might go wrong.

C is the full variance-covariance matrix, made from the tree that 
includes both the known and unknown species. CCompletedata is the 
variance-covariance matrix WITHOUT the unknown species. This corresponds 
to C in Ted and Tony's paper.  cihmat is the matrix of relationships of 
the unknown species to the known species, one row for each unknown 
species. So you cannot replace cihmat with C.

> mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu 
> based on the data sheet. Mu is reported in phylogenetic order, whereas 
> predict.gls will always report the data in alphabetical order of the 
> row names even if a phylogeny is present.
predict.gls reports the predicted values in the order that is present in 
the newdata argument. You shouldn't have to re-order anything in my 
example code. YMMV however if you adapt my code.
> exp(preds+mu2) ## "correct" prediction accounting for bias due to 
> correlations with other species. Exp() is because data were 
> log-transformed.
>
> So when I check the results of this equation the predicted values are 
> actually /less /accurate than the fitted values reported from gls 
> without correction. Normally the accuracy is about +/- 64% of the 
> actual value, but adding the correction here results in the data 
> being +/- 84% of the actual value. What's more, I took a look at the 
> values of mu and they don't seem to make sense when compared on the 
> phylogeny. Namely, if mu were being calculated right, one would expect 
> monotremes to have a very negative value compared to all other mammals 
> given the deep divergence in trait states between monotremes and all 
> other taxa in the figure I previously attached. But when I check the 
> mu values, I get a value of 0.35 for /Zaglossus/, -0.58 for 
> /Tachyglossus/, and 0.59 for /Ornithorhynchus/. Which is not what 
> would be expected if mu were modelling a phylogenetic correction 
> factor (if nothing else, the range of mu should probably not span 
> zero). So it suggests to me I'm not doing something right. I know that 
> PGLS doesn't always result in better fits than OLS, but I'm surprised 
> adding signal back in made the results worse when the fit was so poor 
> when it was fitted through the ancestral node.
>
> Sincerely,
> Russell

-- 
Simone Blomberg, BSc (Hons), PhD, MAppStat (she/her)
Senior Lecturer and Consultant Statistician
School of Biological Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au
Twitter: @simoneb66
UQ ALLY Supporting the diversity of sexuality and gender at UQ.

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

Basically, I'm not interested in doing research
and I never have been. I'm interested in
understanding, which is quite a different thing.
And often to understand something you have to
work it out for yourself because no one else
has done it. - David Blackwell


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Model Selection and PGLS

Reply via email to