Re: [R-sig-phylo] understanding variance-covariance matrix

`Many thanks to all for your very helpful references, commands and examples.`
```
For the sake of future readers, a summary of the off-list discussions follows.

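Andrew Hipp provided a very didactic way to look at the vcv matrix, as a
triangular distance matrix:

library(geiger)            # attaches ape, which provides vcv()
tr <- sim.bdtree(n = 100)  # simulate a 100-tip birth-death tree
C <- as.dist(vcv(tr))      # lower triangle only; the diagonal is dropped
C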

With that, it made sense that, for n species in which a trait X has been
measured, Cov[xi,xj] is an n x n matrix indexed by species. For any pair of
tip values Xi and Xj, the entry represents the distance that the two tips
share from the root (up to their most recent common ancestor), multiplied
by a rate of evolution, or matrix of rates. The results of such
multiplication constitute the *expected* covariances among values at the
tips.

Under phylo independence, all off-diagonal values of Cov[xi,xj] are
expected to be zero (because the shared history is 0, and it is a
multiplicative term in the covariance formula). On the diagonal, the
covariance of each tip value with itself (its variance) is the distance
from the root of the tree to that tip (not its square) multiplied by the
rate of evolution (or rates, if they are not homogeneous). When all tips
are equally distant from the root, that can be represented by an identity
matrix multiplied by that distance and by sigma squared, which represents
the rate of evolution, or by a matrix of rates of evolution along the tree.

Under phylo dependence, off-diagonal values are expected to be non-zero for
every pair of tips that shares some history after the root (because shared
history > 0, and it is a multiplicative term in the covariance formula).
Pairs of tips that split at the root still get a zero. The rest of
Cov[xi,xj] is the same as in the case of phylo independence.

Thus, during a PGLS model fit, R builds Cov[xi,xj] from the tree and the
given model of evolution (not from the data, as Brian noted) and uses it
as the expected covariance structure of the regression residuals. These
residuals should contain phylogenetic dependence coming from the shared
evolution of trait values in X and Y, because an OLS regression does not
account for such dependence. The covariance of the residuals does not need
to be an identity matrix, but it has to be proportional to one for OLS to
be valid, that is, the residuals have to be phylogenetically independent.
Finally, the PGLS procedure transforms the response and predictor
variables accordingly, to remove the phylogenetic dependence expected
under the specified model of evolution, and then performs an OLS on the
transformed variables.

Many thanks again for the refs pointed out by Brian, and for Liam's 2010
paper on phylo signal. Material from Andrew, Julien and Diogo Provete was
also super didactic.

Hope that helps others too,
Cheers

Dr. Agustín Camacho Guerrero. Universidade de São Paulo.
http://www.agustincamacho.com
Laboratório de Comportamento e Fisiologia Evolutiva, Departamento de
Fisiologia,
Instituto de Biociências, USP.Rua do Matão, trav. 14, nº 321, Cidade
Universitária,
São Paulo - SP, CEP: 05508-090, Brasil.

On Sun, 26 Aug 2018 at 21:55, Julien Clavel (<julien.cla...@hotmail.fr>)
wrote:

> Hi Agus,
>
> I just posted some courses I did a while ago to understand the variances
> and covariances of a BM process on trees and time-series (R markdown):
> https://github.com/JClavel/Examples <https://github.com/JClavel/Teaching>
>
> This is also based on the previously mentioned references, illustrated
> with some simulations.
> Hope it helps...
>
> Best,
>
> Julien
>
> ------------------------------
> *From:* R-sig-phylo <r-sig-phylo-boun...@r-project.org> on behalf of
> Andrew Hipp <ah...@mortonarb.org>
> *Sent:* Sunday, 26 August 2018 05:36
> *To:* bome...@utk.edu
> *Cc:* mailman, r-sig-phylo; Agus Camacho
> *Subject:* Re: [R-sig-phylo] understanding variance-covariance matrix
>
> I'll second Brian's self-citation. O'Meara et al. 2006 is I think one of
> the best introductions to the phylogenetic covariance matrix, and I often
> direct students to it.
>
> Brian's point about the relationship between observed and expected
> covariance is illustrated here in a brief note I wrote up for students this
> spring:
>
>
> https://github.com/andrew-hipp/PCM-2018/blob/master/R-tutorials/2018-PCM-covarianceMatrixRuminations.ipynb
>
> It might be helpful, or it might not. I hope so!
>
> Take care,
> Andrew
>
> On Sat, Aug 25, 2018 at 1:33 PM, Brian O'Meara <omeara.br...@gmail.com>
> wrote:
>
> > Hi, Agus. The variance-covariance matrix comes from the tree and the
> > evolutionary model, not the data. Each entry between taxa A and B in the
> > VCV is how much covariance I should expect between data for taxa A and B
> > simulated up that tree using that model. I don't want to be *that guy*,
> > but O'Meara et al. (2006)
> > https://onlinelibrary.wiley.com/doi/10.1111/j.0014-3820.2006.tb01171.x
> > has a fairly accessible explanation of this (largely b/c I was just
> > learning about VCVs when working on that paper). Hansen and Martins (1996)
> > https://onlinelibrary.wiley.com/doi/10.1111/j.1558-5646.1996.tb03914.x
> > have a much more detailed description of how you get these covariance
> > matrices from microevolutionary processes.
> >
> > Typically, ape::vcv() is how you get a variance covariance for a
> > phylogeny, assuming Brownian motion and no measurement error. It just
> > basically takes the history two taxa share to create the covariance (or
> > variance, if the two taxa are the same taxon). A different approach,
> > which seems to be what you're doing, would be to simulate up a tree many
> > times, and then for each pair of taxa (including the pair of a taxon with
> > itself, the diagonal of the VCV), calculate the covariance. These
> > approaches should get the same results, though the shared history on the
> > tree approach is faster.
> >
> > Best,
> > Brian
> >
> >
> > _______________________________________________________________________
> > Brian O'Meara, http://www.brianomeara.info, especially Calendar
> > <http://brianomeara.info/calendars/omeara/>, CV
> > <http://brianomeara.info/cv/>, and Feedback
> > <http://brianomeara.info/teaching/feedback/>
> >
> > Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
> > Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
> >
> >
> >
> > On Sat, Aug 25, 2018 at 1:16 PM Agus Camacho <agus.cama...@gmail.com>
> > wrote:
> >
> > > Dear list users,
> > >
> > > I am trying to make an easy R demonstration to teach the
> > > variance-covariance matrix to students. However, after consulting the
> > > internet and books, I found myself facing three difficulties in
> > > understanding the math and code behind this important matrix. As this
> > > list is read by several authors of books on phylocomp methods, I thought
> > > this might make a useful general discussion.
> > >
> > > Here we go,
> > >
> > > 1) I don't know how to generate a phyloVCV matrix in R (Liam kindly
> > > described some options here
> > > <http://blog.phytools.org/2013/12/three-different-ways-to-calculate-among.html>),
> > > but I cannot tell for sure what X is made of. It would seem to be a data
> > > frame of some variables measured across species. But then, I get errors
> > > when I write:
> > >
> > > tree <- pbtree(n = 10, scale = 1)
> > > tree$tip.label <- sprintf("sp%s",seq(1:n))
> > > x <- fastBM(tree)
> > > y <- fastBM(tree)
> > > X=data.frame(x,y)
> > > rownames(X)=tree$tip.label
> > > ## Revell (2009)
> > > A<-matrix(1,nrow(X),1)%*%apply(X,2,fastAnc,tree=tree)[1,]
> > > V1<-t(X-A)%*%solve(vcv(tree))%*%(X-A)/(nrow(X)-1)
> > > ## Butler et al. (2000)
> > > Z<-solve(t(chol(vcv(tree))))%*%(X-A)
> > > V2<-t(Z)%*%Z/(nrow(X)-1)
> > >
> > > ## pics
> > > Y<-apply(X,2,pic,phy=tree)
> > > V3<-t(Y)%*%Y/nrow(Y)
> > >
> > > 2) The phyloVCV matrix has n x n coordinates defined by the n species,
> > > and it represents covariances among observations made across the n
> > > species, right? Still, I do not know whether these covariances are
> > > calculated over a) X vs Y values for each pair of species coordinates in
> > > the matrix, across the n species, or b) directly over the vector of n
> > > residuals of Y, after correlating Y vs X, across all pairs of species
> > > coordinates. I think it may be a) because, by definition, variance cannot
> > > be calculated for a single value. I am not sure though, since it seems
> > > the whole point of PGLS is to control phylosignal within the residuals of
> > > a regression procedure, prior to actually making it.
> > >
> > > 3) If I create two perfectly correlated variables with independent
> > > observations and calculate a covariance or correlation matrix for them,
> > > I do not get a diagonal matrix, with zeros at the off diagonals (ex. here
> > > <https://www.dropbox.com/s/y8g3tkzk509pz58/vcvexamplewithrandomvariables.xlsx?dl=0>),
> > > why expect then a diagonal matrix for the case of independence among the
> > > observations?
> > >
> > > Thanks in advance and sorry if I missed anything obvious here!
> > > Agus
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > R-sig-phylo mailing list - R-sig-phylo@r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> > > Searchable archive at
> > > http://www.mail-archive.com/r-sig-phylo@r-project.org/
> > >
>
>
>
> --
> Andrew Hipp, PhD
> Senior Scientist in Plant Systematics and Herbarium Curator, The Morton
> Arboretum
> Lecturer, Committee on Evolutionary Biology, University of Chicago
>
> The Morton Arboretum
> 4100 Illinois Route 53 / Lisle IL 60532-1293 / USA
> +1 630 725 2094
>
> Lab: http://systematics.mortonarb.org/lab
> Herbarium data: http://vplants.org
> U of Chicago, CEB: http://evbio.uchicago.edu/
> Phenology of the East Woods: http://systematics.mortonarb.org/phenology
>

```
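Two points from the thread above can be checked numerically. The first is Brian's remark that the covariances obtained by simulating a trait up the tree many times should match the shared-history matrix from `ape::vcv()`. The second is the summary's description of PGLS as an OLS on transformed variables. What follows is only a minimal sketch under stated assumptions: the four-tip toy tree, sigma^2 = 1, the seed, the regression coefficients and all object names are made up for illustration and do not come from the thread.

```r
# Sketch only: the toy tree, sigma^2 = 1, the seed and all object names are
# assumptions made for illustration; none of them come from the thread.
library(ape)

set.seed(1)

# A small ultrametric tree (root-to-tip distance = 2 for every tip)
tr <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:1.5):0.5);")

# 1) Expected covariances from shared history (Brownian motion, sigma^2 = 1)
C <- vcv(tr)

# Simulate the trait up the tree many times and take the covariance between
# tips across replicates; rows of `sims` are tips, columns are replicates
sims <- replicate(5000, rTraitCont(tr, model = "BM", sigma = 1))
C_sim <- cov(t(sims))

print(round(C, 2))
print(round(C_sim, 2))   # should be close to C, up to simulation noise

# 2) GLS with covariance C versus OLS on transformed variables
x <- rTraitCont(tr, model = "BM", sigma = 1)
y <- 1 + 2 * x + rTraitCont(tr, model = "BM", sigma = 1)
X <- cbind(intercept = 1, x = x)

# GLS estimator: (X' C^-1 X)^-1 X' C^-1 y
Ci <- solve(C)
beta_gls <- drop(solve(t(X) %*% Ci %*% X, t(X) %*% Ci %*% y))

# Same fit via transformation: with C = L L', premultiply by L^-1, then OLS
L <- t(chol(C))
Xt <- solve(L, X)
yt <- solve(L, y)
beta_ols_t <- drop(solve(t(Xt) %*% Xt, t(Xt) %*% yt))

print(rbind(beta_gls, beta_ols_t))
```

The two rows printed at the end should agree to numerical precision, since the transformation is just another way of writing the GLS estimator. In practice a fit of this kind is usually done with a packaged routine, for example `nlme::gls()` with an `ape::corBrownian()` correlation structure, rather than by hand.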