Hi Agus, I just posted some courses I did a while ago to understand the variances and covariances of a BM process on trees and time-series (R markdown): https://github.com/JClavel/Examples<https://github.com/JClavel/Teaching>

This is also based on the previously mentioned references, illustrated with some simulations. Hope it helps... Best, Julien

I'll second Brian's self-citation. O'Meara et al. 2006 is, I think, one of the best introductions to the phylogenetic covariance matrix, and I often direct students to it. Brian's point about the relationship between observed and expected covariance is illustrated here in a brief note I wrote up for students this spring: https://github.com/andrew-hipp/PCM-2018/blob/master/R-tutorials/2018-PCM-covarianceMatrixRuminations.ipynb

Take care, Andrew

On Sat, Aug 25, 2018 at 1:33 PM, Brian O'Meara <omeara.br...@gmail.com> wrote:
> Hi, Agus. The variance-covariance matrix comes from the tree and the evolutionary model, not the data. Each entry between taxa A and B in the VCV is how much covariance I should expect between data for taxa A and B simulated up that tree using that model. I don't want to be *that guy*, but O'Meara et al. (2006) https://onlinelibrary.wiley.com/doi/10.1111/j.0014-3820.2006.tb01171.x has a fairly accessible explanation of this (largely b/c I was just learning about VCVs when working on that paper). Hansen and Martins (1996) https://onlinelibrary.wiley.com/doi/10.1111/j.1558-5646.1996.tb03914.x have a much more detailed description of how you get these covariance matrices from microevolutionary processes.
>
> Typically, ape::vcv() is how you get a variance covariance for a phylogeny, assuming Brownian motion and no measurement error. It just basically takes the history two taxa share to create the covariance (or variance, if the two taxa are the same taxon). A different approach, which seems to be what you're doing, would be to simulate up a tree many times, and then for each pair of taxa (including the pair of a taxon with itself, the diagonal of the VCV), calculate the covariance. These approaches should get the same results, though the shared history on the tree approach is faster.

Best,
Brian
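
Brian's two routes to the VCV (shared history on the tree versus covariance across repeated simulations) can be sketched in base R. This is a minimal illustration, not code from the thread: the three-taxon tree ((A:1,B:1):1,C:2);, the Brownian rate of 1, the seed, and the number of replicates are all assumptions chosen to keep the example small. The ape::vcv() check at the end only runs if ape is installed.

```r
# Toy tree (assumed for illustration): ((A:1,B:1):1,C:2);
# A and B share one unit of history; C shares none with either.
set.seed(1)
nsim <- 20000

# Brownian motion with rate 1: the change along a branch of length t is N(0, t).
shared_AB <- rnorm(nsim, sd = sqrt(1))        # root -> ancestor of A and B
A <- shared_AB + rnorm(nsim, sd = sqrt(1))    # ancestor -> A
B <- shared_AB + rnorm(nsim, sd = sqrt(1))    # ancestor -> B
C <- rnorm(nsim, sd = sqrt(2))                # root -> C

# Observed VCV: covariance between taxa across replicate simulations
observed <- cov(cbind(A, B, C))

# Expected VCV: shared branch length off the diagonal,
# root-to-tip length on the diagonal
taxa <- c("A", "B", "C")
expected <- matrix(c(2, 1, 0,
                     1, 2, 0,
                     0, 0, 2),
                   nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))

print(round(observed, 2))
print(expected)

# If ape is available, vcv() builds the expected matrix from the tree itself
if (requireNamespace("ape", quietly = TRUE)) {
  tree <- ape::read.tree(text = "((A:1,B:1):1,C:2);")
  print(ape::vcv(tree))
}
```

With enough replicates the observed matrix should approach the expected one: a covariance near 1 between A and B, near 0 between C and either of them, and variances near 2.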

On Sat, Aug 25, 2018 at 1:16 PM Agus Camacho <agus.cama...@gmail.com> wrote:

> Dear list users,
>
> I am trying to make an easy R demonstration to teach the variance-covariance matrix to students. However, after consulting the internet and books, I found myself facing three difficulties in understanding the math and code behind this important matrix. As this list is answered by several authors of books on phylogenetic comparative methods, I thought this might make a useful general discussion.
>
> Here we go,
>
> 1) I don't know how to generate a phyloVCV matrix in R. Liam kindly described some options here
> <http://blog.phytools.org/2013/12/three-different-ways-to-calculate-among.html>,
> but I cannot tell for sure what X is made of. It would seem to be a data frame of some variables measured across species. But then, I get errors when I write:
>
> library(phytools)  # also loads ape (vcv, pic)
> n <- 10  # n was undefined in seq(1:n)
> tree <- pbtree(n = n, scale = 1)
> tree$tip.label <- sprintf("sp%s", seq_len(n))
> x <- fastBM(tree)
> y <- fastBM(tree)
> # %*% needs a numeric matrix, so bind the traits with cbind(), not data.frame()
> X <- cbind(x, y)
> rownames(X) <- tree$tip.label
> ## Revell (2009)
> A <- matrix(1, nrow(X), 1) %*% apply(X, 2, fastAnc, tree = tree)[1, ]
> V1 <- t(X - A) %*% solve(vcv(tree)) %*% (X - A) / (nrow(X) - 1)
> ## Butler et al. (2000)
> Z <- solve(t(chol(vcv(tree)))) %*% (X - A)
> V2 <- t(Z) %*% Z / (nrow(X) - 1)
>
> ## pics
> Y <- apply(X, 2, pic, phy = tree)
> V3 <- t(Y) %*% Y / nrow(Y)
>
> 2) The phyloVCV matrix has n x n coordinates defined by the n species, and it represents covariances among observations made across the n species, right? Still, I do not know whether these covariances are calculated over a) X vs Y values for each pair of species coordinates in the matrix, across the n species, or b) directly over the vector of n residuals of Y, after correlating Y vs X, across all pairs of species coordinates. I think it may be a) because, by definition, variance cannot be calculated for a single value. I am not sure though, since it seems the whole point of PGLS is to control phylosignal within the residuals of a regression procedure, prior to actually running it.
>
> 3) If I create two perfectly correlated variables with independent observations and calculate a covariance or correlation matrix for them, I do not get a diagonal matrix with zeros on the off-diagonals (example here <https://www.dropbox.com/s/y8g3tkzk509pz58/vcvexamplewithrandomvariables.xlsx?dl=0>). Why, then, expect a diagonal matrix for the case of independence among the observations?
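
On questions 2 and 3, it may help to separate two matrices that are easy to conflate: the n x n matrix among species for a single trait (what the phylogenetic VCV describes) and the p x p matrix among traits. The base R sketch below is an illustration under stated assumptions, not code from the thread: it assumes a star phylogeny with unit branch lengths, so that species evolve independently, and a hypothetical second trait y = 2x that is perfectly correlated with the first.

```r
set.seed(2)
n_species <- 4
nsim <- 20000

# Star phylogeny (assumed): each species evolves independently for 1 time unit,
# so every replicate is n_species independent N(0, 1) draws.
x <- matrix(rnorm(n_species * nsim), nrow = nsim,
            dimnames = list(NULL, paste0("sp", seq_len(n_species))))
y <- 2 * x  # hypothetical second trait, perfectly correlated with x

# n x n matrix among species for ONE trait, estimated across replicates.
# Independent species give a (near) diagonal matrix.
among_species <- cov(x)

# p x p matrix among traits, computed across species within ONE replicate.
# Perfectly correlated traits give a non-diagonal 2 x 2 matrix.
among_traits <- cov(cbind(x = x[1, ], y = y[1, ]))

print(round(among_species, 2))
print(among_traits)
print(cor(x[1, ], y[1, ]))
```

The first matrix is close to diagonal because the species are independent; the second is not, because it measures association between traits rather than between species. Independence among observations concerns the first matrix only.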

Agus

Dr. Agustín Camacho Guerrero.
Universidade de São Paulo.
http://www.agustincamacho.com

