[R-sig-phylo] understanding variance-covariance matrix

Agus Camacho Sat, 25 Aug 2018 10:17:13 -0700

Dear list users,

I am trying to make an easy R demonstration to teach the
variance-covariance matrix to students. However, after consulting the
internet and books, I found myself facing three difficulties in understanding
the math and code behind this important matrix. As this list is answered by
several authors of books on phylogenetic comparative methods, I thought this
might make a useful general discussion.


Here we go,

1) I don't know how to generate a phyloVCV matrix in R. Liam kindly
described some options here
<http://blog.phytools.org/2013/12/three-different-ways-to-calculate-among.html>,
but I cannot tell for sure what X is made of. It would seem to be a data
frame of some variables measured across species. But then, I get errors
when I write:

 tree <- pbtree(n = 10, scale = 1)
 tree$tip.label <- sprintf("sp%s", seq(1:n))
 x <- fastBM(tree)
 y <- fastBM(tree)
 X <- data.frame(x, y)
 rownames(X) <- tree$tip.label

 ## Revell (2009)
 A <- matrix(1, nrow(X), 1) %*% apply(X, 2, fastAnc, tree = tree)[1, ]
 V1 <- t(X - A) %*% solve(vcv(tree)) %*% (X - A) / (nrow(X) - 1)

 ## Butler et al. (2000)
 Z <- solve(t(chol(vcv(tree)))) %*% (X - A)
 V2 <- t(Z) %*% Z / (nrow(X) - 1)

 ## pics
 Y <- apply(X, 2, pic, phy = tree)
 V3 <- t(Y) %*% Y / nrow(Y)
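
A plausible reading of the errors: n is only an argument to pbtree() and is
never defined in the workspace, and X is a data frame, which %*% does not
accept. Below is a minimal sketch of a version that should run, assuming the
phytools and ape packages are installed. The set.seed() call, the explicit n,
and the names C and a are additions for illustration and are not part of the
blog post.

```r
library(phytools)  # also attaches ape, which provides vcv() and pic()

set.seed(1)                                    # added only for reproducibility
n <- 10                                        # define n so it can be reused below
tree <- pbtree(n = n, scale = 1)
tree$tip.label <- sprintf("sp%d", seq_len(n))
x <- fastBM(tree)                              # named vector, one value per tip
y <- fastBM(tree)
X <- cbind(x, y)                               # numeric matrix, not a data frame

C <- vcv(tree)                                 # n x n matrix implied by the tree
X <- X[rownames(C), ]                          # same species order as C

## Revell (2009)
a <- apply(X, 2, function(v) fastAnc(tree, v)[1])  # root state of each trait
A <- matrix(1, nrow(X), 1) %*% a
V1 <- t(X - A) %*% solve(C) %*% (X - A) / (nrow(X) - 1)

## Butler et al. (2000)
Z <- solve(t(chol(C))) %*% (X - A)
V2 <- t(Z) %*% Z / (nrow(X) - 1)

## pics
Y <- apply(X, 2, pic, phy = tree)
V3 <- t(Y) %*% Y / nrow(Y)
```

Here X is an n x 2 matrix of trait values with species as rows, and V1, V2
and V3 are 2 x 2 matrices among the traits x and y. If the species order
matches the tree, the three estimates should agree up to rounding.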

2) The phyloVCV matrix has n x n entries defined by the n species, and
it represents covariances among observations made across the n species,
right? Still, I do not know whether these covariances are calculated over
a) X vs Y values for each pair of species in the matrix, across
the n species, or b) directly over the vector of n residuals of Y, after
regressing Y on X, across all pairs of species. I think it may
be a) because, by definition, variance cannot be calculated for a single
value. I am not sure though, since it seems the whole point of PGLS is to
control for phylogenetic signal within the residuals of a regression, prior
to actually fitting it.
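
For orientation: in PGLS under Brownian motion, the n x n matrix is usually
built from the tree alone, not from the X or Y values. Entry (i, j) is the
branch length shared by species i and j, and it describes the expected
covariance between the residuals of those two species over repeated
realisations of the evolutionary process. The sketch below illustrates this
with ape. The three-species tree, the seed and the number of replicates are
made up for illustration.

```r
library(ape)

# Hypothetical three-species tree, chosen only for illustration
tree <- read.tree(text = "((A:1,B:1):1,C:2);")

# Under Brownian motion, C[i, j] is the branch length shared by species i and j
# on the path from the root; the diagonal is each species' root-to-tip distance.
C <- vcv(tree)
print(C)

# Simulate many independent realisations of one trait with covariance C
set.seed(42)
reps <- 20000
L <- t(chol(C))                                    # L %*% t(L) equals C
sims <- L %*% matrix(rnorm(nrow(C) * reps), nrow(C), reps)
rownames(sims) <- rownames(C)

# Covariance among species, taken across replicates (rows = replicates)
C_hat <- cov(t(sims))
print(round(C_hat, 2))
```

Each species contributes a single value per replicate, so the covariance
between two species is only defined across replicates. With real data there
is one replicate, which is why the matrix is taken from the tree rather
than estimated from the observations.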

3) If I create two perfectly correlated variables with independent
observations and calculate a covariance or correlation matrix for them, I
do not get a diagonal matrix with zeros on the off-diagonals (example here
<https://www.dropbox.com/s/y8g3tkzk509pz58/vcvexamplewithrandomvariables.xlsx?dl=0>).
Why, then, expect a diagonal matrix for the case of independence among the
observations?
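
One way to look at this is that two different matrices are involved: cov()
applied to the columns of a data set gives a p x p matrix among variables,
whereas independence among observations refers to an n x n matrix among
observations, defined over repeated samples. A minimal base R sketch, in
which n, reps and the seed are arbitrary values chosen for illustration:

```r
set.seed(1)
n <- 5          # observations per data set (hypothetical)
reps <- 20000   # number of replicate data sets (hypothetical)

# One data set: x and y = 2 * x are perfectly correlated variables
x <- rnorm(n)
y <- 2 * x
print(cov(cbind(x, y)))     # 2 x 2, among variables: off-diagonals are non-zero

# Many data sets: each row is one replicate, each column one observation slot
x_reps <- matrix(rnorm(n * reps), nrow = reps, ncol = n)
V_obs <- cov(x_reps)        # n x n, among observations
print(round(V_obs, 2))      # close to the identity: near-zero off-diagonals
```

The 2 x 2 matrix has non-zero off-diagonals because x and y covary. The
n x n matrix is approximately diagonal because the observations were drawn
independently of one another.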

Thanks in advance and sorry if I missed anything obvious here!
Agus
Dr. Agustín Camacho Guerrero. Universidade de São Paulo.
http://www.agustincamacho.com
Laboratório de Comportamento e Fisiologia Evolutiva, Departamento de
Fisiologia,
Instituto de Biociências, USP. Rua do Matão, trav. 14, nº 321, Cidade
Universitária,
São Paulo - SP, CEP: 05508-090, Brasil.

_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/
