Re: [R-sig-phylo] Question on pCCA (phyl.cca)

Julien Clavel Thu, 15 Oct 2015 11:32:47 -0700

Hi Sandra,

Both approaches are (p)GLS, but they represent different models: multivariate 
and multiple regression.
In the first you have several dependent variables, in the second you have 
several independent variables. 
The error associated with the different responses (or dependent) variables may 
have different variances and may be correlated in the multivariate model.
For sure you can dilute the power of the analysis when you try to make more 
estimates with the same amount of data (as it is the case in the multivariate 
model).


The approach which consist in replacing the dependent and independent variable, 
is similar to what is called "reverse regression". But doing so you make 
particular assumptions because the nature of the stochastic error is different 
in each regressions. Maybe someone have more thought on this and if this can be 
misleading?


Julien

________________________________
> Date: Wed, 14 Oct 2015 16:13:03 -0300
> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> Oi Julien,
>
> Yes you are right, I was comparing things that are not the same!
> However, even changing the p-values as you suggested, the conclusions
> (which is really what i'm interested in) are still much different
> between the two methods. For the example I gave above, the correlation
> is still not significant (p-value = 0.16).
>
> Overall i get much less significant correlations with this method
> compared to pGLS. Is there an explanation? Could it be a problem of
> over-parametrisation?
>
> Best,
> Sandra.
>
>
> 2015-10-14 13:37 GMT-03:00 Julien Clavel
> <[email protected]<mailto:[email protected]>>:
> Sandra,
>
> I just figured out, are you directly comparing the coefficients
> (slopes) of the multiple regression analysis with the evolutionary
> correlations computed with mvBM?
>
> These are two distincts things although they are related (same sign)!
> For instance, while correlations (and evolutionary correlations) range
> between -1 and 1, the range of the slope can take higher values.
>
> Best,
>
> Julien
>
> ----------------------------------------
>> From: [email protected]<mailto:[email protected]>
>> To: [email protected]<mailto:[email protected]>;
> [email protected]<mailto:[email protected]>
>> Date: Wed, 14 Oct 2015 17:50:37 +0200
>> CC: [email protected]<mailto:[email protected]>
>> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
>>
>> Hi Sandra,
>>
>> Maybe I choose a bad example to illustrate the approach. In fact I
> simulated positive correlations so the way the p-value was computed was
> for illustrative purpose and is wrong for negative correlations (we
> need a two-sided test). In fact you must look whether your 95%
> confidence interval for the parameter include 0 or not.
>>
>> Just replace:
>> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
>>
>> by something like that (if I'm not wrong)
>>
>> pvalue <- function(x){
>> neg <- mean(x>0)
>> pos <- mean(x<0)
>> pval <- min(neg,pos)*2
>> return(pval)
>> }
>>
>> p_values <- apply(correl_boot,2, pvalue)
>>
>> This should fix the problem.
>>
>> Julien
>>
>> ________________________________
>>> Date: Wed, 14 Oct 2015 11:35:43 -0300
>>> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
>>> From: [email protected]<mailto:[email protected]>
>>> To: [email protected]<mailto:[email protected]>
>>> CC: [email protected]<mailto:[email protected]>;
> [email protected]<mailto:[email protected]>
>>>
>>> Dear Liam and Julien,
>>>
>>> Thank you both for your help. I have tried both solutions on the same
>>> dataset under a BM model to compare them. I expected to get similar
>>> results, at least for the general conclusions on the significance of
>>> correlations.
>>>
>>> for Julien's solution, I ran:
>>>
>>> model<-mvBM(tree, data, model="BM1",method="pic")
>>> cov2cor(model$sigma)
>>>
>>> # bootstrapping to get p-values
>>> data_boot <- simulate(model, tree=tree, nsim=100)
>>> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x,
>>> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) })
>>> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] }))
>>> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
>>>
>>> # the correlations and associated p-values
>>> correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] )
>>> res<-rbind(correl, p_values)
>>>
>>>
>>> For Liam's solution, I ran:
>>>
>>> corbr<-corBrownian(phy=tree)
>>>
>>> glsp1<-
>>> gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4,
>>> correlation=corbr, data=data)
>>> glsp2<-
>>> gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4,
>>> correlation=corbr, data=data)
>>> glsp3<-
>>> gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4,
>>> correlation=corbr, data=data)
>>> glsp4<-
>>> gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3,
>>> correlation=corbr, data=data)
>>>
>>>
>>> I'm getting much more significant correlations with pGLS than with
>>> mvBM, and the correlation coefficients are sometimes very different as
>>> well. The results I get from the pGLS are generally more intuitive
>>> biologically. For example, for the dominant frequency of calls:
>>>
>>>
>>> DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26
>>> p-value=0.016. Here the coefficients are close, but the significance is
>>> very different. Biologically, we expect a significant correlation
>>> between these two variables, so it is quite surprising to have such a
>>> high p-value for the mvBM method.
>>>
>>> DF~env.var1 with mvBM:0.35 p-value=0.03 and with pGLS: 0.021 p-value=
>>> 0.09. Here the coefficients are really quite different...
>>>
>>> Is it normal that I'm getting such different results? Did I miss
> something?
>>>
>>> Best,
>>> Sandra.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2015-10-13 15:18 GMT-03:00 Liam J. Revell
>>>
> <[email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>>:
>>> Hi Sandra.
>>>
>>> If you are interested in the individual correlations, why not just do a
>>> PGLS in which (on at a time) the variables of interest are set as the
>>> independent variable.
>>>
>>> So, for instance, to test "is environemental variable A correlated with
>>> acoustic variable Z" (controlling for all other environmental variables
>>> & acoustic variables), one would do:
>>>
>>> acoustic variable Z ~ environmental variable A + all other
>>> environmental variable + all other acoustic variables + phylogenetic
>>> residual error
>>>
>>> If you have collinearity, then you may not be able distinguish between
>>> the effects of different variables - but this will be true for all
>>> statistical models. (Imagine the worst case in which variables A & B
>>> were perfectly collinear, well then it becomes theoretically impossible
>>> to determine if A is affecting Z or B, no matter what we do.)
>>>
>>> All the best, Liam
>>>
>>> Liam J. Revell, Associate Professor of Biology
>>> University of Massachusetts Boston
>>> web: http://faculty.umb.edu/liam.revell/
>>> email:
> [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>
>>> blog: http://blog.phytools.org
>>>
>>>
>>> On 10/13/2015 5:47 AM, Julien Clavel wrote:
>>> Hi Sandra,
>>>
>>> I think, a multivariate correlational model with your continuous traits
>>> as response variables can be used to do what you want.
>>> For instance, assuming Brownian motion:
>>>
>>>
>>> library(mvMORPH)
>>> set.seed(123)
>>>
>>> ## First simulate some data (3 traits)
>>> # simulate a tree
>>> tree <- pbtree(n=100)
>>>
>>> # Define a correlation matrix
>>> C <- matrix(c(1,0.7,0.3,0.7,1,0.3,0.3,0.3,1),3)
>>>
>>> # Define the variance to build a covariance matrix
>>> D <- diag(c(sqrt(0.3),sqrt(0.2),sqrt(0.1)),3)
>>>
>>> # Compute the covariance matrix
>>> sigma <- D%*%C%*%D
>>>
>>> # Ancestral states
>>> theta <- c(0,0,0)
>>>
>>> # simulate the traits
>>> data<-mvSIM(tree, param=list(sigma=sigma, ntraits=3, mu=theta,
>>> names_traits=c("Trait 1","Trait 2", "Trait
>>> 3")), model="BM1", nsim=1)
>>>
>>>
>>>
>>>
>>> ## Now we fit the multivariate brownian model
>>> model<-mvBM(tree, data, model="BM1",method="pic")
>>>
>>> # Obtain the traits correlations (very close to the generating values):
>>> cov2cor(model$sigma)
>>>
>>> # You can compare the fit of a model with independent Brownian motion
>>> model_2<-mvBM(tree, data, model="BM1",method="pic",
>>> param=list(constraint="diagonal"))
>>>
>>> # Test the significance with an LRT test
>>> LRT(model,model_2)
>>>
>>>
>>>
>>>
>>> # But if you want to assess the significance for each traits when there
>>> is more than two traits, use bootstrap instead:
>>>
>>> # Simulate data under the MLE and then fit the model to simulated traits
>>> data_boot <- simulate(model, tree=tree, nsim=100)
>>> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x,
>>> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) })
>>>
>>> # Retrieve the results
>>> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] }))
>>> colnames(correl_boot)<-c("[1,2]","[1,3]","[2,3]")
>>>
>>> # Plot the results
>>> boxplot(correl_boot, main="Correlations")
>>>
>>> # Assess if simulated data under the MLE are significantly different
>>> from zero
>>> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
>>> p_values
>>>
>>>
>>>
>>> # You can also use the same approach described above if the best fit
>>> process is an Ornstein-Uhlenbeck (which is unlikely here); e.g.:
>>>
>>> model_ou <- mvOU(tree, data, model="OU1", method="rpf",
>>> param=list(vcv="ouch"))
>>> cov2cor(stationary(model_ou))
>>>
>>> There is some related examples in the package vignette.
>>> HTH,
>>>
>>>
>>> Julien
>>>
>>> Date: Mon, 12 Oct 2015 14:34:26 -0300
>>> From:
> [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>
>>> To:
> [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>
>>> Subject: [R-sig-phylo] Question on pCCA (phyl.cca)
>>>
>>> Dear list,
>>>
>>> I have 2 sets of continuous variables (acoustic and environmental) and a
>>> phylogeny for a group of species. I am looking for correlations
> between the
>>> environmental variables and the acoustic ones. I ran a pCCA (phyl.cca
>>> {phytools} ), which gave me p-values for the correlations of each of my
>>> environmental variables with* all* acoustic variables. However, i have
>>> specific hyptotheses I would like to test such as: is environemental
>>> variable A correlated with acoustic variable Z?
>>>
>>> If I run univariate pGLS for each of my acoustic variables, I have the
>>> problem that I am not taking into account the potential colinearity
> between
>>> my acoustic variables... So I can first run pPCA, but then i'm not really
>>> testing my original hypotheses on specific variables.
>>>
>>> So my question is: is there a way to test the significance of correlation
>>> among continuous variables from two sets (taking the phylogenetic
> componant
>>> into account)?
>>>
>>> Any input will be very much apreciated!
>>> Best,
>>> Sandra.
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-phylo mailing list -
>>>
> [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> Searchable archive at
>>>
> http://www.mail-archive.com/[email protected]/<http://www.mail-archive.com/r-sig-phylo%40r-project.org/><http://www.mail-archive.com/r-sig-phylo%40r-project.org/>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-phylo mailing list -
>>>
> [email protected]<mailto:[email protected]><mailto:[email protected]<mailto:[email protected]>>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> Searchable archive at
>>>
> http://www.mail-archive.com/[email protected]/<http://www.mail-archive.com/r-sig-phylo%40r-project.org/><http://www.mail-archive.com/r-sig-phylo%40r-project.org/>
>>>
>>>
>>>
>>>
>>> --
>>> Ph.D.
>>> Laboratório de História Natural de Anfíbios Brasileiros
>>> (LaHNAB<http://www.naturalhistory.com.br/>)
>>> Departamento de Biologia Animal (Bloco P)
>>> Instituto de Biologia (IB)
>>> Universidade Estadual de Campinas (UNICAMP)
>>> Rua Monteiro Lobato, 255, CEP 13083-862
>>> Campinas, São Paulo
>>> Brazil
>>> tel: +55 (19) 99978-7173
>>> Flickr:
>>>
> https://www.flickr.com/gp/133250906@N05/g4196q<https://www.flickr.com/gp/133250906%40N05/g4196q><https://www.flickr.com/gp/133250906%40N05/g4196q>
>>>
>>
>> _______________________________________________
>> R-sig-phylo mailing list -
> [email protected]<mailto:[email protected]>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
> http://www.mail-archive.com/[email protected]/<http://www.mail-archive.com/r-sig-phylo%40r-project.org/>
>
>
>
>
> --
> Ph.D.
> Laboratório de História Natural de Anfíbios Brasileiros
> (LaHNAB<http://www.naturalhistory.com.br/>)
> Departamento de Biologia Animal (Bloco P)
> Instituto de Biologia (IB)
> Universidade Estadual de Campinas (UNICAMP)
> Rua Monteiro Lobato, 255, CEP 13083-862
> Campinas, São Paulo
> Brazil
> tel: +55 (19) 99978-7173
> Flickr:
> https://www.flickr.com/gp/133250906@N05/g4196q<https://www.flickr.com/gp/133250906%40N05/g4196q>
>
                                          
_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/

Re: [R-sig-phylo] Question on pCCA (phyl.cca)

Reply via email to