Re: [R-sig-phylo] PGLS vs lm

2013-08-02 Thread Tom Schoenemann
My goal, it seems to me, is to generate many replicate data sets in which one 
trait shows a phylogenetic signal and the other does not, but in which the two 
traits share some predefined correlation with each other (over time). I can 
then test different methods to see which is the most appropriate statistical 
method for this kind of problem.

I can see how I could simulate traits evolving with a given correlation value 
over a given tree, using sim.char() in R. However, won't this leave me with 
traits in which both have the same phylogenetic signal?

Is my only option to simulate huge numbers of traits, half of which evolve 
along some tree and half of which are independent of the tree (i.e., random 
numbers?), then correlate pairs (one from each of these groups) and retain 
just those pairs that have the level of correlation I'm interested in 
exploring? 
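
One alternative to filtering huge numbers of simulated pairs is to build the correlation into the generating model. What follows is only a sketch under stated assumptions, not a tested recipe: the tree is a hypothetical balanced 32-tip tree whose covariance matrix is built by hand (for a real tree, ape's vcv() would supply it), trait B is drawn independently for each species (no phylogenetic signal), and trait A is a slope times B plus Brownian-motion noise along the tree. The names balanced_vcv, sim_pair and rho are made up for illustration.

```r
# Phylogenetic covariance matrix for a hypothetical balanced tree with 2^k tips
# and unit branch lengths (entry [i, j] = path length shared from the root).
balanced_vcv <- function(k) {
  Reduce(`+`, lapply(seq_len(k), function(l) {
    kronecker(diag(2^l), matrix(1, 2^(k - l), 2^(k - l)))
  }))
}

# Simulate one pair of traits:
#   B: independent draws per species (no phylogenetic signal)
#   A: slope * B + Brownian-motion noise on the tree (strong signal)
# The slope is chosen so that the expected correlation between A and B
# for any single species equals rho.
sim_pair <- function(C, rho, sigma2 = 1) {
  n <- nrow(C)
  height <- C[1, 1]                       # root-to-tip distance
  slope <- rho * sqrt(sigma2 * height) / sqrt(1 - rho^2)
  B <- rnorm(n)
  bm <- sqrt(sigma2) * drop(t(chol(C)) %*% rnorm(n))
  data.frame(A = slope * B + bm, B = B)
}

set.seed(1)
C <- balanced_vcv(5)                      # 32 tips
rho <- 0.5
reps <- replicate(2000, sim_pair(C, rho), simplify = FALSE)

# Check the design: correlation across replicates for the first species
A1 <- vapply(reps, function(d) d$A[1], numeric(1))
B1 <- vapply(reps, function(d) d$B[1], numeric(1))
cor_species1 <- cor(A1, B1)
```

Note that rho here is the correlation expected for a given species across replicates. Within a single replicate, the sample correlation across species will generally differ from rho, because the values of A are not independent among species.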

Thanks for any suggestions,

-Tom



Re: [R-sig-phylo] PGLS vs lm

2013-07-26 Thread Tom Schoenemann
Thanks for the suggestions. I'll see if I can implement them.

However, I'm curious if anyone can address my specific question: Does it make 
biological sense for one variable "A" to predict another "B" significantly, but 
for "B" not to predict "A" significantly?

-Tom


Re: [R-sig-phylo] PGLS vs lm

2013-07-26 Thread Theodore Garland Jr
Hi Tom,

So far I have resisted jumping in here, but maybe this will help.
Come up with a model for how you think your traits of interest might evolve 
together in a correlated fashion along a phylogenetic tree.
Now implement it in a computer simulation along a phylogenetic tree.
Also implement the model with no correlation between the traits.
Analyze the data with whatever methods you choose.
Check the Type I error rate and then the power of each method.  Also check the 
bias and mean squared error for the parameter you are trying to estimate.
See what method works best.
Use that method for your data if you have some confidence that the model you 
used to simulate trait evolution is reasonable, based on your understanding 
(and intuition) about the biology involved.
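
A minimal sketch of the Type I error step of this recipe, under assumptions that are not from the original message: the tree is a hypothetical balanced 32-tip tree built by hand, both traits evolve by independent Brownian motion (so the true slope is zero), and the two methods compared are ordinary least squares and GLS with the true covariance matrix. The helper names (balanced_vcv, gls_slope_p, ols_slope_p) are made up for illustration.

```r
# Phylogenetic covariance matrix for a hypothetical balanced tree with 2^k tips
# and unit branch lengths.
balanced_vcv <- function(k) {
  Reduce(`+`, lapply(seq_len(k), function(l) {
    kronecker(diag(2^l), matrix(1, 2^(k - l), 2^(k - l)))
  }))
}

# p-value for the slope in y ~ x under GLS with a known covariance matrix V,
# computed by whitening the data and then running ordinary least squares.
gls_slope_p <- function(y, x, V) {
  L <- t(chol(V))                         # V = L %*% t(L)
  yw <- forwardsolve(L, y)
  Xw <- forwardsolve(L, cbind(1, x))      # whitened intercept and predictor
  summary(lm(yw ~ 0 + Xw))$coefficients[2, 4]
}

# p-value for the slope in y ~ x ignoring phylogeny.
ols_slope_p <- function(y, x) summary(lm(y ~ x))$coefficients[2, 4]

set.seed(42)
C <- balanced_vcv(5)                      # 32 tips
L <- t(chol(C))
n <- nrow(C)
nsim <- 500
p <- replicate(nsim, {
  x <- drop(L %*% rnorm(n))               # two traits evolving independently,
  y <- drop(L %*% rnorm(n))               # so any "significant" slope is an error
  c(ols = ols_slope_p(y, x), gls = gls_slope_p(y, x, C))
})
type1 <- rowMeans(p < 0.05)               # rejection rates at alpha = 0.05
type1
```

Power can be checked the same way by simulating with a nonzero slope, and bias and mean squared error by storing the slope estimates instead of the p-values.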

Lots of us have done this sort of thing, e.g., check this:

Martins, E. P., and T. Garland, Jr. 1991. Phylogenetic analyses of the 
correlated evolution of continuous characters: a simulation study. Evolution 
45:534-557.

Cheers,
Ted

Theodore Garland, Jr., Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521



Re: [R-sig-phylo] PGLS vs lm

2013-07-26 Thread Tom Schoenemann
OK, so I haven't gotten any responses that convince me that PGLS isn't 
biologically suspect. At the risk of thinking out loud to myself here, I wonder 
if my finding might have to do with the method detecting phylogenetic signal in 
the error (residuals?):

From:
Revell, L. J. (2010). Phylogenetic signal and linear regression on species 
data. Methods in Ecology and Evolution, 1(4), 319-329.

I note the following: "...the suitability of a phylogenetic regression should 
actually be diagnosed by estimating phylogenetic signal in the residual 
deviations of Y given our predictors (X1, X2, etc.)."

Let's say one variable, "A", has a strong evolutionary signal, but the other, 
variable "B", does not. Would we expect this to affect a PGLS differently if we 
use A to predict B, vs. using B to predict A?  
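
A small simulation suggests the answer is yes. This is a sketch under stated assumptions rather than a reanalysis of any real data: the tree is a hypothetical balanced 64-tip tree, B has no phylogenetic signal, A equals 0.5 * B plus Brownian-motion noise, and lambda is estimated by maximizing a hand-written profile likelihood (the helpers pagel_vcv, profile_loglik and ml_lambda are illustrative, not functions from caper or phytools).

```r
# Covariance matrix for a hypothetical balanced tree with 2^k tips.
balanced_vcv <- function(k) {
  Reduce(`+`, lapply(seq_len(k), function(l) {
    kronecker(diag(2^l), matrix(1, 2^(k - l), 2^(k - l)))
  }))
}

# Pagel's lambda transformation: scale off-diagonals, keep the diagonal.
pagel_vcv <- function(C, lambda) {
  V <- lambda * C
  diag(V) <- diag(C)
  V
}

# Profile log-likelihood of lambda for y = a + b * x + e, e ~ N(0, s2 * V(lambda)).
profile_loglik <- function(lambda, y, x, C) {
  n <- length(y)
  V <- pagel_vcv(C, lambda)
  Vi <- solve(V)
  X <- cbind(1, x)
  beta <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)   # GLS coefficients
  r <- y - X %*% beta                                   # residuals
  s2 <- as.numeric(t(r) %*% Vi %*% r) / n               # ML variance estimate
  logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
  -0.5 * (n * log(2 * pi * s2) + logdet + n)
}

# ML estimate of lambda on [0, 1] for the regression of y on x.
ml_lambda <- function(y, x, C) {
  optimize(profile_loglik, interval = c(0, 1),
           y = y, x = x, C = C, maximum = TRUE)$maximum
}

set.seed(7)
C <- balanced_vcv(6)                                    # 64 tips
n <- nrow(C)
B <- rnorm(n)                                           # no phylogenetic signal
A <- 0.5 * B + drop(t(chol(C)) %*% rnorm(n))            # signal enters via the residuals

lambda_A_on_B <- ml_lambda(A, B, C)   # response A: residuals are Brownian
lambda_B_on_A <- ml_lambda(B, A, C)   # response B: residuals are mostly noise
c(A_on_B = lambda_A_on_B, B_on_A = lambda_B_on_A)
```

Because lambda describes the residuals of whichever variable is treated as the response, the two directions fit different error structures, and the estimates need not agree.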

If so, it would explain my findings. However, given the difference, I can have 
no confidence that there is, or is not, a significant covariance between A and 
B independent of phylogeny. Doesn't this finding call into question the method 
itself?

More directly, how is one to interpret such a finding? Is there, or is there 
not, a significant biological association?

-Tom



Re: [R-sig-phylo] PGLS vs lm

2013-07-22 Thread Tom Schoenemann
Dear Santiago,

I agree that evolving traits might have all sorts of complicated relationships, 
but that doesn't mean we shouldn't rule out simple relationships first. And 
besides, the most basic question one can ask - really the first question to ask 
- is whether there is any association at all between two variables. If we are 
trying to find out if such an association exists, independent of phylogeny, 
then we need a method that gives the same result regardless of which variable 
we predict from the other.  Of course the slope of any relationship will be 
different, depending on whether we are trying to predict x from y, or y from x. 
But that shouldn't biologically affect the covariance between the two 
variables. The covariance by definition is not a measure of x specifically from 
y, or vice-versa, it is a measure of how they both covary (there is no 
directionality to this). So any method that suggests one degree of confidence 
in this covariance if we look at x from y, and a different degree of confidence 
if we look at y from x, is simply not biologically valid for assessing 
covariance.
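
For ordinary least squares the symmetry argued for here holds exactly, which may help locate where PGLS departs from it. Writing r for the sample correlation and s_x, s_y for the sample standard deviations across n species (notation introduced here for illustration):

```latex
\begin{aligned}
b_{y \mid x} &= r\,\frac{s_y}{s_x}, \qquad b_{x \mid y} = r\,\frac{s_x}{s_y}, \\
t_{y \mid x} &= t_{x \mid y} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} .
\end{aligned}
```

The slopes differ, but the test statistic and its P-value do not. The same holds for GLS whenever both directions use one fixed error covariance matrix. The symmetry is lost when lambda is estimated separately from the residuals of each model, because the two fits then use different covariance matrices.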

To put it in the context of brain and group size: Is group size covarying 
significantly with brain size or not?  Well, if you try to predict group size 
from brain size, then PGLS says the confidence we should have of this 
covariance is higher than if you try to predict brain size from group size. 
This makes no biological sense, and I maintain this makes PGLS invalid for 
assessing the significance of covariance between two variables.

-Tom

 
On Jul 22, 2013, at 2:02 AM, Santiago Claramunt  wrote:

> Dear Tom,
> 
> If your concept of 'relationship' is a simple correlation analysis, then it 
> may not make sense to get different estimates of the 'probability of the 
> relationship'. But in evolutionary biology things are always more complicated 
> than a simple correlation model. Things are not linear, causality is 
> indirect, and, yes, observations are not independent because of phylogeny (and 
> space). We clearly need methods that are more sophisticated than a simple 
> correlation analysis.
> 
> Brain size and group size are variables of very different nature, and their 
> relationship may be the product of natural selection acting on lineages over 
> evolutionary time, which form phylogenies. I don't see any problem in 
> obtaining somewhat different results depending on how the relationship is 
> modeled.
> 
> Santiago

Re: [R-sig-phylo] PGLS vs lm

2013-07-21 Thread Tom Schoenemann
Thanks Liam,

A couple of questions: 

How does one do a hypothesis test on a regression, controlling for phylogeny, 
if not using PGLS as I am doing?  I realize one could use independent 
contrasts, though I was led to believe that is equivalent to a PGLS with lambda 
= 1.  

I take it from what you wrote that the PGLS in caper does a ML of lambda only 
on y, when doing the regression? Isn't this patently wrong, biologically 
speaking? Phylogenetic effects could have been operating on both x and y - we 
can't assume that it would only be relevant to y. Shouldn't phylogenetic 
methods account for both?

You say you aren't sure it is a good idea to jointly optimize lambda for x & y. 
 Can you expand on this?  What would be a better solution (if there is one)?

Am I wrong that it makes no evolutionary biological sense to use a method that 
gives different estimates of the probability of a relationship based on the 
direction in which one looks at the relationship? Doesn't the fact that the 
method gives different answers in this way invalidate the method for taking 
phylogeny into account when assessing relationships among biological taxa?  How 
could it be biologically meaningful for phylogeny to have a greater influence 
when x is predicting y, than when y is predicting x?  Maybe I'm missing 
something here.

-Tom 



Re: [R-sig-phylo] PGLS vs lm

2013-07-21 Thread Liam J. Revell

Hi Tom.

Joe pointed out that if we assume that our variables are multivariate 
normal, then a hypothesis test on the regression is the same as a test 
that cov(x,y) is different from zero.


If you insist on using lambda, one logical extension to this might be to 
jointly optimize lambda for x & y (following Freckleton et al. 2002) and 
then fix the value of lambda at its joint MLE during GLS. This would at 
least have the property of guaranteeing that the P-values for y~x and 
x~y are the same.


I previously posted code for joint estimation of lambda on my blog here: 
http://blog.phytools.org/2012/09/joint-estimation-of-pagels-for-multiple.html.


With this code to fit joint lambda, our analysis would then look 
something like this:


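# joint.lambda() is the function from the blog post linked above
require(phytools)
require(nlme)
# estimate a single lambda for x and y together
lambda<-joint.lambda(tree,cbind(x,y))$lambda
# fit both directions with lambda fixed at the joint estimate
fit1<-gls(y~x,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
fit2<-gls(x~y,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))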

I'm not sure that this is a good idea - but it is possible
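
The symmetry that a shared, fixed lambda buys can be checked without any phylogenetics packages. The sketch below makes several assumptions that are not part of the code above: the tree is a hypothetical balanced 32-tip tree, lambda is simply set to an arbitrary 0.6 rather than estimated jointly, and GLS is done by hand through whitening (helper names are illustrative).

```r
# Covariance matrix for a hypothetical balanced tree with 2^k tips.
balanced_vcv <- function(k) {
  Reduce(`+`, lapply(seq_len(k), function(l) {
    kronecker(diag(2^l), matrix(1, 2^(k - l), 2^(k - l)))
  }))
}

# Pagel's lambda transformation: scale off-diagonals, keep the diagonal.
pagel_vcv <- function(C, lambda) {
  V <- lambda * C
  diag(V) <- diag(C)
  V
}

# p-value for the slope in y ~ x under GLS with a fixed covariance matrix V.
gls_slope_p <- function(y, x, V) {
  L <- t(chol(V))                         # V = L %*% t(L)
  yw <- forwardsolve(L, y)
  Xw <- forwardsolve(L, cbind(1, x))      # whitened intercept and predictor
  summary(lm(yw ~ 0 + Xw))$coefficients[2, 4]
}

set.seed(3)
C <- balanced_vcv(5)
n <- nrow(C)
x <- drop(t(chol(C)) %*% rnorm(n))        # trait with phylogenetic signal
y <- 0.3 * x + rnorm(n)                   # related trait with extra noise

V <- pagel_vcv(C, lambda = 0.6)           # one shared lambda, value chosen arbitrarily
p_y_on_x <- gls_slope_p(y, x, V)
p_x_on_y <- gls_slope_p(x, y, V)
all.equal(p_y_on_x, p_x_on_y)             # same covariance matrix, same P-value
```

The slopes of the two fits still differ. Only the test of association is guaranteed to agree, and only because both directions use the same V.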

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 7/21/2013 6:15 PM, Tom Schoenemann wrote:

Hi all,

I'm still unsure of how I should interpret results given that using PGLS
to predict group size from brain size gives different significance
levels and lambda estimates than when I do the reverse (i.e., predict
brain size from group size).  Biologically, I don't think this makes any
sense.  If lambda is an estimate of the phylogenetic signal, what
possible evolutionary and biological sense are we to make if the
estimates of lambda are significantly different depending on which way
the association is assessed? I understand the mathematics may allow
this, but if I can't make sense of this biologically, then doesn't it
call into question the use of this method for these kinds of questions
in the first place?  What am I missing here?

Here are some results from data I have that illustrate this (notice that
the lambda values are significantly different from each other):

Group size predicted by brain size:


model.group.by.brain<-pgls(log(GroupSize) ~ log(AvgBrainWt), data = 
primate_tom, lambda='ML')
summary(model.group.by.brain)


Call:
pgls(formula = log(GroupSize) ~ log(AvgBrainWt), data = primate_tom,
 lambda = "ML")

Residuals:
     Min       1Q   Median       3Q      Max
-0.27196 -0.07638  0.00399  0.10107  0.43852

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 0.759
   lower bound : 0.000, p = 4.6524e-08
   upper bound : 1.000, p = 2.5566e-10
   95.0% CI   : (0.485, 0.904)
delta  [Fix]  : 1.000

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.080099   0.610151 -0.1313 0.895825
log(AvgBrainWt)  0.483366   0.136694  3.5361 0.000622 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1433 on 98 degrees of freedom
Multiple R-squared: 0.1132, Adjusted R-squared: 0.1041
F-statistic:  12.5 on 2 and 98 DF,  p-value: 1.457e-05


Brain size predicted by group size:


model.brain.by.group<-pgls(log(AvgBrainWt) ~ log(GroupSize), data = 
primate_tom, lambda='ML')
summary(model.brain.by.group)


Call:
pgls(formula = log(AvgBrainWt) ~ log(GroupSize), data = primate_tom,
 lambda = "ML")

Residuals:
  Min   1Q   Median   3Q  Max
-0.38359 -0.08216  0.00902  0.05609  0.27443

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 1.000
lower bound : 0.000, p = < 2.22e-16
upper bound : 1.000, p = 1
95.0% CI   : (0.992, NA)
delta  [Fix]  : 1.000

Coefficients:
               Estimate Std. Error t value  Pr(>|t|)
(Intercept)    2.740932   0.446943  6.1326 1.824e-08 ***
log(GroupSize) 0.050780   0.043363  1.1710    0.2444
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.122 on 98 degrees of freedom
Multiple R-squared: 0.0138, Adjusted R-squared: 0.003737
F-statistic: 1.371 on 2 and 98 DF,  p-value: 0.2586


On Jul 14, 2013, at 6:18 AM, Emmanuel Paradis 
wrote:


Hi all,

I would like to react a bit on this issue.

Probably one problem is that the distinction "correlation vs. regression" is 
not the same for independent data and for phylogenetic data.

Consider the case of independent observations first. Suppose we are interested 
in the relationship y = b x + a, where x is an environmental variable, say 
latitude. We can get estimates of b and a by moving to 10 well-chosen 
locations, sampling 10 observations  of y (they are independent) and analyse 
the 100 data points with OLS.

Here we cannot say anything about the correlation between x and y
because we controlled the distribution of x. In practice, even if x is
not controlled, this approach is still valid as long as the observations
are independent.


With phylogenetic data, x is n

Re: [R-sig-phylo] PGLS vs lm

2013-07-21 Thread Tom Schoenemann
Hi all,

I'm still unsure of how I should interpret results given that using PGLS to 
predict group size from brain size gives different significance levels and 
lambda estimates than when I do the reverse (i.e., predict brain size from 
group size).  Biologically, I don't think this makes any sense.  If lambda is 
an estimate of the phylogenetic signal, what possible evolutionary and 
biological sense are we to make if the estimates of lambda are significantly 
different depending on which way the association is assessed? I understand the 
mathematics may allow this, but if I can't make sense of this biologically, 
then doesn't it call into question the use of this method for these kinds of 
questions in the first place?  What am I missing here?

Here are some results from data I have that illustrate this (notice that the 
lambda values are significantly different from each other):

Group size predicted by brain size:

> model.group.by.brain<-pgls(log(GroupSize) ~ log(AvgBrainWt), data = 
> primate_tom, lambda='ML')
> summary(model.group.by.brain)

Call:
pgls(formula = log(GroupSize) ~ log(AvgBrainWt), data = primate_tom, 
lambda = "ML")

Residuals:
 Min   1Q   Median   3Q  Max 
-0.27196 -0.07638  0.00399  0.10107  0.43852 

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 0.759
   lower bound : 0.000, p = 4.6524e-08
   upper bound : 1.000, p = 2.5566e-10
   95.0% CI   : (0.485, 0.904)
delta  [Fix]  : 1.000

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.080099   0.610151 -0.1313 0.895825
log(AvgBrainWt)  0.483366   0.136694  3.5361 0.000622 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1433 on 98 degrees of freedom
Multiple R-squared: 0.1132, Adjusted R-squared: 0.1041 
F-statistic:  12.5 on 2 and 98 DF,  p-value: 1.457e-05 


Brain size predicted by group size:

> model.brain.by.group<-pgls(log(AvgBrainWt) ~ log(GroupSize), data = 
> primate_tom, lambda='ML')
> summary(model.brain.by.group)

Call:
pgls(formula = log(AvgBrainWt) ~ log(GroupSize), data = primate_tom, 
lambda = "ML")

Residuals:
 Min   1Q   Median   3Q  Max 
-0.38359 -0.08216  0.00902  0.05609  0.27443 

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 1.000
   lower bound : 0.000, p = < 2.22e-16
   upper bound : 1.000, p = 1
   95.0% CI   : (0.992, NA)
delta  [Fix]  : 1.000

Coefficients:
               Estimate Std. Error t value  Pr(>|t|)
(Intercept)    2.740932   0.446943  6.1326 1.824e-08 ***
log(GroupSize) 0.050780   0.043363  1.1710    0.2444
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.122 on 98 degrees of freedom
Multiple R-squared: 0.0138, Adjusted R-squared: 0.003737 
F-statistic: 1.371 on 2 and 98 DF,  p-value: 0.2586


On Jul 14, 2013, at 6:18 AM, Emmanuel Paradis  wrote:

> Hi all,
> 
> I would like to react a bit on this issue.
> 
> Probably one problem is that the distinction "correlation vs. regression" is 
> not the same for independent data and for phylogenetic data.
> 
> Consider the case of independent observations first. Suppose we are 
> interested in the relationship y = b x + a, where x is an environmental 
> variable, say latitude. We can get estimates of b and a by moving to 10 
> well-chosen locations, sampling 10 observations of y (they are independent) 
> and analyse the 100 data points with OLS. Here we cannot say anything about 
> the correlation between x and y because we controlled the distribution of x. 
> In practice, even if x is not controlled, this approach is still valid as 
> long as the observations are independent.
> 
> With phylogenetic data, x is not controlled if it is measured "on the 
> species" -- in other words it's an evolving trait (or intrinsic variable). x 
> may be controlled if it is measured "outside the species" (extrinsic 
> variable) such as latitude. So the case of using regression or correlation is 
> not the same as above. Combining intrinsic and extrinsic variables has 
> generated a lot of debate in the literature.
> 
> I don't think it's a problem of using a method and not another, but rather to 
> use a method keeping in mind what it does (and its assumptions). Apparently, 
> Hansen and Bartoszek consider a range of models including regression models 
> where, by contrast to GLS, the evolution of the predictors is modelled 
> explicitly.
> 
> If we want to progress in our knowledge on how evolution works, I think we 
> have to not limit ourselves to assess whether there is a relationship, but to 
> test more complex models. The case presented by Tom is particularly relevant 
> here (at least to me): testing whether group size affects brain size or the 
> opposite (or both) is an important question. There's been also a lot of 
> debate whether comparative data can answer this question. Maybe what we need 
> here is an approach

Re: [R-sig-phylo] PGLS vs lm

2013-07-14 Thread Rafael Maia
hello everyone,

in case it might be useful to anyone, there are a couple attempts at 
phylogenetic path analysis / structural equation modeling in the literature. I 
haven't looked closely at them and I'm not sure whether or where they are 
implemented, but perhaps others on the list might have more information:

Juan C. Santos and David C. Cannatella. 2011. Phenotypic integration emerges 
from aposematism and scale in poison frogs. PNAS 108(15) 6175-6180 
http://www.pnas.org/content/108/15/6175.full

Juan C. Santos. The implementation of phylogenetic structural equation modeling 
for biological data from variance-covariance matrices, phylogenies, and 
comparative analyses. (Thesis) 
http://repositories.lib.utexas.edu/handle/2152/ETD-UT-2009-12-459

Achaz von Hardenberg and Alejandro Gonzalez-Voyer. 2013. Disentangling 
evolutionary cause-effect relationships with phylogenetic confirmatory path 
analysis. Evolution 67(2) 378-387 
http://onlinelibrary.wiley.com/doi/10./j.1558-5646.2012.01790.x/abstract

HTH

Abraços,
Rafael Maia
---
http://www.rafaelmaia.net/
PhD Candidate, Integrated Bioscience
University of Akron
"A little learning is a dangerous thing; drink deep, or taste not the Pierian 
spring." (A. Pope)

On Jul 14, 2013, at 6:25 AM, Theodore Garland Jr  
wrote:

> "Maybe what we need here is an approach based on 
> simultaneous equations (aka structural equation models), but I'm not 
> aware whether this exists in a phylogenetic framework."
> 
> Exactly!  And it will need to incorporate "measurement error" in all 
> variables as well as, eventually, uncertainty in the phylogenetic topology 
> and branch lengths.
> 
> Good luck,
> Ted
> 
> From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] 
> on behalf of Emmanuel Paradis [emmanuel.para...@ird.fr]
> Sent: Sunday, July 14, 2013 3:18 AM
> To: Joe Felsenstein
> Cc: r-sig-phylo@r-project.org
> Subject: Re: [R-sig-phylo] PGLS vs lm
> 
> Hi all,
> 
> I would like to react a bit on this issue.
> 
> Probably one problem is that the distinction "correlation vs.
> regression" is not the same for independent data and for phylogenetic data.
> 
> Consider the case of independent observations first. Suppose we are
> interested in the relationship y = b x + a, where x is an environmental
> variable, say latitude. We can get estimates of b and a by moving to 10
> well-chosen locations, sampling 10 observations of y (they are
> independent) and analyse the 100 data points with OLS. Here we cannot
> say anything about the correlation between x and y because we controlled
> the distribution of x. In practice, even if x is not controlled, this
> approach is still valid as long as the observations are independent.
> 
> With phylogenetic data, x is not controlled if it is measured "on the
> species" -- in other words it's an evolving trait (or intrinsic
> variable). x may be controlled if it is measured "outside the species"
> (extrinsic variable) such as latitude. So the case of using regression
> or correlation is not the same as above. Combining intrinsic and
> extrinsic variables has generated a lot of debate in the literature.
> 
> I don't think it's a problem of using a method and not another, but
> rather to use a method keeping in mind what it does (and its
> assumptions). Apparently, Hansen and Bartoszek consider a range of
> models including regression models where, by contrast to GLS, the
> evolution of the predictors is modelled explicitly.
> 
> If we want to progress in our knowledge on how evolution works, I think
> we have to not limit ourselves to assess whether there is a
> relationship, but to test more complex models. The case presented by Tom
> is particularly relevant here (at least to me): testing whether group
> size affects brain size or the opposite (or both) is an important
> question. There's been also a lot of debate whether comparative data can
> answer this question. Maybe what we need here is an approach based on
> simultaneous equations (aka structural equation models), but I'm not
> aware whether this exists in a phylogenetic framework. The approach by
> Hansen and Bartoszek could be a step in this direction.
> 
> Best,
> 
> Emmanuel
> 
> On 13/07/2013 02:59, Joe Felsenstein wrote:
>> 
>> Tom Schoenemann asked me:
>> 
>>> With respect to your crankiness, is this the paper by Hansen that you are 
>>> referring to?:
>>> 
>>> Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. 
>>> (2012). A phylogenetic comparative method for studying multivariate 
>>> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
>>> 

Re: [R-sig-phylo] PGLS vs lm

2013-07-14 Thread Theodore Garland Jr
"Maybe what we need here is an approach based on 
simultaneous equations (aka structural equation models), but I'm not 
aware whether this exists in a phylogenetic framework."

Exactly!  And it will need to incorporate "measurement error" in all variables 
as well as, eventually, uncertainty in the phylogenetic topology and branch 
lengths.

Good luck,
Ted

From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on 
behalf of Emmanuel Paradis [emmanuel.para...@ird.fr]
Sent: Sunday, July 14, 2013 3:18 AM
To: Joe Felsenstein
Cc: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] PGLS vs lm

Hi all,

I would like to react a bit on this issue.

Probably one problem is that the distinction "correlation vs.
regression" is not the same for independent data and for phylogenetic data.

Consider the case of independent observations first. Suppose we are
interested in the relationship y = b x + a, where x is an environmental
variable, say latitude. We can get estimates of b and a by moving to 10
well-chosen locations, sampling 10 observations of y (they are
independent) and analyse the 100 data points with OLS. Here we cannot
say anything about the correlation between x and y because we controlled
the distribution of x. In practice, even if x is not controlled, this
approach is still valid as long as the observations are independent.

With phylogenetic data, x is not controlled if it is measured "on the
species" -- in other words it's an evolving trait (or intrinsic
variable). x may be controlled if it is measured "outside the species"
(extrinsic variable) such as latitude. So the case of using regression
or correlation is not the same as above. Combining intrinsic and
extrinsic variables has generated a lot of debate in the literature.

I don't think it's a problem of using a method and not another, but
rather to use a method keeping in mind what it does (and its
assumptions). Apparently, Hansen and Bartoszek consider a range of
models including regression models where, by contrast to GLS, the
evolution of the predictors is modelled explicitly.

If we want to progress in our knowledge on how evolution works, I think
we have to not limit ourselves to assess whether there is a
relationship, but to test more complex models. The case presented by Tom
is particularly relevant here (at least to me): testing whether group
size affects brain size or the opposite (or both) is an important
question. There's been also a lot of debate whether comparative data can
answer this question. Maybe what we need here is an approach based on
simultaneous equations (aka structural equation models), but I'm not
aware whether this exists in a phylogenetic framework. The approach by
Hansen and Bartoszek could be a step in this direction.

Best,

Emmanuel

On 13/07/2013 02:59, Joe Felsenstein wrote:
>
> Tom Schoenemann asked me:
>
>> With respect to your crankiness, is this the paper by Hansen that you are 
>> referring to?:
>>
>> Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. 
>> (2012). A phylogenetic comparative method for studying multivariate 
>> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
>>
>> I wrote Bartoszek to see if I could get his R code to try the method 
>> mentioned in there. If I can figure out how to apply it to my data, that 
>> will be great. I agree that it is clearly a mistake to assume one variable 
>> is responding evolutionarily only to the current value of the other 
>> (predictor variables).
>
> I'm glad to hear that *somebody* here thinks it is a mistake (because it 
> really is).  I keep mentioning it here, and Hansen has published extensively 
> on it, but everyone keeps saying "Well, my friend used it, and he got tenure, 
> so it must be OK".
>
> The paper I saw was this one:
>
> Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the evolutionary 
> regression: The interplay between observational and biological errors in 
> phylogenetic comparative studies. Systematic Biology  61 (3): 413-425.  ISSN 
> 1063-5157.
>
> J.F.
> 
> Joe Felsenstein j...@gs.washington.edu
>   Department of Genome Sciences and Department of Biology,
>   University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


Re: [R-sig-phylo] PGLS vs lm

2013-07-14 Thread Emmanuel Paradis

Hi all,

I would like to react a bit on this issue.

Probably one problem is that the distinction "correlation vs. 
regression" is not the same for independent data and for phylogenetic data.


Consider the case of independent observations first. Suppose we are 
interested in the relationship y = b x + a, where x is an environmental 
variable, say latitude. We can get estimates of b and a by moving to 10 
well-chosen locations, sampling 10 observations of y (they are 
independent) and analyse the 100 data points with OLS. Here we cannot 
say anything about the correlation between x and y because we controlled 
the distribution of x. In practice, even if x is not controlled, this 
approach is still valid as long as the observations are independent.


With phylogenetic data, x is not controlled if it is measured "on the 
species" -- in other words it's an evolving trait (or intrinsic 
variable). x may be controlled if it is measured "outside the species" 
(extrinsic variable) such as latitude. So the case of using regression 
or correlation is not the same as above. Combining intrinsic and 
extrinsic variables has generated a lot of debate in the literature.


I don't think it's a problem of using a method and not another, but 
rather to use a method keeping in mind what it does (and its 
assumptions). Apparently, Hansen and Bartoszek consider a range of 
models including regression models where, by contrast to GLS, the 
evolution of the predictors is modelled explicitly.


If we want to progress in our knowledge on how evolution works, I think 
we have to not limit ourselves to assess whether there is a 
relationship, but to test more complex models. The case presented by Tom 
is particularly relevant here (at least to me): testing whether group 
size affects brain size or the opposite (or both) is an important 
question. There's been also a lot of debate whether comparative data can 
answer this question. Maybe what we need here is an approach based on 
simultaneous equations (aka structural equation models), but I'm not 
aware whether this exists in a phylogenetic framework. The approach by 
Hansen and Bartoszek could be a step in this direction.


Best,

Emmanuel

On 13/07/2013 02:59, Joe Felsenstein wrote:
>
> Tom Schoenemann asked me:
>
>> With respect to your crankiness, is this the paper by Hansen that you are 
>> referring to?:
>>
>> Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. 
>> (2012). A phylogenetic comparative method for studying multivariate 
>> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
>>
>> I wrote Bartoszek to see if I could get his R code to try the method 
>> mentioned in there. If I can figure out how to apply it to my data, that 
>> will be great. I agree that it is clearly a mistake to assume one variable 
>> is responding evolutionarily only to the current value of the other 
>> (predictor variables).
>
> I'm glad to hear that *somebody* here thinks it is a mistake (because it 
> really is).  I keep mentioning it here, and Hansen has published extensively 
> on it, but everyone keeps saying "Well, my friend used it, and he got tenure, 
> so it must be OK".
>
> The paper I saw was this one:
>
> Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the evolutionary 
> regression: The interplay between observational and biological errors in 
> phylogenetic comparative studies. Systematic Biology  61 (3): 413-425.  ISSN 
> 1063-5157.
>
> J.F.
>
> Joe Felsenstein j...@gs.washington.edu
>   Department of Genome Sciences and Department of Biology,
>   University of Washington, Box 355065, Seattle, WA 98195-5065 USA



Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Eliot Miller
link to code here: http://www.math.chalmers.se/~krzbar/GLSME/GLSME.R


On Fri, Jul 12, 2013 at 1:59 PM, Joe Felsenstein wrote:

>
> Tom Schoenemann asked me:
>
> > With respect to your crankiness, is this the paper by Hansen that you
> are referring to?:
> >
> > Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F.
> (2012). A phylogenetic comparative method for studying multivariate
> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
> >
> > I wrote Bartoszek to see if I could get his R code to try the method
> mentioned in there. If I can figure out how to apply it to my data, that
> will be great. I agree that it is clearly a mistake to assume one variable
> is responding evolutionarily only to the current value of the other
> (predictor variables).
>
> I'm glad to hear that *somebody* here thinks it is a mistake (because it
> really is).  I keep mentioning it here, and Hansen has published
> extensively on it, but everyone keeps saying "Well, my friend used it, and
> he got tenure, so it must be OK".
>
> The paper I saw was this one:
>
> Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the
> evolutionary regression: The interplay between observational and biological
> errors in phylogenetic comparative studies. Systematic Biology  61 (3):
> 413-425.  ISSN 1063-5157.
>
> J.F.
> 
> Joe Felsenstein j...@gs.washington.edu
>  Department of Genome Sciences and Department of Biology,
>  University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>


Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Joe Felsenstein

Tom Schoenemann asked me:

> With respect to your crankiness, is this the paper by Hansen that you are 
> referring to?:
> 
> Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. 
> (2012). A phylogenetic comparative method for studying multivariate 
> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
> 
> I wrote Bartoszek to see if I could get his R code to try the method 
> mentioned in there. If I can figure out how to apply it to my data, that will 
> be great. I agree that it is clearly a mistake to assume one variable is 
> responding evolutionarily only to the current value of the other (predictor 
> variables). 

I'm glad to hear that *somebody* here thinks it is a mistake (because it really 
is).  I keep mentioning it here, and Hansen has published extensively on it, 
but everyone keeps saying "Well, my friend used it, and he got tenure, so it 
must be OK". 

The paper I saw was this one:

Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the evolutionary 
regression: The interplay between observational and biological errors in 
phylogenetic comparative studies. Systematic Biology  61 (3): 413-425.  ISSN 
1063-5157.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA



Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Tom Schoenemann
Thanks Liam,

OK, I'm starting to understand this better. But I'm not sure what to do now. 
Given that the mathematics are such that a PGLS gives significance in one 
direction, but not in another, what is the most convincing way to show that the 
two variables really ARE associated (at some level of probability) independent 
of phylogeny?

Ultimately I want to investigate the following: Given 2 (or more) behavioral 
measures, what is the probability that they are independently associated with 
brain size in my sample, controlling for phylogeny.

I'd also like to create a prediction model that allows me to estimate what the 
behavioral values would be for a given brain size (of course with confidence 
intervals, so I could assess whether the model is really actually useful at all 
for prediction).
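
One way the prediction question above might be approached is sketched here in base R. It assumes a fixed, known covariance structure (for example Brownian motion on the tree) rather than an estimated lambda. The matrix C is a hypothetical covariance for a balanced 16-tip tree, the variables brain and group are simulated stand-ins, and the value 0.8 is an arbitrary brain size chosen for illustration. The interval is for the expected group size at that brain size, not a prediction interval for one particular species.

```r
# Sketch: GLS fit with a fixed covariance C, then a 95% confidence interval
# for the expected response at a new predictor value.
set.seed(2)
n <- 16

# Hypothetical covariance: shared path length on a balanced 16-tip tree
# with unit branch lengths (an assumption, not data from this thread).
C <- kronecker(diag(2), matrix(1, 8, 8)) +
     kronecker(diag(4), matrix(1, 4, 4)) +
     kronecker(diag(8), matrix(1, 2, 2)) + diag(16)
L <- t(chol(C))                                   # C = L %*% t(L)

# Simulated stand-ins for log brain size and log group size
brain <- as.vector(L %*% rnorm(n))
group <- 1 + 0.5 * brain + as.vector(L %*% rnorm(n, sd = 0.5))

# GLS estimates: beta = (X' C^-1 X)^-1 X' C^-1 y
X     <- cbind(1, brain)
Cinv  <- solve(C)
XtCi  <- t(X) %*% Cinv
beta  <- solve(XtCi %*% X, XtCi %*% group)
res   <- group - X %*% beta
sigma2 <- drop(t(res) %*% Cinv %*% res) / (n - 2)  # residual variance scale
Vbeta  <- sigma2 * solve(XtCi %*% X)               # covariance of beta

# Expected response and 95% CI at a new brain size (0.8 is arbitrary)
x0   <- c(1, 0.8)
fit0 <- drop(x0 %*% beta)
se0  <- sqrt(drop(t(x0) %*% Vbeta %*% x0))
ci   <- fit0 + c(-1, 1) * qt(0.975, df = n - 2) * se0
print(c(fit = fit0, lower = ci[1], upper = ci[2]))
```

A wide interval relative to the spread of the observed values would suggest that the model is of limited use for prediction, which is the check described above.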

Thanks for any suggestions,

-Tom
 
On Jul 11, 2013, at 5:23 PM, Liam J. Revell  wrote:

> Hi Tom.
> 
> This is actually not a property of GLS - but of using different correlation 
> structures when fitting y~x vs. x~y. When you set 
> correlation=corPagel(...,fixed=FALSE) (the default for corPagel), gls will 
> fit Pagel's lambda model to the residual error in y|x. The fitted value of 
> lambda will almost always be different between y|x and x|y. Since the fitted 
> correlation structure of the residual error is used to calculate our standard 
> error for beta, this will affect any hypothesis test about beta.
> 
> By contrast, if we assume a fixed error structure (OLS, as in lm; or 
> correlation=corBrownian(...) - the latter being the same as contrasts 
> regression), we will find that the P values are the same for y~x vs. x~y.
> 
> library(phytools)
> library(nlme)
> tree<-pbtree(n=100)
> x<-fastBM(tree)
> # note I have intentionally simulated y without phylogenetic signal
> y<-setNames(rnorm(n=100),names(x))
> fit.a<-gls(y~x,data.frame(x,y),correlation=corBrownian(1,tree))
> summary(fit.a)
> fit.b<-gls(x~y,data.frame(x,y),correlation=corBrownian(1,tree))
> summary(fit.b)
> # fit.a & fit.b should have the same P-values
> fit.c<-gls(y~x,data.frame(x,y),correlation=corPagel(1,tree))
> summary(fit.c)
> fit.d<-gls(x~y,data.frame(x,y),correlation=corPagel(1,tree))
> summary(fit.d)
> # fit.c & fit.d will most likely have different P-values
> 
> All the best, Liam
> 
> Liam J. Revell, Assistant Professor of Biology
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://blog.phytools.org
> 
> On 7/11/2013 12:03 AM, Tom Schoenemann wrote:
>> Hi all,
>> 
>> I ran a PGLS with two variables, call them VarA and VarB, using a 
>> phylogenetic tree and corPagel. When I try to predict VarA from VarB, I get 
>> a significant coefficient for VarB.  However, if I invert this and try to 
>> predict VarB from VarA, I do NOT get a significant coefficient for VarA. 
>> Shouldn't these be both significant, or both insignificant (the actual 
>> outputs and calls are pasted below)?
>> 
>> If I do a simple lm for these, I get the same significance level for the 
>> coefficients either way (i.e., lm(VarA ~ VarB) vs. lm(VarB ~ VarA)), though 
>> the values of the coefficients of course differ.
>> 
>> Can someone help me understand why the PGLS would not necessarily be 
>> symmetric in this same way?
>> 
>> Thanks,
>> 
>> -Tom
>> 
>>> outTree_group_by_brain_LambdaEst_redo1 <- gls(log_group_size_data ~ 
>>> log_brain_weight_data, correlation = bm.t.100species_lamEst_redo1,data = 
>>> DF.brain.repertoire.group, method= "ML")
>>> summary(outTree_group_by_brain_LambdaEst_redo1)
>> Generalized least squares fit by maximum likelihood
>>   Model: log_group_size_data ~ log_brain_weight_data
>>   Data: DF.brain.repertoire.group
>>        AIC     BIC    logLik
>>   89.45152 99.8722 -40.72576
>> Correlation Structure: corPagel
>>  Formula: ~1
>>  Parameter estimate(s):
>>    lambda
>> 0.7522738
>> Coefficients:
>>                            Value Std.Error   t-value p-value
>> (Intercept)           -0.0077276 0.2628264 -0.029402  0.9766
>> log_brain_weight_data  0.4636859 0.1355499  3.420778  0.0009
>> 
>>  Correlation:
>>                       (Intr)
>> log_brain_weight_data -0.637
>> Standardized residuals:
>>        Min         Q1        Med         Q3        Max
>> -1.7225003 -0.1696079  0.5753531  1.0705308  3.0685637
>> Residual standard error: 0.5250319
>> Degrees of freedom: 100 total; 98 residual
>> 
>> 
>> Here is the inverse:
>> 
>>> outTree_brain_by_group_LambdaEst_redo1 <- gls(log_brain_weight_data ~ 
>>> log_group_size_data, correlation = bm.t.100species_lamEst_redo1,data = 
>>> DF.brain.repertoire.group, method= "ML")
>>> summary(outTree_brain_by_group_LambdaEst_redo1)
>> Generalized least squares fit by maximum likelihood
>>   Model: log_brain_weight_data ~ log_group_size_data
>>   Data: DF.brain.repertoire.group
>> AIC   BIC   logLik
>>   -39.45804 -29.03736 23.72902
>> Correlation Structure: corPagel
>>  Formula: ~1
>>  Parameter estimate(s):
>>   lambd

Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Tom Schoenemann
With respect to your crankiness, is this the paper by Hansen that you are 
referring to?:

Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. (2012). 
A phylogenetic comparative method for studying multivariate adaptation. Journal 
of Theoretical Biology, 314(0), 204-215.

I wrote Bartoszek to see if I could get his R code to try the method mentioned 
in there. If I can figure out how to apply it to my data, that will be great. I 
agree that it is clearly a mistake to assume one variable is responding 
evolutionarily only to the current value of the other (predictor variables). 

Regarding your comments:

> If the "regressions" are being done in a model which implies 
> that the two variables are multivariate normal, then we can 
> simply estimate the parameters of that joint distribution, 
> which are of course the two means and the three elements of the 
> covariance matrix.
> 
> If we then test whether  Cov(X,Y) is different from zero, that 
> should be equivalent to a test of significance of either 
> regression.

I'm not clear on what you are suggesting I do here. Isn't PGLS essentially 
testing Cov(X,Y) taking the phylogeny into account?  And are you saying there 
is a way to show that my variables are significantly associated with each other 
even though PGLS shows different things depending on which way I run the 
associations?  
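
The equivalence described in the quoted passage can be checked numerically. The following is a minimal base-R sketch, assuming the covariance structure is fixed in advance (not a lambda estimated separately for each direction). The matrix C is a hypothetical covariance for a balanced 16-tip tree, and gls_fixed(), gls_mean() and quad() are illustrative helpers, not functions from any package.

```r
# Sketch: with a FIXED covariance C, testing Cov(X,Y) = 0 is the same test
# as the slope test in either GLS regression (y ~ x or x ~ y).
set.seed(1)
n <- 16

# Hypothetical covariance: shared path length on a balanced 16-tip tree
# with unit branch lengths (an assumption, not data from this thread).
C <- kronecker(diag(2), matrix(1, 8, 8)) +
     kronecker(diag(4), matrix(1, 4, 4)) +
     kronecker(diag(8), matrix(1, 2, 2)) + diag(16)
L <- t(chol(C))                                   # C = L %*% t(L)

x <- as.vector(L %*% rnorm(n))
y <- 0.5 * x + as.vector(L %*% rnorm(n))

# GLS with known C, computed as OLS on data premultiplied by solve(L)
gls_fixed <- function(resp, pred, L) {
  u  <- forwardsolve(L, rep(1, length(resp)))     # transformed intercept
  ps <- forwardsolve(L, pred)                     # transformed predictor
  rs <- forwardsolve(L, resp)                     # transformed response
  summary(lm(rs ~ 0 + u + ps))$coefficients["ps", ]
}
t_yx <- gls_fixed(y, x, L)                        # slope test for y ~ x
t_xy <- gls_fixed(x, y, L)                        # slope test for x ~ y

# Joint view: GLS means, then the covariance-weighted correlation of x and y
Cinv     <- solve(C)
gls_mean <- function(z) sum(Cinv %*% z) / sum(Cinv)
quad     <- function(a, b) drop(t(a) %*% Cinv %*% b)
rx <- x - gls_mean(x)
ry <- y - gls_mean(y)
r  <- quad(rx, ry) / sqrt(quad(rx, rx) * quad(ry, ry))
t_cov <- r * sqrt((n - 2) / (1 - r^2))            # t statistic for Cov(X,Y) = 0

print(c(y_on_x   = unname(t_yx["t value"]),
        x_on_y   = unname(t_xy["t value"]),
        from_cov = t_cov))
```

The three t statistics agree, so under a shared fixed structure the direction of the regression does not matter. The asymmetry only appears when the correlation structure is re-estimated for each direction.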

-Tom

On Jul 11, 2013, at 5:46 PM, Joe Felsenstein  wrote:

> 
> If the "regressions" are being done in a model which implies 
> that the two variables are multivariate normal, then we can 
> simply estimate the parameters of that joint distribution, 
> which are of course the two means and the three elements of the 
> covariance matrix.
> 
> If we then test whether  Cov(X,Y) is different from zero, that 
> should be equivalent to a test of significance of either 
> regression.
> 
> /* crankiness on */
> Note of course that most "phylogenetic" regressions are being 
> done wrong: they assume that Y responds to the current value 
> of X, when the value of Y may actually be the result of 
> optimum selection which is affected by past values of X which 
> we do not observe directly.
> 
> I've complained about this here in the past, to no avail. 
> Thomas Hansen, in a recent paper, made the same point, with 
> evidence too.
> /* crankiness off */
> 
> J.F.
> 
> Joe Felsenstein j...@gs.washington.edu
> Department of Genome Sciences and Department of Biology,
> University of Washington, Box 355065, Seattle, WA 98195-5065 USA

_
P. Thomas Schoenemann

Associate Professor
Department of Anthropology
Cognitive Science Program
Indiana University
Bloomington, IN  47405
Phone: 812-855-8800
E-mail: t...@indiana.edu

Open Research Scan Archive (ORSA) Co-Director
Consulting Scholar
Museum of Archaeology and Anthropology
University of Pennsylvania

http://www.indiana.edu/~brainevo

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Tom Schoenemann
OK, I started going through the Ives et al. paper - thanks for that.  Note that 
my data is not brain size vs. body size, but brain size vs. social group size 
(not a measure for which brain size is a subset).

For our particular dataset, I believe we were not able to find much in the way 
of within-species variation for one of the variables - typically one report per 
species, and usually no variation given (but I'm not sure on that - I'll have 
to check). 

Regarding what exactly we want to do:

1) is there a significant association between brain size and two other 
behavioral dimensions (reported in the literature), after taking into account 
(as best we can) phylogeny.  This is why I was trying PGLS. We probably also 
want to look at the relationship within clades (is there a phylogenetically 
appropriate version of ANCOVA?).

2) are these two other behavioral measures independently associated with brain 
size (after controlling for the other) - I'm assuming this would be a 
phylogenetically appropriate version of multiple regression

But my issue is that, if I use PGLS, I get a significant coefficient when I 
run the regression in one direction, but not in the other. This makes me 
skeptical that there is a significant association in the first place.

-Tom



Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Liam J. Revell
Thanks Joe. That's a very clear way of explaining why models that assume 
a fixed & common correlation structure (OLS in lm, contrasts regression, 
or gls with corBrownian) are 'symmetric' (i.e., the same P-value is obtained 
by fitting y~x vs. x~y), whereas models that do not (e.g., gls with corPagel 
and lambda estimated) are not. All the best, Liam


Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org



Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Joe Felsenstein

If the "regressions" are being done in a model which implies 
that the two variables are multivariate normal, then we can 
simply estimate the parameters of that joint distribution, 
which are of course the two means and the three elements of the 
covariance matrix.

If we then test whether  Cov(X,Y) is different from zero, that 
should be equivalent to a test of significance of either 
regression.
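
One way to spell this out, offered as a sketch rather than as the derivation 
intended in this message: assume both traits share a single known 
among-species covariance matrix C (for example Brownian motion on the tree). 
That shared C is an assumption added here, and the symbols \Sigma, 
\beta_{y|x} and \beta_{x|y} are notation introduced only for this note.

```latex
\begin{aligned}
\begin{pmatrix} X \\ Y \end{pmatrix}
  &\sim \mathcal{N}\!\left(
     \begin{pmatrix} \mu_X \mathbf{1} \\ \mu_Y \mathbf{1} \end{pmatrix},\;
     \Sigma \otimes C \right),
\qquad
\Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\
                         \sigma_{XY} & \sigma_Y^2 \end{pmatrix}, \\[4pt]
\beta_{y|x} &= \frac{\sigma_{XY}}{\sigma_X^2},
\qquad
\beta_{x|y} = \frac{\sigma_{XY}}{\sigma_Y^2}.
\end{aligned}
```

The five free parameters are the two means and the three distinct elements 
of \Sigma. Both slopes are zero exactly when \sigma_{XY} = 0, so under this 
model the test of the covariance and the tests of the two regression slopes 
address the same hypothesis.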

/* crankiness on */
Note of course that most "phylogenetic" regressions are being 
done wrong: they assume that Y responds to the current value 
of X, when the value of Y may actually be the result of 
optimum selection which is affected by past values of X which 
we do not observe directly.

I've complained about this here in the past, to no avail. 
Thomas Hansen, in a recent paper, made the same point, with 
evidence too.
/* crankiness off */

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA



Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Liam J. Revell

Hi Tom.

This is actually not a property of GLS - but of using different 
correlation structures when fitting y~x vs. x~y. When you set 
correlation=corPagel(...,fixed=FALSE) (the default for corPagel), gls 
will fit Pagel's lambda model to the residual error in y|x. The fitted 
value of lambda will almost always be different between y|x and x|y. 
Since the fitted correlation structure of the residual error is used to 
calculate our standard error for beta, this will affect any hypothesis 
test about beta.


By contrast, if we assume a fixed error structure (OLS, as in lm; or 
correlation=corBrownian(...) - the latter being the same as contrasts 
regression), we will find that the P values are the same for y~x vs. x~y.


library(phytools)  # also attaches ape, which provides corBrownian() and corPagel()
library(nlme)      # provides gls()

tree <- pbtree(n = 100)
x <- fastBM(tree)
# note I have intentionally simulated y without phylogenetic signal
y <- setNames(rnorm(n = 100), names(x))
dat <- data.frame(x, y)

# fixed (Brownian motion) correlation structure
fit.a <- gls(y ~ x, dat, correlation = corBrownian(1, tree))
summary(fit.a)
fit.b <- gls(x ~ y, dat, correlation = corBrownian(1, tree))
summary(fit.b)
# fit.a & fit.b should have the same P-values

# Pagel's lambda estimated separately from the residuals of each model
fit.c <- gls(y ~ x, dat, correlation = corPagel(1, tree))
summary(fit.c)
fit.d <- gls(x ~ y, dat, correlation = corPagel(1, tree))
summary(fit.d)
# fit.c & fit.d will most likely have different P-values
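
The symmetry under a fixed correlation structure can also be checked without 
any phylogenetics packages. The following is a minimal base-R sketch and is 
not part of the original post: V is a hypothetical stand-in for the tree's 
covariance matrix, and GLS with a known V is carried out by whitening the 
data and then running lm on the transformed variables. All object names are 
illustrative.

```r
set.seed(42)
n <- 60

# Hypothetical fixed covariance matrix standing in for a phylogenetic VCV
# (an AR(1)-style matrix, chosen only because it is positive definite).
V <- 0.7^abs(outer(1:n, 1:n, "-"))

L <- t(chol(V))   # lower-triangular factor, so that V = L %*% t(L)
W <- solve(L)     # whitening matrix: W %*% V %*% t(W) is the identity

x <- drop(L %*% rnorm(n))   # x is correlated among "species" according to V
y <- rnorm(n)               # y is independent noise

# GLS with known V is OLS on the whitened variables. The intercept column
# is whitened too, so it enters the model as an ordinary regressor.
xs  <- drop(W %*% x)
ys  <- drop(W %*% y)
one <- drop(W %*% rep(1, n))

fit_yx <- lm(ys ~ 0 + one + xs)
fit_xy <- lm(xs ~ 0 + one + ys)

p_yx <- summary(fit_yx)$coefficients["xs", "Pr(>|t|)"]
p_xy <- summary(fit_xy)$coefficients["ys", "Pr(>|t|)"]

# With the same fixed V in both directions the slope P-values should agree
print(all.equal(p_yx, p_xy))
```

Because the same V is used in both fits, the slope test reduces to a test of 
the partial correlation between the whitened variables, which does not depend 
on which one is called the response. Once lambda is estimated separately for 
each direction, the two fits no longer share a V and this argument does not 
apply.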

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org


Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Theodore Garland Jr
I think the issue is largely one of conceptualizing the problem.
People often view body size as an "independent variable" when analyzing brain 
size, but obviously this is a serious oversimplification -- usually done for 
statistical convenience -- that does not reflect the biology (yes, I have also 
done this!).  Moreover, brain mass is part of body mass, so if you use body 
mass per se as an independent variable then you have potential part-whole 
correlation statistical issues.

I would think carefully about what you are really wanting to do (e.g., 
regression vs. correlation vs. ANCOVA), and check over this paper:

Ives, A. R., P. E. Midford, and T. Garland, Jr. 2007. Within-species variation 
and measurement error in phylogenetic comparative methods. Systematic Biology 
56:252-270.

And maybe this one:

Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. 
Phylogenetic analysis of covariance by computer simulation. Systematic Biology 
42:265-292.

Cheers,
Ted

Theodore Garland, Jr., Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone:  (951) 827-3524
Wet Lab Phone:  (951) 827-5724
Dry Lab Phone:  (951) 827-4026
Home Phone:  (951) 328-0820
Skype:  theodoregarland
Facsimile:  (951) 827-4286 = Dept. office (not confidential)
Email:  tgarl...@ucr.edu
http://www.biology.ucr.edu/people/faculty/Garland.html
http://scholar.google.com/citations?hl=en&user=iSSbrhwJ

Inquiry-based Middle School Lesson Plan:
"Born to Run: Artificial Selection Lab"
http://www.indiana.edu/~ensiweb/lessons/BornToRun.html



Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Tom Schoenemann
Thanks Emmanuel,

OK, so this makes sense in terms of the math involved. However, from a 
practical, interpretive perspective, shouldn't I assume this to mean that we 
actually cannot say (from this data) whether VarA and VarB ARE actually 
associated with each other? In the real world, if VarA is causally related to 
VarB, then by definition they will be associated. Doesn't this type of 
situation - where the associations are judged to be statistically significant 
in one direction but not in the other - suggest that we actually DON'T have 
confidence that - independent of phylogeny - VarA is associated with VarB?  
Putting this in the context of the actual variables involved, doesn't this mean 
that we actually can't be sure brain size is associated with social group size 
(in this dataset) independent of phylogeny?

I notice that the maximum likelihood lambda estimates are different (though I'm 
not sure they are significantly so). I understand this could mathematically be 
so, but I'm concerned with how to interpret this. In the real world, how could 
phylogenetic relatedness affect group size predicting brain size, more than 
brain size predicting group size? Isn't this a logical problem (for 
interpretation - not for the math)? In other words, in evolutionary history, 
shouldn't phylogeny affect the relationship between two variables in only one 
way, which would show up whichever way we approached the association? Again, I 
understand the math may allow it, I just don't understand how it could actually 
be true over evolutionary time.

Thanks in advance for helping me understand this better,

-Tom



Re: [R-sig-phylo] PGLS vs lm

2013-07-11 Thread Emmanuel Paradis

Hi Tom,

In an OLS regression, the residuals from both regressions (varA ~ varB 
and varB ~ varA) are different but their distributions are (more or 
less) symmetric. So, because the residuals are independent (i.e., their 
covariance is null), the residual standard error will be the same (or 
very close in practice).
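
One way to make the OLS symmetry precise, added here as a note on a standard 
result for simple linear regression rather than something stated in this 
message: the t statistic for the slope depends only on the sample correlation 
r between the two variables and the sample size n. The symbols a_i and b_i 
below denote the observed values of varA and varB and are notation introduced 
for this note.

```latex
t_{A \sim B} \;=\; t_{B \sim A} \;=\; \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}},
\qquad
r = \frac{\sum_i (a_i-\bar a)(b_i-\bar b)}
         {\sqrt{\sum_i (a_i-\bar a)^2 \,\sum_i (b_i-\bar b)^2}} .
```

The slopes themselves differ between the two fits, but the t statistic, and 
hence the P-value, is shared whichever variable is treated as the response.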


In GLS, the residuals are not independent, so this difference in the 
distribution of the residuals affects the estimation of the residual 
standard errors (because we need to estimate the covariance of the 
residuals), and consequently the associated tests.


Best,

Emmanuel



[R-sig-phylo] PGLS vs lm

2013-07-10 Thread Tom Schoenemann
Hi all,

I ran a PGLS with two variables, call them VarA and VarB, using a phylogenetic 
tree and corPagel. When I try to predict VarA from VarB, I get a significant 
coefficient for VarB.  However, if I invert this and try to predict VarB from 
VarA, I do NOT get a significant coefficient for VarA. Shouldn't these be both 
significant, or both insignificant (the actual outputs and calls are pasted 
below)?

If I do a simple lm for these, I get the same significance level for the 
coefficients either way (i.e., lm(VarA ~ VarB) vs. lm(VarB ~ VarA)), though the 
values of the coefficients of course differ.

Can someone help me understand why the PGLS would not necessarily be symmetric 
in this same way?

Thanks,

-Tom

> outTree_group_by_brain_LambdaEst_redo1 <- gls(log_group_size_data ~ 
> log_brain_weight_data, correlation = bm.t.100species_lamEst_redo1,data = 
> DF.brain.repertoire.group, method= "ML")
> summary(outTree_group_by_brain_LambdaEst_redo1)
Generalized least squares fit by maximum likelihood
  Model: log_group_size_data ~ log_brain_weight_data
  Data: DF.brain.repertoire.group
       AIC     BIC    logLik
  89.45152 99.8722 -40.72576

Correlation Structure: corPagel
 Formula: ~1
 Parameter estimate(s):
   lambda
0.7522738

Coefficients:
                           Value Std.Error   t-value p-value
(Intercept)           -0.0077276 0.2628264 -0.029402  0.9766
log_brain_weight_data  0.4636859 0.1355499  3.420778  0.0009

 Correlation:
                      (Intr)
log_brain_weight_data -0.637

Standardized residuals:
       Min         Q1        Med         Q3        Max
-1.7225003 -0.1696079  0.5753531  1.0705308  3.0685637

Residual standard error: 0.5250319
Degrees of freedom: 100 total; 98 residual


Here is the inverse:

> outTree_brain_by_group_LambdaEst_redo1 <- gls(log_brain_weight_data ~ 
> log_group_size_data, correlation = bm.t.100species_lamEst_redo1,data = 
> DF.brain.repertoire.group, method= "ML")
> summary(outTree_brain_by_group_LambdaEst_redo1)
Generalized least squares fit by maximum likelihood
  Model: log_brain_weight_data ~ log_group_size_data
  Data: DF.brain.repertoire.group
        AIC       BIC   logLik
  -39.45804 -29.03736 23.72902

Correlation Structure: corPagel
 Formula: ~1
 Parameter estimate(s):
  lambda
1.010277

Coefficients:
                         Value  Std.Error   t-value p-value
(Intercept)          1.2244133 0.20948634  5.844836  0.
log_group_size_data -0.0234525 0.03723828 -0.629796  0.5303

 Correlation:
                    (Intr)
log_group_size_data -0.095

Standardized residuals:
       Min         Q1        Med         Q3        Max
-2.0682836 -0.3859688  1.1515176  1.5908565  3.1163377

Residual standard error: 0.4830596
Degrees of freedom: 100 total; 98 residual
 
_
P. Thomas Schoenemann

Associate Professor
Department of Anthropology
Cognitive Science Program
Indiana University
Bloomington, IN  47405
Phone: 812-855-8800
E-mail: t...@indiana.edu

Open Research Scan Archive (ORSA) Co-Director
Consulting Scholar
Museum of Archaeology and Anthropology
University of Pennsylvania

http://www.indiana.edu/~brainevo
