Re: [R-sig-phylo] multi-state categorical predictor variables in PGLS

Marguerite Butler Mon, 07 Mar 2011 21:00:00 -0800

Hi Andrew,

>> Does this sidestep the degrees of
>> freedom problem discussed by Garland et al.?  Can anybody point me to
>> references discussing the mechanics of this process and why this is an
>> appropriate thing to do?

Others on this list will disagree with me, but it's not a "degrees of freedom" 
problem. What phylogeny introduces is not a reduction in the number of degrees 
of freedom, but rather covariance among the trait values at the end of the 
evolutionary process. The number of species is the number of species. Period. 
But species are similar to one another to different degrees because of shared 
history. Statisticians call this problem "correlated errors". In this case the 
phylogeny is the "error" :) that is obscuring the "true" relationship between, 
for example, ecology and your trait of interest. It is simple to remove the 
correlated error by accounting for this shared covariance structure, which is 
what the PGLS code does. It is basically like normalizing in univariate 
statistics by dividing by the variance. In this case the variance is a 
covariance matrix calculated from the phylogeny (expected similarity based on 
shared ancestry). 
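
To make the "dividing by the covariance" idea concrete, here is a minimal 
base-R sketch (not what nlme does internally). The 4-species covariance 
matrix, the trait values, and the predictor are all made up for illustration. 
The only point is that the GLS estimate matches ordinary least squares once 
the data have been transformed by the inverse Cholesky factor of the 
covariance matrix.

```r
# Hypothetical tree ((A,B),(C,D)) with total depth 1 and internal branches
# of length 0.6. Under Brownian motion the covariance between two species
# is the branch length they share from the root.
V <- matrix(c(1.0, 0.6, 0.0, 0.0,
              0.6, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.6,
              0.0, 0.0, 0.6, 1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Made-up trait values and a made-up continuous predictor.
y <- c(1.2, 1.1, 0.7, 0.9)
X <- cbind(Intercept = 1, x = c(0.5, 0.4, 0.1, 0.3))

# GLS estimate: (X' V^-1 X)^-1 X' V^-1 y
Vinv <- solve(V)
beta_gls <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)

# Same estimate by "dividing out" the covariance: with V = L L',
# premultiply y and X by L^-1 and run ordinary least squares.
L <- t(chol(V))   # chol() returns the upper factor R with V = R'R
y_w <- solve(L, y)
X_w <- solve(L, X)
beta_ols_whitened <- coef(lm(y_w ~ X_w - 1))

print(cbind(gls = drop(beta_gls), whitened_ols = beta_ols_whitened))
```

The two columns printed at the end should agree to numerical precision. Note 
that the number of rows (species) never changes; only the weighting does.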

The dummy coding is just a way to include a categorical (multistate) variable 
in the ANOVA, and is a standard procedure in parametric statistics. We describe 
how to do effect coding by hand in an old paper (see appendix). Doing it by 
hand helps to understand what is going on:

Butler M.A., Schoener T.W., and Losos J.B. (2000)  The relationship between 
habitat type and sexual size dimorphism in Greater Antillean Anolis lizards.  
Evolution 54(1):259-272.
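
For anyone who wants to see the coding itself, here is a small base-R sketch. 
The factor and its level names are hypothetical, and the response is random 
noise. It builds the design matrix for a four-level factor under treatment 
(dummy) coding and under effect (sum-to-zero) coding, and shows that a 
four-level factor yields an intercept plus n-1 = 3 coded columns either way.

```r
# Hypothetical four-level habitat factor; level names are made up.
hab <- factor(c("G", "G", "H", "H", "L", "L", "O", "O"))

# Treatment (dummy) coding, R's default for unordered factors:
# an intercept for the reference level "G" plus 3 indicator columns.
X_dummy <- model.matrix(~ hab)

# Effect (sum-to-zero) coding: the intercept is the mean of the level
# means and each column contrasts one level against that mean.
X_effect <- model.matrix(~ hab, contrasts.arg = list(hab = "contr.sum"))

print(X_dummy)
print(X_effect)

# The two codings span the same column space, so fitted values agree
# even though the individual coefficients mean different things.
set.seed(1)
y <- rnorm(length(hab))
fit_dummy  <- lm(y ~ X_dummy - 1)
fit_effect <- lm(y ~ X_effect - 1)
print(all.equal(unname(fitted(fit_dummy)), unname(fitted(fit_effect))))
```

Which coding you choose changes how the coefficients are read (differences 
from a reference level versus deviations from the mean of the level means), 
not the overall fit.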

Anyway, yes, this should take care of the phylogenetic analysis (even though I 
wouldn't call it a degrees of freedom problem).
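
On the negative lambda estimate raised in the original post, a rough base-R 
sketch may help. It assumes the usual description of Pagel's lambda as a 
multiplier on the off-diagonal elements of the phylogenetic correlation 
matrix. The 4-species matrix and the helper names lambda_transform and 
min_eig are hypothetical. It only illustrates that a slightly negative 
lambda can still give a valid (positive definite) matrix, while a strongly 
negative one does not.

```r
# Hypothetical Brownian-motion correlation matrix for 4 species.
C <- matrix(c(1.0, 0.6, 0.0, 0.0,
              0.6, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.6,
              0.0, 0.0, 0.6, 1.0), nrow = 4, byrow = TRUE)

# Rescale the off-diagonal elements by lambda; leave the diagonal alone.
lambda_transform <- function(C, lambda) {
  out <- lambda * C
  diag(out) <- diag(C)
  out
}

# A smallest eigenvalue above 0 means the matrix is still a valid
# covariance (correlation) matrix.
min_eig <- function(M) {
  min(eigen(M, symmetric = TRUE, only.values = TRUE)$values)
}

for (lam in c(1, 0, -0.148, -2)) {
  cat(sprintf("lambda = %6.3f  min eigenvalue = %6.3f\n",
              lam, min_eig(lambda_transform(C, lam))))
}
```

In this toy example lambda = 0 gives the identity matrix (no phylogenetic 
covariance), and a small negative value implies that close relatives are 
slightly less similar than expected for independent species.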

Marguerite

On Mar 7, 2011, at 7:06 AM, Alejandro Gonzalez V wrote:

> Hi Andrew,
> 
>> As I understand it, gls() is doing a multiple generalized LS
>> regression with as many dummy variables as there are factor levels.
>> Is this a correct characterization?
> 
> I think you'd get one fewer dummy variable than there are factor levels (at 
> least in terms of the number of levels for which parameters are estimated), 
> as gls "sets" one of the levels as the reference against which all other 
> levels are compared. Thus you'd get n-1 dummy variables for which 
> parameters are estimated.
> 
> With such a low value of lambda, the results of the phylogenetic gls should 
> be similar (if not identical) to results that do not take phylogeny into 
> account, as this suggests you don't have phylogenetic signal in the 
> residuals of your relationship.
> There is a good paper on this issue by Liam Revell in Methods in Ecology and 
> Evolution.
> There is also a function in geiger which allows you to run phylogenetic 
> ANOVAs, but if I am not mistaken, the p-value is estimated based on 
> simulations assuming traits evolve via Brownian motion (is this correct?).
> I've also seen lambda values below 0 in ape. Lambda is usually described as 
> being bounded between 0 and 1, but in practice the estimate can fall outside 
> those bounds. I would be interested in hearing the thoughts of others on the 
> list regarding whether lambda values for the phylogenetic gls should be 
> forced to lie between 0 and 1. This would more closely follow what has 
> been proposed in the literature, wouldn't it?
> 
> Cheers
> 
> Alejandro
> 
> __________________________________
> 
> Alejandro Gonzalez Voyer
> Post-doc
> 
> Estación Biológica de Doñana (CSIC)
> Avenida Américo Vespucio s/n
> 41092 Sevilla 
> Spain
> 
> Tel: +34- 954 466700, ext 1749
> 
> E-mail: alejandro.gonza...@ebd.csic.es
> 
> Web-site: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg
> 
> 
> On 7, Mar 2011, at 5:52 PM, Andrew Barr wrote:
> 
>> Hi everyone,
>> 
>> I am trying to piece together the current best-practices for
>> "phylogenetic ANOVA" with multi-state predictors.
>> 
>> In my dataset, my four-level factor is non-random with respect to
>> phylogeny.  That is, if I know which higher-level clade a species
>> belongs to, I can predict with pretty good success which factor level
>> it will be in.  My understanding is that this situation likely
>> inflates my degrees of freedom and makes traditional F-tests
>> inappropriate. I came across this paper (Garland et al. 1993.
>> Phylogenetic Analysis of Covariance by Computer Simulation. Systematic
>> Biology 42:265-292), where the authors empirically recalculate
>> critical values for F-ratios using computer simulations, tree
>> topology, and a model of character evolution.
>> 
>> I also have found that I can use PGLS (with ape and nlme) and specify
>> my model like this.
>> 
>> gls(myVar ~ myFactor, data = myDF,
>>     correlation = corPagel(value = 1, phy = myTree, fixed = FALSE))
>> 
>> As I understand it, gls() is doing a multiple generalized LS
>> regression with as many dummy variables as there are factor levels.
>> Is this a correct characterization?  Does this sidestep the degrees of
>> freedom problem discussed by Garland et al.?  Can anybody point me to
>> references discussing the mechanics of this process and why this is an
>> appropriate thing to do?
>> 
>> Finally, I get a negative value for estimated lambda.  Any ideas on
>> what that means?
>> 
>> Thanks to everyone for any advice/references.
>> 
>> Andrew Barr
>> PhD Student
>> University of Texas at Austin
>> 
>> ####results from my model
>> Generalized least squares fit by REML
>> Model: LIWI ~ Hab
>> Data: aggast
>>       AIC       BIC   logLik
>> -65.61627 -56.28418 38.80814
>> 
>> Correlation Structure: corPagel
>> Formula: ~1
>> Parameter estimate(s):
>>     lambda
>> -0.1480891
>> 
>> Coefficients:
>>                  Value  Std.Error  t-value p-value
>> (Intercept)  1.4492742 0.01876415 77.23635  0.0000
>> HabH        -0.0224975 0.03149986 -0.71421  0.4798
>> HabL        -0.0668761 0.03066232 -2.18105  0.0360
>> HabO        -0.1630386 0.02567505 -6.35008  0.0000
>> 
>> Correlation:
>>      (Intr) HabH   HabL
>> HabH -0.686
>> HabL -0.794  0.485
>> HabO -0.936  0.594  0.542
>> 
>> Standardized residuals:
>>         Min          Q1         Med          Q3         Max
>> -2.17865325 -0.60297897 -0.09760938  0.41995284  2.91201671
>> 
>> Residual standard error: 0.06913702
>> Degrees of freedom: 39 total; 35 residual
>> 
>> _______________________________________________
>> R-sig-phylo mailing list
>> R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

____________________________________________
Marguerite A. Butler
Associate Professor
Department of Zoology
University of Hawaii
2538 McCarthy Mall, Edmondson 259
Honolulu, HI  96822

FAX:   808-956-9812
Dept: 808-956-8617
http://www2.hawaii.edu/~mbutler
http://www.hawaii.edu/zoology/
