Hi Russell,

And, for a fully PGLS and Bayesian model, see this:
https://pubmed.ncbi.nlm.nih.gov/17344851/
Best,
Chris

On Thu, Jul 1, 2021 at 1:10 AM Theodore Garland <theodore.garl...@ucr.edu> wrote:
>
> Russell,
> Please read this paper:
> https://pubmed.ncbi.nlm.nih.gov/10718731/
> Cheers
> Ted
>
> On Wed, Jun 30, 2021, 9:21 PM Russell Engelman <neovenatori...@gmail.com> wrote:
>
> > Dear All,
> >
> >> What you see is the large uncertainty in “ancestral” states, which is part of the intercept here. The linear relationship that you overlaid on top of your data is the relationship predicted at the root of the tree (as if such a thing existed!). There is a lot of uncertainty about the intercept, but much less uncertainty in the slope. It looks like the slope is not affected by the inclusion or exclusion of monotremes. (for one possible reference on the greater precision in the slope versus the intercept, there’s this: http://dx.doi.org/10.1214/13-AOS1105 for the BM).
> >
> > Yes, that sounds right from the other data I have. The line approximates what would be expected for the root of Mammalia, and the signal in the PGLS is more due to shifts in the y-intercept than shifts in slope, which in turn is supported by the anatomy of the proxy.
> >
> >> My second cent is that the phylogenetic predictions should be stable. The uncertainty in the intercept (and the large effect of including monotremes on the intercept) should not affect predictions, so long as you know for which species you want to make a prediction. If you want to make a prediction for a species in a small clade “far” from monotremes, say, then the prediction is probably quite stable, even if you include monotremes: this is because the phylogenetic prediction should use the phylogenetic relationships for the species to be predicted.
> >> A prediction that uses the linear relationship at the root and ignores the placement of the species would be the worst-case scenario: for a mammal species with a completely unknown placement within mammals.
> >
> > This is what I'm a bit confused about. I was always told (and some of the PGLS literature I have read, such as Rohlf 2011 and Smaers and Rohlf 2016, seemingly implies this) that it isn't possible to include phylogenetic data from the new data points in the prediction in order to improve predictions. I'm a little confused as to whether it's possible or not (see below).
> >
> >> There’s probably a number of software packages that do phylogenetic prediction. I know of Rphylopars and PhyloNetworks.
> >
> > I will take a look into those.
> >
> >> I think that Cécile's and Theodore's point is important and too often overlooked. Using GLS models, the BLUP (Best Linear Unbiased Prediction) is not simply obtained from the fitted line but should incorporate information from the (evolutionary here) model.
> >
> > There’s a way to impute phylogenetic signal back into a PGLS model? I am super surprised at that. I’ve talked to at least three different colleagues who use PGLS about this issue, and all of them told me that there is no way to input phylogenetic signal back into the model for new data points and that I should just go with the single regression line the model gives me (i.e., the regression line for the ancestral node).
> >
> > I tried looking around to see what previous researchers did when using PCM on body mass (Esteban-Trivigno and Köhler 2011, Campione and Evans 2012, Yapuncich 2017 thesis) and it looks like all of them just went with the best-fit line for the ancestral node, i.e., looking at their reported results, they give a simple trait~predictor equation that does not include phylogeny when calculating new data. Campione and Evans 2012 used
> > PIC rather than PGLS, which I know are technically equivalent, but it doesn't seem like they included phylogenetic information when they predicted new data: they used their equations on dinosaurs but there are no dinosaurs in the tree they used. I know that it’s possible to incorporate phylogenetic signal into the new data using PVR, but PVR has been criticized for other reasons.
> >
> > This is something that seems really, really concerning because if there is a method of using phylogenetic covariance to adjust the position of new data points, it seems like a lot of workers don’t know these methods exist, to the point that even published papers overlook it. This was something I was hoping to highlight in a later paper on the data, but it sounds like people might have discussed it already. I remember talking with my colleagues a lot about "isn't there some way to incorporate phylogenetic information back into the model to improve accuracy of the prediction if we know where the taxon is positioned?" and they just thought there wasn't a way.
> >
> >> Regarding the model comparison, I would simply avoid it (or limit it) by fitting models flexible enough to accommodate between your BM and OLS case and summarize the results obtained across all the trees…
> >
> > I am not entirely sure what is meant here. Do you mean fitting both an OLS and a BM model and comparing the two? I am reporting both, but my concern is which model I should report as the best one to use going forward, since the BM model is seemingly less accurate (though I am just taking the fitted values from the PGLS model, which I don't think include phylogenetic information). The two models I use produce dramatically different results; for example, the BM model produces body mass estimates which are 25% larger than OLS.
> >
> > Right now PGLS is something I would avoid if I had the option (if for no other reason than to avoid putting all of the analyses in a single, overloaded manuscript [the manuscript is already about 90 pages] and deviating from the scope of the study), but I'm sure you know that most regression analyses nowadays require some sort of preliminary PCM to be acceptable.
> >
> > Sincerely,
> > Russell
> >
> > On Wed, Jun 30, 2021 at 10:24 AM Julien Clavel <julien.cla...@hotmail.fr> wrote:
> >
> >> I think that Cécile's and Theodore's point is important and too often overlooked. Using GLS models, the BLUP (Best Linear Unbiased Prediction) is not simply obtained from the fitted line but should incorporate information from the (evolutionary here) model.
> >>
> >> For multivariate linear models you can also do it by specifying a tree including both the species used to build the model and the ones you want to predict using the “predict” function in mvMORPH (I think that Rphylopars can deal with multivariate phylogenetic regression too).
> >>
> >> Regarding the model comparison, I would simply avoid it (or limit it) by fitting models flexible enough to accommodate between your BM and OLS case and summarize the results obtained across all the trees…
> >>
> >> Julien
> >>
> >> From: R-sig-phylo <r-sig-phylo-boun...@r-project.org> on behalf of Theodore Garland <theodore.garl...@ucr.edu>
> >> Sent: Wednesday, June 30, 2021 03:26
> >> To: Cecile Ane <cecile....@wisc.edu>
> >> Cc: mailman, r-sig-phylo <r-sig-phylo@r-project.org>; neovenatori...@gmail.com <neovenatori...@gmail.com>
> >> Subject: Re: [R-sig-phylo] Model Selection and PGLS
> >>
> >> All true. I would just add two things. First, always graph your data and do ordinary OLS analyses as a reality check.
> >>
> >> Second, I think this is the original paper for phylogenetic prediction:
> >> Garland, Jr., T., and A. R. Ives. 2000.
> >> Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346–364.
> >> There, we talk about the Equivalency of the Independent-Contrasts and Generalized Least Squares Approaches.
> >>
> >> Cheers,
> >> Ted
> >>
> >> On Tue, Jun 29, 2021 at 5:01 PM Cecile Ane <cecile....@wisc.edu> wrote:
> >>
> >> > Hi Russell,
> >> >
> >> > What you see is the large uncertainty in “ancestral” states, which is part of the intercept here. The linear relationship that you overlaid on top of your data is the relationship predicted at the root of the tree (as if such a thing existed!). There is a lot of uncertainty about the intercept, but much less uncertainty in the slope. It looks like the slope is not affected by the inclusion or exclusion of monotremes. (for one possible reference on the greater precision in the slope versus the intercept, there’s this: http://dx.doi.org/10.1214/13-AOS1105 for the BM).
> >> >
> >> > My second cent is that the phylogenetic predictions should be stable. The uncertainty in the intercept (and the large effect of including monotremes on the intercept) should not affect predictions, so long as you know for which species you want to make a prediction. If you want to make a prediction for a species in a small clade “far” from monotremes, say, then the prediction is probably quite stable, even if you include monotremes: this is because the phylogenetic prediction should use the phylogenetic relationships for the species to be predicted. A prediction that uses the linear relationship at the root and ignores the placement of the species would be the worst-case scenario: for a mammal species with a completely unknown placement within mammals.
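
As a rough illustration of the prediction described above, here is a minimal sketch in plain Python. It assumes a Brownian-motion covariance matrix whose entries are shared branch lengths from the root, and it uses the standard GLS form of the BLUP: the value on the fitted line plus a correction c' C^-1 (y - X b), where c holds the covariances between the new species and the species used to fit the model. The toy tree, the trait values, and the helper names (solve, gls_fit, blup) are all made up for illustration; for real data a package such as Rphylopars or PhyloNetworks would be the sensible route.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def gls_fit(x, y, cov):
    """GLS intercept and slope for y = b0 + b1*x, errors ~ N(0, sigma2 * cov)."""
    ones = [1.0] * len(y)
    ci_1 = solve(cov, ones)  # C^-1 1
    ci_x = solve(cov, x)     # C^-1 x
    # Normal equations (X' C^-1 X) b = X' C^-1 y, using symmetry of C.
    lhs = [[dot(ones, ci_1), dot(ones, ci_x)],
           [dot(x, ci_1), dot(x, ci_x)]]
    rhs = [dot(ci_1, y), dot(ci_x, y)]
    return solve(lhs, rhs)


def blup(x_new, c_new, x, y, cov):
    """Fitted line at x_new plus the phylogenetic correction c' C^-1 resid."""
    b0, b1 = gls_fit(x, y, cov)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    weights = solve(cov, c_new)  # C^-1 c
    return b0 + b1 * x_new + dot(weights, resid)


# Hypothetical tree of height 1: clades (A, B) and (C, D) split at the root,
# and each pair shares 0.8 of its path from the root.
cov = [[1.0, 0.8, 0.0, 0.0],
       [0.8, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.8],
       [0.0, 0.0, 0.8, 1.0]]
x = [1.0, 2.0, 1.0, 2.0]   # made-up predictor values (log scale)
y = [3.0, 4.1, 1.0, 1.9]   # made-up trait values; clade (A, B) sits higher

# Unknown placement: zero covariance with every species, so the prediction
# falls back on the line at the root.
pred_line = blup(1.5, [0.0, 0.0, 0.0, 0.0], x, y, cov)
# New species sister to A, sharing 0.9 of its path with A and 0.8 with B.
pred_clade = blup(1.5, [0.9, 0.8, 0.0, 0.0], x, y, cov)
print(pred_line, pred_clade)
```

With these toy numbers the species placed next to A is pulled toward the residuals of its close relatives, while the species with unknown placement gets the root-line value, which is the worst-case scenario described above.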
> >> >
> >> > There’s probably a number of software packages that do phylogenetic prediction. I know of Rphylopars and PhyloNetworks.
> >> >
> >> > my 2 cents…
> >> > Cecile
> >> >
> >> > ---
> >> > Cécile Ané, Professor (she/her)
> >> > H. I. Romnes Faculty Fellow
> >> > Departments of Statistics and of Botany
> >> > University of Wisconsin - Madison
> >> > http://www.stat.wisc.edu/~ane/
> >> >
> >> > CALS statistical consulting lab:
> >> > https://calslab.cals.wisc.edu/stat-consulting/
> >> >
> >> > On Jun 29, 2021, at 5:37 PM, neovenatori...@gmail.com wrote:
> >> >
> >> > Dear All,
> >> >
> >> > So this is the main problem I'm facing (see attached figure, which should be small enough to post). When I calculate the best-fit line under a Brownian model, this produces a best-fit line that more or less bypasses the distribution of the data altogether. I did some testing and found that this result was driven solely by the presence of Monotremata, resulting in the model heavily downweighting all of the phylogenetic variation within Theria in favor of the deep divergence between Monotremata and Theria. Excluding Monotremata produces a PGLS fit that's comparable enough to the OLS and OU model fit to be justifiable (though I can't just throw out Monotremata for the sake of throwing it out).
> >> >
> >> > I am planning to do a more theoretical investigation into the effect of Monotremata on the PGLS fit in a future study, but right now what I am trying to do is perform a study in which I use this data to construct a regression model that can be used to predict new data. Which is why I am trying to use AIC to potentially justify going with OLS or an OU model over a Brownian model.
> >> > From a practical perspective the Brownian model is almost unusable because it produces systematically biased estimates with high error rates when applied to new data (error rate is roughly double that of both the OLS and OU model). This is especially the case because the data must be back-transformed into an arithmetic scale to be useable, and thus a seemingly minor difference in regression models results in a massive difference in predicted values. However, I need some objective test to show that OLS fits the data better than the Brownian model, hence why I was going with AIC. Overall, OLS does seem to outperform the Brownian model on average, but the variation in AIC is so high it is hard to interpret this.
> >> >
> >> > This is kind of why I am leery of assuming a null Brownian model. A Brownian model, if anything, does not seem to accurately model the relationship between variables.
> >> >
> >> > This is why I am having trouble figuring out how to do model selection. Going just by accuracy statistics like percent error or standard error of the estimate, OLS is better in a purely practical sense (it doesn't work for the monotreme taxa, but it turns out that estimate error in the monotremes is only decreased by 10% in a Brownian model when it overestimates mass by nearly 75%, so the improvement really isn't worth it and using this for monotremes isn't recommended in the first place), but the reviewers are expressing skepticism over the fact that the Brownian model produces less useable results. And I'm not entirely sure of the best way to go about the PGLS if using one of the birth-death trees isn't ideal; perhaps what Dr. Upham says about using the DNA tree might work better.
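
On the AIC comparison between the OLS and Brownian fits discussed above, here is a minimal sketch in plain Python of how the two values relate. It assumes a fixed covariance matrix C for each model (the identity for OLS, shared branch lengths for Brownian motion), a maximum-likelihood estimate of sigma2, and k = 3 estimated parameters (intercept, slope, sigma2). The toy tree, the data, and the function names are made up for illustration and are not taken from any package.

```python
import math


def cholesky(a):
    """Lower-triangular factor low with low @ low' = a (a positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ z = b for lower-triangular low."""
    z = []
    for i, row in enumerate(low):
        s = sum(row[k] * z[k] for k in range(i))
        z.append((b[i] - s) / row[i])
    return z


def gls_aic(x, y, cov, k=3):
    """AIC of y = b0 + b1*x with errors ~ N(0, sigma2 * cov), ML sigma2."""
    n = len(y)
    low = cholesky(cov)
    # Whiten the design and response so that ordinary least squares applies.
    w1 = forward_solve(low, [1.0] * n)
    wx = forward_solve(low, x)
    wy = forward_solve(low, y)
    s11 = sum(p * p for p in w1)
    s1x = sum(p * q for p, q in zip(w1, wx))
    sxx = sum(q * q for q in wx)
    t1 = sum(p * r for p, r in zip(w1, wy))
    tx = sum(q * r for q, r in zip(wx, wy))
    det = s11 * sxx - s1x ** 2
    b0 = (sxx * t1 - s1x * tx) / det
    b1 = (s11 * tx - s1x * t1) / det
    rss = sum((r - b0 * p - b1 * q) ** 2 for p, q, r in zip(w1, wx, wy))
    sigma2 = rss / n  # maximum-likelihood estimate, not the REML one
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    loglik = -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)
    return 2 * k - 2.0 * loglik


# Hypothetical five-species tree of height 1: E splits off at the root,
# (A, B) and (C, D) share a stem of length 0.2, each pair shares 0.8.
bm_cov = [[1.0, 0.8, 0.2, 0.2, 0.0],
          [0.8, 1.0, 0.2, 0.2, 0.0],
          [0.2, 0.2, 1.0, 0.8, 0.0],
          [0.2, 0.2, 0.8, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up predictor values
y = [1.2, 1.9, 3.2, 3.8, 5.1]  # made-up trait values

aic_ols = gls_aic(x, y, identity)
aic_bm = gls_aic(x, y, bm_cov)
print(aic_ols, aic_bm)
```

Because both models have the same number of parameters here, the AIC difference is just twice the difference in log-likelihood; a model with an extra estimated parameter (an OU strength, for instance) would need k = 4 and a covariance matrix rebuilt for each candidate value.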
> >> >
> >> > Ironically, an OU model might be argued to better fit the data, despite the concerns that Dr. Bapst mentioned. Looking at the distribution of signal, even though signal is not random, it is more accurately described as most taxa hewing to a stable equilibrium with rapid, high-magnitude shifts at certain evolutionary nodes, rather than the covariation between the two traits evolving in a Brownian fashion. I did some experiments with a PSR curve and the results seem to favor an OU model or other models with uneven rates of evolution rather than a pure Brownian model.
> >> >
> >> > Of course, the broader issue I am facing is trying to deal with PGLS succinctly; the scope of the study isn't necessarily an in-depth comparison between different regression models, it's more looking at how this variable correlates with body mass for practical purposes (for which considering phylogeny is one part of that). It's definitely something to consider but I am trying to avoid manuscript bloat.
> >> >
> >> > Sincerely,
> >> > Russell
> >> >
> >> > _______________________________________________
> >> > R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >> > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/