The other possible null model would be a "star" phylogeny with no hierarchical structure and equal-length branches, again with Brownian motion along each branch; this is the model that OLS implicitly assumes. But that's generally viewed as outside the range of reasonable possibilities.

Cheers,
Ted
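One way to summarize model support across a tree sample, as discussed in the quoted messages, is to take the per-tree AIC difference between a candidate model and the null model and report how often the difference exceeds the usual threshold of 2. The following is only a minimal sketch in Python, assuming the per-tree AIC values have already been computed elsewhere (for example in R). The function name `summarize_delta_aic` and all numbers in the usage example are hypothetical placeholders, not results from this thread.

```python
from statistics import mean, median


def summarize_delta_aic(aic_null, aic_alt, threshold=2.0):
    """Summarize per-tree AIC differences (alternative minus null).

    aic_null, aic_alt: equal-length sequences with one AIC value per tree.
    threshold: AIC difference treated as a detectable difference in support.
    A negative difference means the alternative model fits better on that tree.
    """
    if len(aic_null) != len(aic_alt):
        raise ValueError("need one AIC value per tree for each model")
    if not aic_null:
        raise ValueError("need at least one tree")

    deltas = [alt - null for null, alt in zip(aic_null, aic_alt)]
    n_trees = len(deltas)
    n_alt_better = sum(d < -threshold for d in deltas)
    n_null_better = sum(d > threshold for d in deltas)

    return {
        "n_trees": n_trees,
        "mean_delta": mean(deltas),
        "median_delta": median(deltas),
        "frac_alt_better": n_alt_better / n_trees,
        "frac_null_better": n_null_better / n_trees,
        # Trees where neither model is ahead by more than the threshold.
        "frac_ambiguous": (n_trees - n_alt_better - n_null_better) / n_trees,
    }


if __name__ == "__main__":
    # Hypothetical placeholder values for four trees; not real results.
    aic_brownian = [100.0, 102.0, 98.0, 101.0]
    aic_ou = [95.0, 101.5, 99.0, 97.0]
    for key, value in summarize_delta_aic(aic_brownian, aic_ou).items():
        print(f"{key}: {value}")
```

For a model that does not depend on the tree, such as OLS, the single AIC value can be repeated once per tree before calling the function. Under the reasoning in the quoted messages, the null model would be kept unless the alternative is ahead on a clear majority of trees.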
On Tue, Jun 29, 2021 at 12:05 PM Nathan Upham <nathan.up...@asu.edu> wrote:

> Hi Russell and all, sounds good.
>
> I’d suggest that the “null model” for fitting trait data to a phylogeny should be single-rate Brownian motion, i.e., given data on the ancestor-to-descendant relationships of the species (and the timing of divergences), and assuming the trait is heritable, you assume it is evolving at the same random rate along each branch. The burden of proof is on rejecting that null hypothesis (not “accepting it”; sorry for earlier writing that incorrectly!). So if you do your AIC fitting across the 100 trees, summarize the results, and find no clear signal of a model being obviously better than single-rate Brownian, then single-rate Brownian is what you should use for subsequent analyses.
>
> If anyone has a different perspective on this, please chime in. The above assumes that you’ve established heritability of the trait, for example by doing a test for phylogenetic ‘signal’.
>
> Does that help? All the best
> --nate
>
> > On Jun 28, 2021, at 1:25 PM, Russell Engelman <neovenatori...@gmail.com> wrote:
> >
> > Dear Dr. Upham (and All),
> >
> > Please don't take my initial message the wrong way; this is not meant to be a dig at your 2019 study. I don’t think this is due to the birth-death tree specifically; it would be present in any study where there are multiple phylogenetic trees to choose from or some measure of uncertainty in the tip dates. I definitely agree with you that there is almost certainly going to be variation in model support values if there is any difference in the underlying phylogeny. However, I was surprised that AIC would vary this much in a dataset where the trait data, number of tips, and branching topology used to compute the model are more or less constant between trees.
> >
> > My question is more along the lines of: given that it is logical to expect AIC to vary based on differences between trees, how would one go about determining which regression model is the "optimal" one to use for further analysis? You mentioned taking the 95% confidence intervals of the models and seeing if they don't overlap. Would this just mean taking the single AIC from the OLS model and comparing it to the PGLS one, since OLS seemingly doesn't produce a confidence interval of AIC values? And if the confidence intervals do overlap, is the OLS or the PGLS considered the null hypothesis? In my case the AIC for OLS is within the 95% confidence interval for the PGLS, but is much lower than the mean value (it's close to one standard deviation below the mean of the AIC values).
> >
> > Sincerely,
> > Russell
> >
> > On Mon, Jun 28, 2021 at 2:46 PM Nathan Upham <nathan.up...@asu.edu> wrote:
> > Hi Russell and all:
> >
> > I’ll respond here since the answer is related to the intended purpose of the VertLife mammal trees, i.e., capturing full uncertainty in node ages and phylogenetic relationships was one of the motivators for building the mammal trees in the way we did. This approach contrasts with wanting to obtain the single “best tree”, since methods of phylogenetic reconstruction will always just be approximations of the “true tree” anyway rather than ever being equal to that tree. To only use a single consensus tree in comparative phylogenetic analyses assumes that we know the true tree, which again, we don’t ever in an empirical context (only for simulations). Those points were summarized well by Huelsenbeck et al.
> > (2000: http://science.sciencemag.org/content/288/5475/2349), but nevertheless are still not standard practice in PCMs.
> >
> > To the point of AIC varying across the 100 trees, this is to be expected. Any 1 tree of 100 trees from the credible set is not very meaningful; the entire 100 trees need to be analyzed and then the estimate +/- SE from each tree can be summarized as a distribution of values. If the 95% CI on the distribution of values excludes your hypothesis, then you’ve learned something; if not, you accept the null hypothesis. See the animated gifs here (http://vertlife.org/data/mammals/) for a better conception of why this phylogenetic uncertainty is important to consider when doing model fitting or other PCMs.
> >
> > That said, if a single ‘best tree’ is the target, then the DNA-only MCC tree of 4098 species is a reasonable thing to analyze, more analogous to how mainstream phylogenetics has presented trees for re-use (https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre). But again, while the MCC tree is appropriate, 1 of 100 trees from the credible set is not.
> >
> > Hope that helps. All the best,
> > --nate
> >
> > ==============================================================================
> > Nathan S. Upham, Ph.D.
> > (he/him)
> > Assistant Research Professor & Associate Curator of Mammals
> > Arizona State University, School of Life Sciences, Biodiversity Knowledge Integration Center (BioKIC <https://biokic.asu.edu/>)
> > ~> Check out the new Mammal Tree of Life <http://vertlife.org/data/mammals/> and the Mammal Diversity Database <https://mammaldiversity.org/>
> >
> > Research Associate, Yale University (Ecology and Evolutionary Biology)
> > Research Associate, Field Museum of Natural History (Negaunee Integrative Research Center)
> > Chair, Biodiversity Committee, American Society of Mammalogists
> > Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group
> >
> > personal web: n8u.org <http://n8u.org> | Google Scholar <https://scholar.google.com/citations?hl=en&user=zIn4NoUAAAAJ&view_op=list_works&gmla=AJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColw> | ASU profile <https://isearch.asu.edu/profile/3682356>
> > e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham <https://twitter.com/n8_upham>
> > =============================================================================
> >
> >> On Jun 28, 2021, at 10:47 AM, Russell Engelman <neovenatori...@gmail.com> wrote:
> >>
> >> Dear R-Sig-Phylo Mailing List,
> >>
> >> I ran into
> >> a rather unusual problem. I was doing an analysis using the mammal trees from Upham et al. (2019) downloaded off of the VertLife site. The model statistics for my data initially suggested that the OLS model was better supported than a PGLS model based on Akaike Information Criterion (AIC). The reviewers for the paper wanted me to add more taxa, so I re-downloaded a set of trees from VertLife and reran the analysis, but when I did I found that suddenly the AIC values for the PGLS equation were dramatically different, to the point that it favored a Brownian PGLS model over all other models. This was despite the fact that previously I found that an OLS model and an OU model had a better model fit than a Brownian model, and the other accuracy statistics of interest (like percent error, this being a model intended for use in predicting new data) also found OLS and OU models to fit better than a Brownian PGLS model. The regression line for a Brownian model doesn't even fit the data at all due to being biased by a basal clade. The model also has a high amount of phylogenetic inertia which again would seemingly make an OU model a better option.
> >>
> >> I used drop.tip to remove the additional taxa to see if I could replicate my previous results, but it turns out I still couldn't replicate the results. That's when I realized what was causing the change in AIC values wasn't the taxon selection, but the tree I was using. If I used the old VertLife tree I could replicate the results, but the new VertLife tree produced radically different results despite using the same tips. So what I decided to do is rerun the analysis for all 100 trees I had available, and it turned out there was a massive amount of variation in AIC depending on what tree was chosen.
> >> I tried including an HTML data printout to show the precise results and how I got them, but I couldn't attach it because the mailer daemon kept saying it was too large. The AIC values between trees vary by almost 200 points after excluding extreme outliers, when model differences of 2 or more are often considered to represent statistically detectable differences. The unusually low AIC I got when I first ran the analysis occurred because the first tree in the 100 trees merely happened to produce an AIC lower than the average of the whole sample. The average AIC out of the 100 trees was higher than for the OLS model, which again makes sense given the distribution of the data.
> >>
> >> However, and this is where my problem comes in, how do I make appropriate model selections for PGLS if there is such a massive amount of variation in AIC? Especially given that between the trees in the sample there is enough variation that it can cause one model to be favored over another? Just picking one tree and going with that seems counterintuitive, because it's not very objective and theoretically someone could pick a specific tree to get the results they want, or accidentally pick a tree that might support the wrong model as seen here. On top of that the tree topologies are more or less identical: the same 404 taxa are present in all trees and the trees have nearly identical topologies; the only real differences between trees are branch lengths. But given this, how can I justify which AIC value I report, which in turn determines which model is best supported?
> >>
> >> I did try looking at the phylo_lm function in the sensiPhy package, but that function doesn't seem to provide any method of performing model selection between different regression models.
> >> It does seemingly report AIC, but the AIC the function reported was dramatically different from the AIC I got using the gls function with ape and nlme.
> >>
> >> Sincerely,
> >> Russell
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/