Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot
My apologies, I just noticed that the cumulative plot was already on the page, I had misinterpreted what that plot meant. Russell On Sat, Oct 28, 2023 at 10:56 PM Russell Engelman wrote: > Dear Liam, > > This is great, almost exactly what I needed. I will have to get our tree > out and see if I can get it to work in R. > > One question, is it possible to do a cumulative tree like this ( > https://proopnarine.files.wordpress.com/2018/11/evolfaunas.png)? This is > a good example of the graph I am trying to produce. The goal here is to > show not only the diversity of each clade (which the current example on > your blog does well) but also show that after a certain point all surviving > lineages belong to these clades, and "basal" uncolored lineages are gone. > > Sincerely, > Russell > > On Sat, Oct 28, 2023 at 9:09 PM Liam J. Revell > wrote: > >> I like Will's suggestion. Unfortunately, *phytools::ltt.simmap* method >> in *phytools* on CRAN does not support ultrametric trees. >> >> I have updated it on GitHub & also posted a workflow to my blog that may >> capture what Russell is going for (if I understand it correctly): >> http://blog.phytools.org/2023/10/visualizing-lineage-accumulation-on.html >> . >> >> All the best, Liam >> Liam J. Revell >> Professor of Biology, University of Massachusetts Boston >> Web: http://faculty.umb.edu/liam.revell/ >> Book: Phylogenetic Comparative Methods in R >> <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r> >> (*Princeton University Press*, 2022) >> >> >> On 10/28/2023 6:56 PM, William Gearty wrote: >> >> CAUTION: EXTERNAL SENDER >> >> The output from Liam's function should be pretty useable for ggplot, too. I >> think the function has a plot argument that you can just set to False. >> >> On Sat, Oct 28, 2023, 6:37 PM Russell Engelman >> >> wrote: >> >> >> It looks like what Liam wrote (specifically, the final figure in the post) >> is pretty close to what I am looking for. >> >> Emmanuel's suggestion of using ltt.plots.coords might also work, and it >> might be easier to input that data into ggplot if I could find some way to >> merge the various matrices. >> >> Russell >> >> On Fri, Oct 27, 2023 at 10:04 AM William Gearty >> >> wrote: >> >> >> You could also map your states (in this case, taxonomic groupings) onto >> your tree, then make an ltt plot with phytools::ltt(). Liam has a great >> blog post about it >> here:https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.phytools.org%2F2022%2F08%2Flineage-through-time-plots-for.html=05%7C01%7Cliam.revell%40umb.edu%7C9cff979c168d4a52a6b208dbd8092b8a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638341306150821485%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=cCbfHtMBYBYoFjct9FzpZ78ZpUkRwrZLPH8gs68KnCM%3D=0 >> <http://blog.phytools.org/2022/08/lineage-through-time-plots-for.html>. >> >> Best, >> Will >> >> -- >> *William Gearty* >> *Lerner-Gray Postdoctoral Research Fellow* >> Division of Paleontology >> American Museum of Natural Historywilliamgearty.com >> >> >> >> On Thu, Oct 26, 2023 at 9:55 PM Emmanuel Paradis >> >> wrote: >> >> >> Hi Russell, >> >> There are several implementations of LTT plots among several packages, >> so the details certainly differ depending on which one(s) you use. >> >> Maybe you can use the ape function ltt.plot.coords() which returns a >> matrix with 2 columns giving the number of lineages for each node time of >> the tree. I guess you'll need to do a bit of programming to combine the >> different outputs from different trees. >> >> Best, >> >> Emmanuel >> >> - Le 27 Oct 23, à 8:05, Russell Engelman >> >> a écrit : >> >> >> Dear R-Sig-Phylo, >> I was wondering if there was any way to color-code a >> >> lineage-through-time plot, >> >> to highlight the proportion of taxa at specific intervals that belong >> >> to a >> >> particular clade. I.e., an LTT plot of tetrapod diversity through >> >> time, and I >> >> want to highlight the number of lineages at any one point in time that >> >> are >> >> chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached >> >> fig. as an >> >> example). I figured I can do this by ask
Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot
Dear Liam, This is great, almost exactly what I needed. I will have to get our tree out and see if I can get it to work in R. One question, is it possible to do a cumulative tree like this ( https://proopnarine.files.wordpress.com/2018/11/evolfaunas.png)? This is a good example of the graph I am trying to produce. The goal here is to show not only the diversity of each clade (which the current example on your blog does well) but also show that after a certain point all surviving lineages belong to these clades, and "basal" uncolored lineages are gone. Sincerely, Russell On Sat, Oct 28, 2023 at 9:09 PM Liam J. Revell wrote: > I like Will's suggestion. Unfortunately, *phytools::ltt.simmap* method in > *phytools* on CRAN does not support ultrametric trees. > > I have updated it on GitHub & also posted a workflow to my blog that may > capture what Russell is going for (if I understand it correctly): > http://blog.phytools.org/2023/10/visualizing-lineage-accumulation-on.html. > > All the best, Liam > Liam J. Revell > Professor of Biology, University of Massachusetts Boston > Web: http://faculty.umb.edu/liam.revell/ > Book: Phylogenetic Comparative Methods in R > <https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r> > (*Princeton University Press*, 2022) > > > On 10/28/2023 6:56 PM, William Gearty wrote: > > CAUTION: EXTERNAL SENDER > > The output from Liam's function should be pretty useable for ggplot, too. I > think the function has a plot argument that you can just set to False. > > On Sat, Oct 28, 2023, 6:37 PM Russell Engelman > > wrote: > > > It looks like what Liam wrote (specifically, the final figure in the post) > is pretty close to what I am looking for. > > Emmanuel's suggestion of using ltt.plots.coords might also work, and it > might be easier to input that data into ggplot if I could find some way to > merge the various matrices. > > Russell > > On Fri, Oct 27, 2023 at 10:04 AM William Gearty > > wrote: > > > You could also map your states (in this case, taxonomic groupings) onto > your tree, then make an ltt plot with phytools::ltt(). Liam has a great > blog post about it > here:https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.phytools.org%2F2022%2F08%2Flineage-through-time-plots-for.html=05%7C01%7Cliam.revell%40umb.edu%7C9cff979c168d4a52a6b208dbd8092b8a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638341306150821485%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=cCbfHtMBYBYoFjct9FzpZ78ZpUkRwrZLPH8gs68KnCM%3D=0 > <http://blog.phytools.org/2022/08/lineage-through-time-plots-for.html>. > > Best, > Will > > -- > *William Gearty* > *Lerner-Gray Postdoctoral Research Fellow* > Division of Paleontology > American Museum of Natural Historywilliamgearty.com > > > > On Thu, Oct 26, 2023 at 9:55 PM Emmanuel Paradis > > wrote: > > > Hi Russell, > > There are several implementations of LTT plots among several packages, > so the details certainly differ depending on which one(s) you use. > > Maybe you can use the ape function ltt.plot.coords() which returns a > matrix with 2 columns giving the number of lineages for each node time of > the tree. I guess you'll need to do a bit of programming to combine the > different outputs from different trees. > > Best, > > Emmanuel > > - Le 27 Oct 23, à 8:05, Russell Engelman > > a écrit : > > > Dear R-Sig-Phylo, > I was wondering if there was any way to color-code a > > lineage-through-time plot, > > to highlight the proportion of taxa at specific intervals that belong > > to a > > particular clade. I.e., an LTT plot of tetrapod diversity through > > time, and I > > want to highlight the number of lineages at any one point in time that > > are > > chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached > > fig. as an > > example). I figured I can do this by asking R to return all lineages > > that are > > descendants of a specific node, but am not sure what functions I can > > use to > > convert the dated tree into an object that can be read into ggplot. > > Sincerely, > Russell > > ___ > R-sig-phylo mailing list - > R-sig-phylo@r-project.orghttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo=05%7C01%7Cliam.revell%40umb.edu%7C9cff979c168d4a52a6b208dbd8092b8a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638341306150821485%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI
Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot
It looks like what Liam wrote (specifically, the final figure in the post) is pretty close to what I am looking for. Emmanuel's suggestion of using ltt.plots.coords might also work, and it might be easier to input that data into ggplot if I could find some way to merge the various matrices. Russell On Fri, Oct 27, 2023 at 10:04 AM William Gearty wrote: > You could also map your states (in this case, taxonomic groupings) onto > your tree, then make an ltt plot with phytools::ltt(). Liam has a great > blog post about it here: > http://blog.phytools.org/2022/08/lineage-through-time-plots-for.html. > > Best, > Will > > -- > *William Gearty* > *Lerner-Gray Postdoctoral Research Fellow* > Division of Paleontology > American Museum of Natural History > williamgearty.com > > > > On Thu, Oct 26, 2023 at 9:55 PM Emmanuel Paradis > wrote: > >> Hi Russell, >> >> There are several implementations of LTT plots among several packages, so >> the details certainly differ depending on which one(s) you use. >> >> Maybe you can use the ape function ltt.plot.coords() which returns a >> matrix with 2 columns giving the number of lineages for each node time of >> the tree. I guess you'll need to do a bit of programming to combine the >> different outputs from different trees. >> >> Best, >> >> Emmanuel >> >> - Le 27 Oct 23, à 8:05, Russell Engelman >> a écrit : >> >> > Dear R-Sig-Phylo, >> > I was wondering if there was any way to color-code a >> lineage-through-time plot, >> > to highlight the proportion of taxa at specific intervals that belong >> to a >> > particular clade. I.e., an LTT plot of tetrapod diversity through time, >> and I >> > want to highlight the number of lineages at any one point in time that >> are >> > chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached >> fig. as an >> > example). I figured I can do this by asking R to return all lineages >> that are >> > descendants of a specific node, but am not sure what functions I can >> use to >> > convert the dated tree into an object that can be read into ggplot. >> >> > Sincerely, >> > Russell >> >> > ___ >> > R-sig-phylo mailing list - R-sig-phylo@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> > Searchable archive at >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >> >> [[alternative HTML version deleted]] >> >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >> > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot
Dear R-Sig-Phylo, I was wondering if there was any way to color-code a lineage-through-time plot, to highlight the proportion of taxa at specific intervals that belong to a particular clade. I.e., an LTT plot of tetrapod diversity through time, and I want to highlight the number of lineages at any one point in time that are chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached fig. as an example). I figured I can do this by asking R to return all lineages that are descendants of a specific node, but am not sure what functions I can use to convert the dated tree into an object that can be read into ggplot. Sincerely, Russell ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Model Selection and PGLS
Dear All, Just sending a message to see if my last email went through. The code for the PGLS still doesn't seem to work: it's giving results contradictory to what would be expected based on the data (e.g., mu should be extremely negative for Monotremata, but it's very variable among the three species and many other species exhibit higher mus in that direction, adding the correction factor actually makes the prediction worse despite high Pagel's lambda). I attached a reprex version of my code with the data that reproduced the result, but I was concerned it didn't go through because when I've tried to attach files to the r-sig-phylo list previously they are often unable to go through. I checked the r-sig-phylo archive to see if it successfully went through, but it isn't listed there, which is one reason I am concerned my message did not get sent properly. Just emailing to ask, sorry if this is a bother. Sincerely, Russell On Fri, Jul 2, 2021 at 8:52 PM Simone Blomberg wrote: > I knew there would be a mistake somewhere. Here is the correction: > > ## C <- 1+(lambda-1)*(1-mat) ## Lambda correction WRONG!! > C <- (lambda*mat)+(1-lambda)*diag(diag(mat)) ## correct, I think.. > > I tested it on a toy example of a tree with 5 species and it works fine.. > > I hope there are no other mistakes! See other comments below. > > Cheers, > > Simone. > On 3/7/21 8:40 am, Russell Engelman wrote: > > Dear All, > > Okay, so I tried the code that Dr. Blomberg provided, and it doesn't seem > to work. I rewrote the code slightly in order for it to try and predict the > body mass of the taxa used to calculate the line, rather than some new > taxon, because I need to do that in order to calculate prediction error and > be able to compare it to previous studies. I got a lambda of about 0.9, > which is approximately the same as what I got previously. So on the surface > it seems like the method should work. > > The problem comes when I try to apply this to the data itself to calculate > mu. This is the modified code I used if that helps, though I don't think I > can send the data or tree to make it replicable because the message > wouldn't go through last time. > > tree.trimmed <- drop.tip(trees[[1]], which(is.na(b$bm))) > b.trimmed <- b[complete.cases(b$bm),] > fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed, > form=~species), >data=b.trimmed) > (lambda <- as.numeric(fit$modelStruct)) > model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b) > preds <- predict(fit,newdata=b) > mat <- vcv.phylo(trees[[1]]) > C <- 1+(lambda-1)*(1-mat) ## Lambda correction > XnoMean <- scale(model.matFULL, scale=FALSE) > mu <- C %*% solve(C) %*% XnoMean ## Replaced CCompletedata and cihmat > with C, since both seem to be subsets of that data. This is the only place > I could think things might go wrong. > > C is the full variance-covariance matrix, made from the tree that includes > both the known and unknown species. CCompletedata is the > variance-covariance matrix WITHOUT the unknown species. This corresponds to > C in Ted and Tony's paper. cihmat is the matrix of relationships of the > unknown species to the known species, one row for each unknown species. So > you cannot replace cihmat with C. > > mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu based > on the data sheet. Mu is reported in phylogenetic order, whereas > predict.gls will always report the data in alphabetical order of the row > names even if a phylogeny is present. > > predict.gls reports the predicted values in the order that is present in > the newdata argument. You shouldn't have to re-order anything in my example > code. YMMV however if you adapt my code. > > exp(preds+mu2) ## "correct" prediction accounting for bias due to > correlations with other species. Exp() is because data were log-transformed. > > So when I check the results of this equation the predicted values are > actually *less *accurate than the fitted values reported from gls without > correction. Normally the accuracy is about +/- 64% of the actual value, but > adding the correction here results in the data being +/- 84% of the actual > value. What's more, I took a look at the values of mu and they don't seem > to make sense when compared on the phylogeny. Namely, if mu were being > calculated right, one would expect monotremes to have a very negative value > compared to all other mammals given the deep divergence in trait states > between monotremes and all other taxa in the figure I previously attached. > But when I check the mu values, I get a value of 0.35 for *Zaglossus*, > -0.58 for *Tachyglossus*, and 0.59 for *Ornithorhynchus*. Which is not > what would be expected if m
Re: [R-sig-phylo] Model Selection and PGLS
dicted.orig-data.check$actual)/data.check$predicted.orig) #%PE with uncorrected PGLS mean(abs(data.check$predicted.corr-data.check$actual)/data.check$predicted.corr) #%PE with corrected PGLS Sincerely, Russell On Fri, Jul 2, 2021 at 8:52 PM Simone Blomberg wrote: > I knew there would be a mistake somewhere. Here is the correction: > > ## C <- 1+(lambda-1)*(1-mat) ## Lambda correction WRONG!! > C <- (lambda*mat)+(1-lambda)*diag(diag(mat)) ## correct, I think.. > > I tested it on a toy example of a tree with 5 species and it works fine.. > > I hope there are no other mistakes! See other comments below. > > Cheers, > > Simone. > On 3/7/21 8:40 am, Russell Engelman wrote: > > Dear All, > > Okay, so I tried the code that Dr. Blomberg provided, and it doesn't seem > to work. I rewrote the code slightly in order for it to try and predict the > body mass of the taxa used to calculate the line, rather than some new > taxon, because I need to do that in order to calculate prediction error and > be able to compare it to previous studies. I got a lambda of about 0.9, > which is approximately the same as what I got previously. So on the surface > it seems like the method should work. > > The problem comes when I try to apply this to the data itself to calculate > mu. This is the modified code I used if that helps, though I don't think I > can send the data or tree to make it replicable because the message > wouldn't go through last time. > > tree.trimmed <- drop.tip(trees[[1]], which(is.na(b$bm))) > b.trimmed <- b[complete.cases(b$bm),] > fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed, > form=~species), >data=b.trimmed) > (lambda <- as.numeric(fit$modelStruct)) > model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b) > preds <- predict(fit,newdata=b) > mat <- vcv.phylo(trees[[1]]) > C <- 1+(lambda-1)*(1-mat) ## Lambda correction > XnoMean <- scale(model.matFULL, scale=FALSE) > mu <- C %*% solve(C) %*% XnoMean ## Replaced CCompletedata and cihmat > with C, since both seem to be subsets of that data. This is the only place > I could think things might go wrong. > > C is the full variance-covariance matrix, made from the tree that includes > both the known and unknown species. CCompletedata is the > variance-covariance matrix WITHOUT the unknown species. This corresponds to > C in Ted and Tony's paper. cihmat is the matrix of relationships of the > unknown species to the known species, one row for each unknown species. So > you cannot replace cihmat with C. > > mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu based > on the data sheet. Mu is reported in phylogenetic order, whereas > predict.gls will always report the data in alphabetical order of the row > names even if a phylogeny is present. > > predict.gls reports the predicted values in the order that is present in > the newdata argument. You shouldn't have to re-order anything in my example > code. YMMV however if you adapt my code. > > exp(preds+mu2) ## "correct" prediction accounting for bias due to > correlations with other species. Exp() is because data were log-transformed. > > So when I check the results of this equation the predicted values are > actually *less *accurate than the fitted values reported from gls without > correction. Normally the accuracy is about +/- 64% of the actual value, but > adding the correction here results in the data being +/- 84% of the actual > value. What's more, I took a look at the values of mu and they don't seem > to make sense when compared on the phylogeny. Namely, if mu were being > calculated right, one would expect monotremes to have a very negative value > compared to all other mammals given the deep divergence in trait states > between monotremes and all other taxa in the figure I previously attached. > But when I check the mu values, I get a value of 0.35 for *Zaglossus*, > -0.58 for *Tachyglossus*, and 0.59 for *Ornithorhynchus*. Which is not > what would be expected if mu were modelling a phylogenetic correction > factor (if nothing else, the range of mu should probably not span zero). So > it suggests to me I'm not doing something right. I know that PGLS doesn't > always result in better fits than OLS, but I'm surprised adding signal back > in made the results worse when the fit was so poor when it was fitted > through the ancestral node. > > Sincerely, > Russell > > -- > Simone Blomberg, BSc (Hons), PhD, MAppStat (she/her) > Senior Lecturer and Consultant Statistician > School of Biological Sciences > The University of Queensland > St. Lucia Queensland 4072 > Australia > T: +61 7 3365 2506 > email: S.Blomberg1_at_uq.edu.au > Twitter:
Re: [R-sig-phylo] Model Selection and PGLS
Dear All, Okay, so I tried the code that Dr. Blomberg provided, and it doesn't seem to work. I rewrote the code slightly in order for it to try and predict the body mass of the taxa used to calculate the line, rather than some new taxon, because I need to do that in order to calculate prediction error and be able to compare it to previous studies. I got a lambda of about 0.9, which is approximately the same as what I got previously. So on the surface it seems like the method should work. The problem comes when I try to apply this to the data itself to calculate mu. This is the modified code I used if that helps, though I don't think I can send the data or tree to make it replicable because the message wouldn't go through last time. tree.trimmed <- drop.tip(trees[[1]], which(is.na(b$bm))) b.trimmed <- b[complete.cases(b$bm),] fit <- gls(log(bm)~I(log(ocw)^(2/3)), corPagel(0.5, phy=tree.trimmed, form=~species), data=b.trimmed) (lambda <- as.numeric(fit$modelStruct)) model.matFULL <- model.matrix(~I(log(ocw)^(2/3)), data=b) preds <- predict(fit,newdata=b) mat <- vcv.phylo(trees[[1]]) C <- 1+(lambda-1)*(1-mat) ## Lambda correction XnoMean <- scale(model.matFULL, scale=FALSE) mu <- C %*% solve(C) %*% XnoMean ## Replaced CCompletedata and cihmat with C, since both seem to be subsets of that data. This is the only place I could think things might go wrong. mu2<-(mu[match(b$species,row.names(mu)),])[,2] ## Have to re-sort mu based on the data sheet. Mu is reported in phylogenetic order, whereas predict.gls will always report the data in alphabetical order of the row names even if a phylogeny is present. exp(preds+mu2) ## "correct" prediction accounting for bias due to correlations with other species. Exp() is because data were log-transformed. So when I check the results of this equation the predicted values are actually *less *accurate than the fitted values reported from gls without correction. Normally the accuracy is about +/- 64% of the actual value, but adding the correction here results in the data being +/- 84% of the actual value. What's more, I took a look at the values of mu and they don't seem to make sense when compared on the phylogeny. Namely, if mu were being calculated right, one would expect monotremes to have a very negative value compared to all other mammals given the deep divergence in trait states between monotremes and all other taxa in the figure I previously attached. But when I check the mu values, I get a value of 0.35 for *Zaglossus*, -0.58 for *Tachyglossus*, and 0.59 for *Ornithorhynchus*. Which is not what would be expected if mu were modelling a phylogenetic correction factor (if nothing else, the range of mu should probably not span zero). So it suggests to me I'm not doing something right. I know that PGLS doesn't always result in better fits than OLS, but I'm surprised adding signal back in made the results worse when the fit was so poor when it was fitted through the ancestral node. Sincerely, Russell [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Model Selection and PGLS
Dear All, Okay, I've read all of the papers that people have linked, and I have an understanding of the theory. However, where I am struggling now is trying to figure out how to actually perform a PGLS that incorporates signal when predicting new data in R. I tested a bunch of gls functions in different R packages, including *pgls* in *caper, phylolm* in *phylolm*, and *gls* in *nlme*. I tried mvgls in *mvMORPH *but I got the error message saying to try *gls *in *nlme* because *mvMORPH *requires multivariate data. In all of these cases it doesn't appear as though the functions incorporate phylogenetic information back into the PGLS model to make predictions. For body mass estimation equations model accuracy is usually evaluated by calculating the percent error between the fitted values for the data and the actual data values, since prediction is the goal and log-transformation can make other parameters like r2 more difficult to use (though I know r2 isn't usually a thing when it comes to PGLS). However, when I calculated percent error, I noticed that the fitted values of the function seem to be predicted through the ancestral node, *not *by adjusting the prediction using phylogenetic information. This appears to be the case regardless of whether fitted values were called directly from the function (e.g., "fit.gls$fitted.values") or by predicting values of the original data using *predict *(i.e., "predict(fit.gls,data)"). The results of the two methods were identical when I used *phylolm *and *gls*, but differed slightly between using predict or fitted values in *caper *(but not enough to conclude that one method did incorporate phylogenetic signal, just enough that it's clear they aren't identical). Furthermore, I also tried manually estimating point values using the line fit through the ancestral node and the values produced were identical to the fitted values reported in these other methods. Whatever I'm doing, it's not incorporating phylogenetic signal to produce estimates. I haven't tried the Bayesian methodology in Organ et al. (2007) yet, mostly because I'm not familiar with BayesTraits and am trying to keep all my analyses in R if possible as I have an RMarkdown document that I am documenting my method as supplementary information. I may have to try BayesTraits if there aren't any other options. I'm also not sure if PIC would be easier than trying to mess with Bayesian models, though I know PIC and PGLS produce more or less identical results. I am surprised that researchers don’t know about these methods: they are > basically just like ancestral state reconstruction, but for new tips in the > tree. Everyone knows about “ancestral state reconstruction”. But if few > people recognize that the same methodology can be applied to new taxa, then > it’s a problem indeed. I had numerous debates with colleagues and other researchers in my department about this exact topic, and it's nice to know the arguments I made of "if you already have a variance- covariance matrix with known taxa, why can't you just adjust estimated based on phylogenetic data if you have a resolved phylogeny" weren't completely nuts. I think a lot of it is there don't seem to be a lot of studies where point estimates are the intended goal (more looking at broader regression models), and at least in body mass the only study I've seen that does include phylogenetic signal back into the data uses PVR. Sincerely, Russell On Thu, Jul 1, 2021 at 10:12 AM Cecile Ane wrote: > > > On Jun 30, 2021, at 11:20 PM, neovenatori...@gmail.com wrote: > > This is something that seems really, really concerning because if there is > a method of using phylogenetic covariance to adjust the position of new > data points it seems like a lot of workers don’t know these methods exist, > to the point that even published papers overlook it. > > > I am surprised that researchers don’t know about these methods: they are > basically just like ancestral state reconstruction, but for new tips in the > tree. Everyone knows about “ancestral state reconstruction”. But if few > people recognize that the same methodology can be applied to new taxa, then > it’s a problem indeed. > > Ancestral state “reconstruction” could be better called ancestral state > “estimation” or “prediction”. In a likelihood framework, we can also give > prediction intervals, for internal (ancestral) nodes in the tree as well > as new tips. Not all software handle new taxa, because it requires > extending the tree, or accepting a tree in which not all taxa have data. > Taxa with missing data can then be predicted, like any other node. > > Cecile. > > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Model Selection and PGLS
Dear All, What you see is the large uncertainty in “ancestral” states, which is part > of the intercept here. The linear relationship that you overlaid on top of > your data is the relationship predicted at the root of the tree (as if such > a thing existed!). There is a lot of uncertainty about the intercept, but > much less uncertainty in the slope. It looks like the slope is not affected > by the inclusion or exclusion of monotremes. (for one possible reference on > the greater precision in the slope versus the intercept, there’s this: > http://dx.doi.org/10.1214/13-AOS1105 for the BM). Yes, that sounds right from the other data I have. The line approximates what would be expected for the root of Mammalia, and the signal in the PGLS is more due to shifts in the y-intercept than shifts in slope, which in turn is supported by the anatomy of the proxy. My second cent is that the phylogenetic predictions should be stable. The > uncertainty in the intercept —and the large effect of including monotremes > on the intercept— should not affect predictions, so long as you know for > which species you want to make a prediction. If you want to make prediction > for a species in a small clade “far” from monotremes, say, then the > prediction is probably quite stable, even if you include monotremes: this > is because the phylogenetic prediction should use the phylogenetic > relationships for the species to be predicted. A prediction that uses the > linear relationship at the root and ignores the placement of the species > would be the worst-case scenario: for a mammal species with a completely > unknown placement within mammals. This is what I'm a bit confused about. I was always told (and it seemingly implies this in some of the PGLS literature I read like Rohlf 2011 and Smaers and Rohlf 2016) that it isn't possible to include phylogenetic data from the new data points into the prediction in order to improve predictions. I'm a little confused as to whether it's possible or not (see below). There’s probably a number of software that do phylogenetic prediction. I > know of Rphylopars and PhyloNetworks. I will take a look into those. I think that Cécile' and Theodore' point is important and too often > overlooked. Using GLS models, the BLUP (Best Linear Unbiased Prediction) is > not simply obtained from the fitted line but should incorporates > information from the (evolutionary here) model. There’s a way to impute phylogenetic signal back into a PGLS model? I am super surprised at that. I’ve talked to at least three different colleagues who use PGLS about this issue, and all of them had told me that there is no way to input phylogenetic signal back into the model for new data points and I should just go with the single regression line the model gives me (i.e., the regression line for the ancestral node). I tried looking around to see what previous researchers used when using PCM on body mass (Esteban-Trivigno and Köhler 2011, Campione and Evans 2012, Yapuncich 2017 thesis) and it looks like all of them just went with the best fit line with the ancestral node, i.e., looking at their reported results they give a simple trait~predictor equation that does not include phylogeny when calculating new data. Campion and Evans 2012 used PIC versus PGLS, which I know are technically equivalent but it doesn't seem like they included phylogenetic information when they predicted new data: they used their equations on dinosaurs but there are no dinosaurs in the tree they used. I know that it’s possible to incorporate phylogenetic signal into the new data using PVR but PVR has been criticized for other reasons. This is something that seems really, really concerning because if there is a method of using phylogenetic covariance to adjust the position of new data points it seems like a lot of workers don’t know these methods exist, to the point that even published papers overlook it. This was something I was hoping to highlight in a later paper on the data, but it sounds like people might have discussed it already. I remember talking with my colleagues a lot about "isn't there some way to incorporate phylogenetic information back into the model to improve accuracy of the prediction if we know where the taxon is positioned?" and they just thought there wasn't a way. Regarding the model comparison, I would simply avoid it (or limit it) by > fitting models flexible enough to accommodate between your BM and OLS case > and summarize the results obtained across all the trees… I am not entirely sure what is meant here. Do you mean fitting both an OLS and BM model and comparing both models? I am reporting both, but my concern is about which model I report is the best one to use going forward, since the BM model is seemingly less accurate (though I am just taking the fitted values from the PGLS model, which I don't think include phylogenetic information). The two models I use produce dramatically different
Re: [R-sig-phylo] Model Selection and PGLS
Dear Dr. Upham (and All), Please don't take my initial message the wrong way, this is not meant to be a dig at your 2019 study. I don’t think this is due to the birth-death tree specifically but would be present in any study where there are multiple phylogenetic trees to choose from or some measure of uncertainty in the tip dates. I definitely agree with you that there is almost certainly going to be variation in model support values if there is any difference in the underlying phylogeny, however, I was surprised that AIC would vary this much in a dataset where the trait data, number of tips, and branching topology used to compute the model are more or less constant between trees. My question is more along the lines of "given that it is logical to expect AIC to vary based on differences between trees, how would one go about determining which regression model is the "optimal" one to use for further analysis"? You mentioned taking the 95% confidence intervals of the models and seeing if they don't overlap, would this be just taking the singular AIC from the OLS model and comparing it to the PGLS one, since OLS seemingly doesn't produce a confidence interval of AIC values? And if the confidence intervals do overlap, is the OLS or PGLS considered the null hypothesis? In my case the AIC for OLS is within the 95% confidence intervals for the PGLS, but is much lower than the mean value (it's close to the lower first standard deviation of the AIC values). Sincerely, Russell On Mon, Jun 28, 2021 at 2:46 PM Nathan Upham wrote: > Hi Russell and all: > > I’ll respond here since the answer is related to the intended purpose of > the VertLife mammal trees — i.e, capturing full uncertainty in node ages > and phylogenetic relationships was one of the motivators for building the > mammal trees in the way we did. This approach contrasts to wanting to > obtain the single “best tree”, since methods of phylogenetic reconstruction > will always just be approximations of the “true tree” anyway rather than > ever being equal to that tree. To only use a single consensus tree in > comparative phylogenetic analyses assumes that we know the true tree, which > again, we don’t ever in an empirical context (only for simulations). Those > points were summarized well by Huelsenbeck et al. (2000: > http://science.sciencemag.org/content/288/5475/2349), but nevertheless > are still not standard practice in PCMs. > > To the point of AIC varying across the 100 trees, this is to be expected. > Any 1 tree of 100 trees from the credible set is not very meaningful; the > entire 100 trees need to be analyzed and then the estimate +/- SE from each > tree can be summarized as a distribution of values. If the 95% CI on the > distribution of values excludes your hypothesis, then you’ve learned > something; if not, you accept the null hypothesis. See the animated gifs > here (http://vertlife.org/data/mammals/) for a better conception of why > this phylogenetic uncertainty is important to consider when doing model > fitting or other PCMs. > > That said, if a single ‘best tree’ is the target, then the DNA-only MCC > tree of 4098 species is a reasonable thing to analyze, more analogous to > how mainstream phylogenetics has presented trees for re-use ( > https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre). > But again, while the MCC tree is appropriate, 1 of 100 trees from the > credible set is not. > > Hope that helps. All the best, > —nate > > > > > == > Nathan S. Upham, Ph.D. (he/him) > Assistant Research Professor & Associate Curator of Mammals > Arizona State University, School of Life Sciences, Biodiversity Knowledge > Integration Center (BioKIC <https://biokic.asu.edu/>) > ~> Check out the new Mammal Tree of Life > <http://vertlife.org/data/mammals/> and the Mammal Diversity Database > <https://mammaldiversity.org/> > > Research Associate, Yale University (Ecology and Evolutionary Biology) > Research Associate, Field Museum of Natural History (Negaunee Integrative > Research Center) > Chair, Biodiversity Committee, American Society of Mammalogists > Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group > > personal web: n8u.org | Google Scholar > <https://scholar.google.com/citations?hl=en=zIn4NoUJ_op=list_works=AJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColw> > | ASU profile <https://isearch.asu.edu/profile/3682356> > e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham > <https://twitter.com/n8_upham> > > =
[R-sig-phylo] Model Selection and PGLS
Dear R-Sig-Phylo Mailing List, I ran into a rather unusual problem. I was doing an analysis using the mammal trees from Upham et al. (2019) downloaded off of the VertLife site. The model statistics for my data initially suggested that the OLS model was better supported than a PGLS model based on Akaike Information Criterion (AIC). The reviewers for the paper wanted me to add more taxa, so I re-downloaded a set of trees from VertLife and reran the analysis, but when I did I found that suddenly the AIC values for the PGLS equation were dramatically different, to the point that it favored a Brownian PGLS model over all other models. This was despite the fact that previously I found that an OLS model and an OU model had a better model fit than a Brownian model, and the other accuracy statistics of interest (like percent error, this being a model intended for use in predicting new data) also found OLS and OU models to fit better than a Brownian PGLS model. The regression line for a Brownian model doesn't even fit the data at all due to being biased by a basal clade. The model also has a high amount of phylogenetic inertia which again would seemingly make an OU model a better option. I used drop.tip to remove the additional taxa to see if I could replicate my previous results, but it turns out I still couldn't replicate the results. That's when I realized what was causing the change in AIC values wasn't the taxon selection, but the tree I was using. If I used the old VertLife tree I could replicate the results, but the new VertLife tree produced radically different results despite using the same tips. So what I decided to do is rerun the analysis for all 100 trees I had available, and it turned out there was a massive amount of variation in AIC depending on what tree was chosen. I tried including an html data printout to show the precise results and how I got them, but I couldn't attach them because the mailer daemon kept saying they were too large. The AIC values between trees vary by almost 200 points after excluding extreme outliers, when model differences of 2 or more are often considered to represent statistically detectable differences. The unusually low AIC I got when I first ran the analysis happened to be because the first tree in the 100 trees merely happened to produce a lower-than-average AIC than the whole sample. The average AIC out of the 100 trees was higher than for the OLS model, which again makes sense given the distribution of the data. However, and this is where my problem comes in, how do I make appropriate model selections for PGLS if there is such a massive amount of variation in AIC? Especially given that between the trees in the sample there is enough variation that it can cause one model to be favored over another? Just picking one tree and going with that seems counterintuitive, because it's not very objective and theoretically someone could pick a specific tree to get the results they want, or accidentally pick a tree that might support the wrong model as seen here. On top of that the tree topologies are more or less identical: the same 404 taxa are present in all trees and the trees have nearly identical topologies, the only real differences between trees are branch lengths. But given this, how can I justify which AIC value I report, which in turn means which model is best supported? I did try looking at the phylo_lm function in the sensiphy package, but that function doesn't seem to provide any method of performing model selection between different regression models. It does seemingly report AIC, but the AIC the function reported was dramatically different from the aic I got using the gls function in ape and nlme. Sincerely, Russell [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object
Dear Everyone, Sorry for the late reply. I had a crunch with several manuscripts that had a looming due date, and am coming back to trying to fix the issue I am having with the phylogeny I emailed about before. I've been trying several things but I've still been unable to resolve the issue with the error message "Error in bind.tree(tree, tip, where = where, position = pp) : 'position' is larger than the branch length" despite the fact that the length of the terminal sister taxon to its nearest node is much greater than the new branch length (i.e., there should be plenty of length to attach a taxon). In many of the cases where I am getting an error I am attaching a sister taxon to a terminal taxon, so it doesn't seem like a lack of monophyly across the trees is the problem. The only thing I could think of is trying to manually adjust the next node up to give it enough room to attach the new terminal taxon, but I am unsure what the easiest way to do that in R would be. I've never tried to mess with branch lengths in a tree before. With regards to what Dr. Miller said, I agree that birth-death trees seem to suggest much younger divergence times than other studies, however I am more struggling with the practical side of things. What I am trying to do with this dataset is include enough taxa to perform a PGLS analysis, and all I need is a phylogenetic tree to use as a backbone. The Upham et al. 2019 trees are about the only easily accessible trees I have been able to find for Mammalia, I know there was an older one that Dr. Upham suggested that was published in 2014 but I don't recall if that one was easily available. Sincerely, Russell On Fri, Jun 4, 2021 at 3:27 PM Eliot Miller wrote: > Russell, > > If you're trying to do this across a lot of species, you're going to bump > into a lot of complexities like this. I got the impression you were trying > to do this for many species. For example, there's no reason that the taxon > you're adding is, even if we assume monophyly of the clade (genus in this > example) in question, nested somewhere in the crown clade. It might be > sister to the clade. So you should theoretically be considering that node > as a possible place to add the taxon. Adding the taxon in a polytomy at the > MRCA and then randomly resolving that should do that (I assume, though do > those operations also consider moving stemwards?). You can also randomly > choose a random valid node that doesn't render the clade polyphyletic, > split on that node and use a birth-death model, or any number of other > rules (midpoint, polytomy, uniform distribution) to decide how to divvy up > branch lengths and keep the tree ultrametric. This is what's happening with > addTaxa. Also, if you hit a situation where you're trying to add a taxon to > a set of taxa that are not actually monophyletic, you'll get weird results > with your current approach. If that's not your situation, no problems > there, but you ought to check (or just use addTaxa which will make sure it > doesn't make the clade any "more" polyphyletic, whatever that means). FWIW, > I'm leery of using birth-death models to add lots of missing species > because I think it biases the resulting tree towards increasing > diversification in the present. At least it does with addTaxa. This might > be a programming error on my part, but I actually suspect it's a systematic > bias introduced by problems described in recent papers by Stilianos Louca > and Matthew Pennell, e.g. > https://www.sciencedirect.com/science/article/abs/pii/S0960982221006138?dgcid=coauthor > Figuring out if incorporating new birth-death estimators (that don't cap > extinction at 0) could resolve some of that bias was on my to-do list for > addTaxa, and is why I've never gotten around to more formally publishing > tests of it. Also, I'm fairly sure Jonathan Chang et al. wrote a relevant R > package here which also automates most of this, and used it for a big fish > phylogeny. That'd be worth a look if you're trying to do this across > hundreds or thousands of species. > > Eliot > > On Fri, Jun 4, 2021 at 2:59 PM Russell Engelman > wrote: > >> Dear Everyone, >> >> So this is really strange. I tried what Dr. Revell suggested, and for the >> two examples I gave it worked for *Cercopithecus albogularis*, but when >> I applied it to *Cricetomys ansorgei *I got the message... >> >> "Error in bind.tree(tree, tip, where = where, position = pp) : >> 'position' is larger than the branch length" >> >> I know that usually this means the new node position is usually further >> back in time than the point where the branch diverges from its nearest >> neighbor. However when I went to check how long the terminal branch leading >> to *C. ansorgei*'s
[R-sig-phylo] Degrees of Freedom in PGLS
Dear R-Sig-Phylo Group, I had a question regarding the calculation of regression statistics using PGLS. When I obtain the degrees of freedom from a PGLS object using ape in R, I notice that it returns the same number of degrees of freedom as under OLS, and the AIC of a bivariate function is calculated under the assumption that k=2 (i.e., that the same number of degrees of freedom are used as in the OLS). Does anyone know why this is? I would have thought that a PGLS would have eaten up a greater number of degrees of freedom than OLS, if for no other reason than it constrains the motion of the system by adding phylogenetic covariation in as an additional variable, and thus would be expected to eat up at least one additional degree of freedom in the same way that adding additional categorical or quantitative variables would? Sincerely, Russell [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object
mb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0 > >> Hope that helps, > >> —nate > >> > >> Nathan S. Upham, Ph.D. (he/him) > >> Assistant Research Professor & Associate Curator of Mammals > >> Arizona State University, School of Life Sciences > >> ~> Check out the new Mammal Tree of Life < > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvertlife.org%2Fdata%2Fmammals%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbfdSFyjoCtDqUWuVoRs%2BC8Toe3v2BuFHgIpBTim9M0%3Dreserved=0> > and the Mammal Diversity Database < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmammaldiversity.org%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sYqcCTEhWozvCAIC1vdxbBf5qeJq%2FBsV54zDRuOKABE%3Dreserved=0 > > > >> Research Associate, Yale University (Ecology and Evolutionary Biology) > >> Research Associate, Field Museum of Natural History (Negaunee > Integrative Research Center) > >> Chair, Biodiversity Committee, American Society of Mammalogists > >> Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group > >> personal web: n8u.org | Google Scholar < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fhl%3Den%26user%3DzIn4NoUJ%26view_op%3Dlist_works%26gmla%3DAJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColwdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sa9VBUNzHzzPUecVRmknFoa36gEQA7hySS7uFuxXh9o%3Dreserved=0> > | ASU profile < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fisearch.asu.edu%2Fprofile%2F3682356data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=D9p3TXN7AdorpSqBqnydAvKNoPXOGZ0wR7ESw8bB4H0%3Dreserved=0 > > > >> e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fn8_uphamdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=OM3Q8KIzJ%2BpAvHy6Z5B74%2FD%2Bsy7LkDGVCSAotBNwjnU%3Dreserved=0 > > > >> > >>> On Jun 2, 2021, at 4:19 PM, Eliot Miller > wrote: > >>> > >>> Hi Russell, > >>> > >>> A package I wrote a while back should be able to do that fairly easily. > >>> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Feliotmiller%2FaddTaxa__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=kgDYJXKFXRFIM8mv1x8fViY6QDiEjTQ8qP5mQ0SUq3k%3Dreserved=0 > The only paper it's described in > >>> remains > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fbsapubs.onlinelibrary.wiley.com%2Fdoi%2Ffull%2F10.3732%2Fajb.1500195__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQp7PnRRHg%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DmHEuwIvxP0uiRhdBPZ%2BuUWUoVpIQ5PW8Xfrjyv9Nyg%3Dreserved=0 > >>> It's a wrapper for bind.tip
Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object
mb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0 > >> Hope that helps, > >> —nate > >> > >> Nathan S. Upham, Ph.D. (he/him) > >> Assistant Research Professor & Associate Curator of Mammals > >> Arizona State University, School of Life Sciences > >> ~> Check out the new Mammal Tree of Life < > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvertlife.org%2Fdata%2Fmammals%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DbfdSFyjoCtDqUWuVoRs%2BC8Toe3v2BuFHgIpBTim9M0%3Dreserved=0> > and the Mammal Diversity Database < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmammaldiversity.org%2Fdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sYqcCTEhWozvCAIC1vdxbBf5qeJq%2FBsV54zDRuOKABE%3Dreserved=0 > > > >> Research Associate, Yale University (Ecology and Evolutionary Biology) > >> Research Associate, Field Museum of Natural History (Negaunee > Integrative Research Center) > >> Chair, Biodiversity Committee, American Society of Mammalogists > >> Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group > >> personal web: n8u.org | Google Scholar < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fhl%3Den%26user%3DzIn4NoUJ%26view_op%3Dlist_works%26gmla%3DAJsN-F6ybkfthmTdjTpow6sgMhWKn1EKcfNtmIF_wzZcev7yeHuEu5_aolFS85rWiVRHpiQgbwg43i6eS6kArrabLdFL4bntzUSRmlRP2CW4lbZqeEcColwdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=sa9VBUNzHzzPUecVRmknFoa36gEQA7hySS7uFuxXh9o%3Dreserved=0> > | ASU profile < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fisearch.asu.edu%2Fprofile%2F3682356data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=D9p3TXN7AdorpSqBqnydAvKNoPXOGZ0wR7ESw8bB4H0%3Dreserved=0 > > > >> e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham < > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fn8_uphamdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=OM3Q8KIzJ%2BpAvHy6Z5B74%2FD%2Bsy7LkDGVCSAotBNwjnU%3Dreserved=0 > > > >> > >>> On Jun 2, 2021, at 4:19 PM, Eliot Miller > wrote: > >>> > >>> Hi Russell, > >>> > >>> A package I wrote a while back should be able to do that fairly easily. > >>> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2Feliotmiller%2FaddTaxa__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=kgDYJXKFXRFIM8mv1x8fViY6QDiEjTQ8qP5mQ0SUq3k%3Dreserved=0 > The only paper it's described in > >>> remains > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fbsapubs.onlinelibrary.wiley.com%2Fdoi%2Ffull%2F10.3732%2Fajb.1500195__%3B!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQp7PnRRHg%24data=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674848184%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=DmHEuwIvxP0uiRhdBPZ%2BuUWUoVpIQ5PW8Xfrjyv9Nyg%3Dreserved=0 > >>> It's a wrapper for bind.tip
[R-sig-phylo] Using bind.tip and lapply on a multiPhylo object
Dear R-sig-phylo, I have been working with a mammalian phylogeny I recently downloaded from VertLife (http://vertlife.org/phylosubsets/). Unfortunately, the phylogeny is missing a large number of species, so I am trying to manually add these taxa to the phylogeny. I have a series of 100 trees that I am using to do things such as test for phylogenetic signal. I know how to use bind.tip to add new taxa to a single tree, but I am having more trouble with a multiPhylo object. I am primarily adding these taxa by placing them as sister to their nearest included relative (since most of them are elevated former subspecies), but the issue here is that in the 100 trees in the multiPhylo object the node representing the taxon to bind these taxa to is not the same across all trees due to shifting topologies. This is an example of the code I have been using, in which "tree" is the tree object. This works for a single 'phylo' tree but not 'multiphylo'. ``` newtree<-lapply(tree,bind.tip,tip.label="Cercopithecus_albogularis", position=0.59,edge.length = 0.59, where=mrca(tree)["Cercopithecus_mitis","Cercopithecus_mitis"]) ``` Now, this code will not work, but I know exactly why: 'tree' is a multiPhylo object and so the 'where' argument cannot find the node for the terminal taxon. However, the issue is how can I tell R to repeat this 'where' argument for each of the 100 trees, since the node in question is not identical across these trees? Is there an easier way to do this than using the 'mrca' call for each terminal taxon? I've noticed adding a 'mrca' argument also increases computation time and if I am reinventing the wheel it would be nice to know if I am overthinking things. Sincerely, Russell [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Changing the color of text and nodes in arc.cladelabels in phytools
Dear Dr. Revell, I tried par(fg="green") and it kind of worked. It changed the curved labels and their associated text to a color, but it only changed the outline of the dots denoting marked nodes. Additionally, if I tried to input a vector of colors equal to my number of groups (e.g., par(fg=c("red","green","blue","cyan","orange")), I get the error ""fg1" is not a graphical parameter". The same results are produced if I try par(col="green"). The command par(col.lab="green") just turns the entire background of the figure green. I also tried part(fill="green"), but got the warning message "In par(fill = "green") : "fill" is not a graphical parameter". The command par(bg="green') does nothing. Sincerely, Russell On Mon, Apr 5, 2021 at 11:11 AM Liam J. Revell wrote: > Dear Russell. > > I'm sorry that arc.cladelabels does not have the functionality that you > want. > > Have you tried setting par(fg)? E.g.: > > par(fg="the color you want") > arc.cladelabels(...) > par(fg="black") > > You could also try par(col) or par(col.lab). > > Let me know if that works, if not I can see about adding the option you > need to the function. > > All the best, Liam > > Liam J. Revell > University of Massachusetts Boston [Assoc. Prof.] > Universidad Católica de la Ssma Concepción [Adj. Res.] > > Web & phytools: > http://faculty.umb.edu/liam.revell/, http://www.phytools.org, > http://blog.phytools.org > > Academic Director UMass Boston Chile Abroad: > https://www.umb.edu/academics/caps/international/biology_chile > > U.S. COVID-19 explorer web application: > https://covid19-explorer.org/ > > On 4/4/2021 10:36 PM, Russell Engelman wrote: > > EXTERNAL SENDER > > > > Dear R-sig-phylo, > > > > I am a researcher who has been trying to plot a continuous trait map > onto a > > phylogeny using the phytools and ape packages in R. I have been trying to > > denote specific clades using the arc.cladelabels function in phytools, > but > > I have also been trying to color-code the text, curve, and node for each > of > > the groups to make it easier to see which branches pertain to which > clade. > > However, I have been having a lot of trouble adjusting the properties of > > the text and nodes created by arc.cladetools. > > > > Here is an example using the *Anolis *svl data that shows what I mean: > > > > ``` > > library(ape) > > library(phytools) > > anole.tree<-read.tree(" > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.phytools.org%2Feqg2015%2Fdata%2Fanole.tredata=04%7C01%7Cliam.revell%40umb.edu%7C017b93bf1ab74239149608d8f84157ed%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637532306822786314%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=NN6WMsldnLYZtuXWpndNA6ORhejdZbJdqDuNeI6PsY8%3Dreserved=0 > ") > > svl <- read.csv(" > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.phytools.org%2Feqg2015%2Fdata%2Fsvl.csvdata=04%7C01%7Cliam.revell%40umb.edu%7C017b93bf1ab74239149608d8f84157ed%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637532306822796308%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=EA2P63ohsozCGtfU9hpg9k319zu168X%2BfmCZlOJ4cX4%3Dreserved=0 > ", > > row.names=1) > > svl <- as.matrix(svl)[,1] > > fit <- fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE) > > obj<-contMap(anole.tree,svl,plot=FALSE) > > nodes<-c(mrca(anole.tree)["equestris","noblei"], > > mrca(anole.tree)["barahonae","barbatus"]) > > labels<-c("clade1","clade2") > > plot(obj,type="fan",cex=5,fsize=0.5,lwd=5) > > for(i in 1:length(nodes)) > >arc.cladelabels(text = labels[i],node = nodes[i],mark.node = TRUE, > > ln.offset = > > 1.39, col = c("green","red")[i], lab.offset = > > 1.46, fsize = .5, orientation = "curved") > > ``` > > > > None of the calls I have tried for the arc.cladelabels function seem to > be > > able to adjust the color of the text labels the function creates (e.g., > > text.col, col.text, col.lab, etc), nor the color or size of the nodes > when > > mark.node = TRUE. The "col" call only affects the line drawn. Examining > the > > base code for arc.cladelabels finds that the color of the nodes is always > > set to be red, and I have been unable to
[R-sig-phylo] Changing the color of text and nodes in arc.cladelabels in phytools
Dear R-sig-phylo, I am a researcher who has been trying to plot a continuous trait map onto a phylogeny using the phytools and ape packages in R. I have been trying to denote specific clades using the arc.cladelabels function in phytools, but I have also been trying to color-code the text, curve, and node for each of the groups to make it easier to see which branches pertain to which clade. However, I have been having a lot of trouble adjusting the properties of the text and nodes created by arc.cladetools. Here is an example using the *Anolis *svl data that shows what I mean: ``` library(ape) library(phytools) anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre;) svl <- read.csv("http://www.phytools.org/eqg2015/data/svl.csv;, row.names=1) svl <- as.matrix(svl)[,1] fit <- fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE) obj<-contMap(anole.tree,svl,plot=FALSE) nodes<-c(mrca(anole.tree)["equestris","noblei"], mrca(anole.tree)["barahonae","barbatus"]) labels<-c("clade1","clade2") plot(obj,type="fan",cex=5,fsize=0.5,lwd=5) for(i in 1:length(nodes)) arc.cladelabels(text = labels[i],node = nodes[i],mark.node = TRUE, ln.offset = 1.39, col = c("green","red")[i], lab.offset = 1.46, fsize = .5, orientation = "curved") ``` None of the calls I have tried for the arc.cladelabels function seem to be able to adjust the color of the text labels the function creates (e.g., text.col, col.text, col.lab, etc), nor the color or size of the nodes when mark.node = TRUE. The "col" call only affects the line drawn. Examining the base code for arc.cladelabels finds that the color of the nodes is always set to be red, and I have been unable to figure out how to alter it. I could always write a custom function that copies arc.cladelabels but alters the color function, but my concern is that this is part of an RMarkdown document and as a result any alteration I make to the arc.cladelabels function will not show up when another person knits the data (as it will call on their copy of phytools). Alternatively I am wondering if it is possible to export the tree to ggtree and add the nodes there, but I have noticed that ggtree does not allow for arced clade labels and for some reason it also greatly extends the margins of the plot beyond the margins of the graph when its equivalent of arc.cladelabels (geom_cladelabel) is called. I was wondering if anyone else here knew any potential ways I could adjust the color and size of the text and nodes for this function. Sincerely, Russell [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/