Re: [R-sig-phylo] collapsing sets of nodes based on label values
You might need to reindex each time, something like (pseudocode): bad.nodes <- function_to_get_node_numbers_with_support_less_than_90(phy) while(length(bad.nodes>0)) { phy <- collapse.to.star(phy, bad.nodes[1]) bad.nodes <- function_to_get_node_numbers_with_support_less_than_90(phy) } But this will collapse the bad node and all its descendants. I don't see a function that just collapses at a node and leaves descendants intact, but this may be what you're looking for (the stuff that ape::collapse.singles does is similar, and could perhaps be modified). One pedantic point -- we often talk about node support, collapsing nodes, etc., but it's really about the branch: the node doesn't have 83% bootstrap support but the bipartition created by the edge subtending it does. For example, if the tree is (A,(B,(C,(D,E, if the support for the CDE clade is 83%, it's actually the proportion of trees that have a split between AB | CDE, so the branch separating the CDE clade from everything else. The node itself there has one branch coming in (that comes from the rest of the tree containing A and B) and two branches coming out (one going to C, one going to DE). We could summarize trees by actual node support: how many times we have a node that has an edge going to AB, an edge going to C, and an edge going to DE, but we typically don't. [For example, the branch could separate AB | CDE, and the node is AB | C | DE, but one could also have a node that is AB | CD | E for the same bipartition on the subtending edge]. I only bring it up here because if you end up having to write code to do collapsing, it can be important to remember that what is actually being collapsed is the edge subtending the labeled node because it is the the edge with low support. Best, Brian ___ Brian O'Meara Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville He/Him/His On Fri, May 22, 2020 at 9:42 AM Karla Shikev wrote: > Dear all, > > I'm surprised I could not find a simple function to do this in any of the > packages I know. I just want to take a tree with node labels (e.g. > bootstrap values from a RAxML analysis) and collapse all nodes with support > values below, say, 90%. I tried to do this recursively using > collapse.to.star in phytools, but node numbers change after the first > collapsed node and my indexing gets completely off. > > with best regards, > > Karla > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Alllowing and considering "missing data" as values for reconstruction of ancestral character states
We corresponded with Carolina offline, but here I'm including a generalized solution of the quick solution we just sent her in case it's useful for others (avoiding the issue lampooned in https://xkcd.com/979/). The data was a delimited file with a species column and columns for each character, with column labels "species" and then the character numbers. So, loaded into R after calling the relevant libraries: library(ape) library(corHMM) library(geiger) phy <- read.nexus("cladeb_tree.nex") data <- read.delim("cladeb_sexch.txt", stringsAsFactors = FALSE) corHMM can handle polymorphic data, but it needs to be in format 0&3, not 0+3, which was how it was coded initially. So change that: for (i in 2:ncol(data)) { data[,i] <- gsub('\\+', '&', data[,i]) } Then reconstruct ancestral states and show the output in a pdf. Loop over each character, deleting taxa from the matrix that have an NA and deleting taxa from the matrix and from the tree that aren't on both: trait_indices <- c(2:ncol(data)) pdf(file="CarolinaPlots.pdf") for (i in seq_along(trait_indices)) { col_num <- trait_indices[i] data_local <- data[,c(1,col_num)] data_local <- data_local[!is.na(data_local[,2]),] data_local <- data_local[data_local$species%in%phy$tip.label,] bad_tips <- phy$tip.label[!phy$tip.label%in%data_local$species] phy_local <- phy if(length(bad_tips)>0) { phy_local <- ape::drop.tip(phy, bad_tips) } recon <- rayDISC(phy_local, data_local, model="ARD") plotRECON(recon$phy,recon$states,title=paste0(colnames(data)[col_num], ": ", paste(names(table(data_local[,2])), "=", table(data_local[,2]), collapse=", "))) } dev.off() Hopefully a few general tips there. Slightly better style would be to preface most functions with the package name: use corHMM::rayDISC() rather than rayDISC -- this is becoming more expected in packages. Best, Brian ___ Brian O'Meara Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville He/Him/His On Mon, Apr 13, 2020 at 4:18 PM Carolina Santos Vieira < carolsantosvie...@gmail.com> wrote: > Hi Liam and everyone > > The numbers from this message "1000 trees with a mapped discrete character > with states: 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, > 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55" are the same numbers from > the nodes on my tree. As I replied to Brian, I assumed that this was a list > of which states were mapped during make.simmap simulation. This warning > message only returns when applying the ARD model. > This character in particular is binary, and has no missing data or > non-applicable coding. > > As I mentioned previously, a big part of my dataset contains polymorphic > and non-applicable coding, and characters with 3 states (0,1,2). For those, > when the script runs, the likelihoods are very high, and plots don't agree > with coding. > > Thank you so much for your replies. > Best, > Carolina Vieira > Bióloga - Mestre em Ecologia e Conservação > Doutoranda em Biologia Animal (PPGBAN - UFRGS) > > > > <https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=webmail> > Livre > de vírus. www.avast.com > <https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=webmail>. > > <#m_2399085183096779712_m_1993609250326834374_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > Em seg., 13 de abr. de 2020 às 16:57, Liam J. Revell > escreveu: > >> Hi Carolina. >> >> I agree with Brian that a Q matrix in which the elements of a row are >> all zeros is not likely to be you problem. It just means that the ML >> estimated transition rate from 1->0, in your case, is zero. >> >> I agree with Brian that the message "1000 trees with a mapped discrete >> character with states: 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, >> 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55" is >> surprising. Does your discrete character really have 27 states? >> >> All the best, Liam >> >> Liam J. Revell >> Associate Professor, University of Massachusetts Boston >> Profesor Asistente, Universidad Católica de la Ssma Concepción >> web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org >> >> Academic Director UMass Boston Chile Abroad: >> https://www.umb.edu/academics/caps/international/biology_chile >> >> On 4/13/2020 3:23 PM, Brian O'Meara wrote: >> > **[EXTERNAL SENDER] >> > >> > With raydisc, you need one column with th
Re: [R-sig-phylo] Alllowing and considering "missing data" as values for reconstruction of ancestral character states
With raydisc, you need one column with the species name and a column with the data; so something like data[,c(1,2)] might be needed. If you want, you could email the files (tree, data, script to run) to Jeremy ( jmbea...@uark.edu) and me to check out. For the simmap results, a rate of zero can be the estimate if it seems that there are no transitions from state 0 to 1. But the number of discrete states at the end ("1000 trees with a mapped discrete character with states: 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55") seems unexpected to me. Best, Brian _______ Brian O'Meara Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville He/Him/His On Mon, Apr 13, 2020 at 2:32 PM Carolina Santos Vieira < carolsantosvie...@gmail.com> wrote: > Hi Liam and all users > > My morphological data varies from binary and multisate characters, > including polymorphism and non-applicable coding for many of those > characters. I've made a lot of progress after Liam's advices, but lately I > have been getting an error from make.simmap. > > By trying to run a simulation with make.simmap under ARD model on > characters with states that the probability is zero on occuring in a given > terminal, after calling summary(), this error is displayed: > > Error in colMeans(x$count) : 'x' must be an array of at least two > dimensions > > When changing the input of the same data in a two column array, the error > still persists. The same data was analyzed with make.simmap under "ER" and > "SYM" models, and they run just fine. I am also comparing the results using > fitDiscrete (geiger) and ace (ape) functions and they display results for > "ARD" model without a glitch. According to output from fitDiscrete > function, ARD model is indicated as the "best" model comparing AIC > likelihoods. > I've noticed that in this particular character, the "Q" matrix extracted > from make.simmap simmulation under "ARD" is the following: > > Q = > 01 > 0 -1.916782 1.916782 > 1 0.00 0.00 > > Are those "zero" probabilities the cause of the error? And by that I mean, > should I take from this evidence that "ARD" is not an appropriate model to > this character? > > Also, results from characters with "non-applicable" coding (NA in matrix) > for some terminal are coming out very different from what I would expect. > Is this coding allowed for ancestral reconstruction? > I am sorry for the many questions, and I would be very glad if any > suggestions come to mind. > > Best wishes, > Carolina Vieira. > > - > This is the error rundown: > > > anc.tree_ard<-make.simmap(tree,ch434,model="ARD") > make.simmap is sampling character histories conditioned on the transition > matrix > > Q = > 01 > 0 -1.916782 1.916782 > 1 0.00 0.00 > (estimated using likelihood); > and (mean) root node prior probabilities > pi = > 0 1 > 0.5 0.5 > Done. > > plot(anc.tree_ard,node.numbers=T,cores,fsize=0.5,ftype="i") > > add.simmap.legend(colors=cores,prompt=FALSE,x=0.5*par()$usr[1], > + y=-30*par()$usr[3],fsize=0.8) > > title(main="ACSR \"ARD\" model (make.simmap)") > > anc.tree_ard<-make.simmap(tree,ch434,model="ARD",nsim=1000) > make.simmap is sampling character histories conditioned on the transition > matrix > > Q = > 01 > 0 -1.916782 1.916782 > 1 0.00 0.00 > (estimated using likelihood); > and (mean) root node prior probabilities > pi = > 0 1 > 0.5 0.5 > Done. > > anc.tree_ard > 1000 phylogenetic trees with mapped discrete characters > > simmap_ard<-summary(anc.tree_ard) > > simmap_ard > 1000 trees with a mapped discrete character with states: > 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, > 47, 48, 49, 50, 51, 52, 53, 54, 55 > > Error in colMeans(x$count) : 'x' must be an array of at least two > dimensions > > > > Carolina Vieira > Bióloga - Mestre em Ecologia e Conservação > Doutoranda em Biologia Animal (PPGBAN - UFRGS) > > > < > https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=webmail > > > Livre > de vírus. www.avast.com > < > https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=webmail > >. > <#D
Re: [R-sig-phylo] ancestral state transitions clustered in time?
You can use something like corHMM (https://doi.org/10.1093/sysbio/syt034) and perhaps its rayDISC function to find parts of the tree where the transition rates between states are different (for your question, where rates might be higher). Basically, it allows a hidden "trait" (which isn't necessarily a single trait, just some thing or things that change on the tree) to affect the transition rates. A good example is https://academic.oup.com/view-large/figure/115233499/syt034f3.jpeg -- you can see where transition rates are higher. The nice thing (though note my conflict of interest, as a coauthor) is that it allows the rates to actually change over the tree; different approaches that use a single rate matrix over the whole tree and then reconstruct changes (using things such as stochastic character mapping) are essentially using a model to infer things in order to find deviations from a model. One downside of the corHMM model is that the hidden changes evolve under a continuous time Markov model -- fine if you think whatever factors affect character rate (biogeographic changes, presence of certain other species, new morphological traits, etc.) evolve in this way (which is true for nearly all of them) but not if a single factor appears suddenly (i.e., radical change in environment for all species once an asteroid his the planet). CorHMM might have a timeslice function to allow this, but I'm not sure. If you have a lot of characters, a different approach could be to estimate branch lengths under something like a Lewis MKV model, which basically gives you amount of change averaged across all characters, and then compare the ratio of those branch lengths to the chronogram branch lengths for those same edges. Those branches with higher than the average ratio have more character change per unit of time than other branches. There's probably a paper or papers that do this, but I don't recall any at the moment, my apologies. If the question is about two rate regimes (i.e., before or after a massive event) or gradual change of rates, you could do tree transformations: use Geiger's tree transformation function with discrete characters to find the transformations that best fit the data -- maybe more change earlier than later, a discrete shift in rate 30 MY ago, etc. Some standard caveats on these and other potential solutions: ancestral state reconstruction is very hard to do well: N tips have N-1 ancestral states to estimate, plus rate parameters. That's a lot, which is why methods that fit overall rates could be better than reconstructing many changes. But it's possible to over-interpret these, too: you can in theory from corHMM get an estimate of the support for every hidden state and rates for every node, but I wouldn't read a lot into every single node. Also, there's still the challenge of Maddison & FitzJohn (2015): https://doi.org/10.1093/sysbio/syu070. It still scares me, personally, and the issues raised affect our interpretation of many methods that look at rates, such as the ones here. Best, Brian _______ Brian O'Meara Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville He/Him/His On Thu, Apr 9, 2020 at 2:55 PM Elizabeth Miller wrote: > I am interested in testing whether ancestral state changes (e.g. habitat > transitions) are clustered in time and/or in specific lineages rather than > randomly distributed on phylogeny. Does a method already exist for testing > this? I am aware of BayesTraits for testing correlation among states, but > not a method for clustered distribution of state changes with time. What > immediately comes to mind is simulating an expected temporal distribution > of state changes if changes were random and comparing to observed temporal > distribution, but I just want to check if something more sophisticated > exists. > > Thank you! > > -- > Elizabeth Miller, Ph.D. > NSF Postdoctoral Research Fellow > School of Aquatic and Fishery Sciences > University of Washington > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic varying slopes and intercepts model
Also worth noting is that people often use taxonomic levels if they don't have a tree. Trees are more available than might be expected (though still far less available than they should be), so you can get a tree and use phylogenetic comparative methods. Sources of trees: https://tree.opentreeoflife.org/ https://www.treebase.org/ https://phylo.cs.nmsu.edu/ http://datelife.org/ (though the datelife R package <https://github.com/phylotastic/datelife> may be more useful, depending on scale) https://datadryad.org/ Plus people make individual trees available: large trees for birds, sharks and rays, plants, etc. that may not be deposited into a formal repository but are on a website, plus trees stored as supplemental info in papers. So if you're using taxonomy as a predictor because there doesn't seem to be a good tree, there might actually be one with a bit of digging. Best, Brian ___ Brian O'Meara, http://brianomeara.info, especially Calendar <http://brianomeara.info/calendar.html>, CV <http://brianomeara.info/cv.html>, and Feedback <http://brianomeara.info/feedback.html> Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville He/Him/His On Wed, Sep 11, 2019 at 1:33 AM Joe Felsenstein wrote: > Oscar Inostraza -- > > Is the variable x also evolving on the tree? If so you need to use > standard phylogenetically-informed comparative methods to estimate the > variances and covariances of changes in both characters. > > You may not be able to assume that y responds instantly to x. > > J.F. > > Joe Felsenstein, j...@gs.washington.edu > Department of Genome Sciences, University of Washington, Box 355065, > Seattle, WA. 98195-5065 USA > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] model averaging using brownie.lite
I bet the first tree has more meaningful data: the pectinate tree with those branch lengths will likely have all but one of the taxa having nearly identical states. So I think tree shape and branch lengths should matter for this. I've done some preliminary analyses on this, building on Beaulieu et al. (2018) and Jhwueng et al. (2014, https://doi.org/10.1515/sagmb-2013-0048, also note COI), but nothing definitive yet. It's also worth looking at Ho and Ané (2014, https://doi.org/10./2041-210X.12285) who talk about AIC in the context of OU shifts, but who get into sample size with shifts in a modified BIC that uses taxa in different regimes as sample size (but again, univariate, so maybe it's actually matrix size). I also probably am missing important work by others -- my apologies if so. If you know of any, please let me know (and probably Karla, too!). So, in summary yeah, what Liam said: number of taxa, but it might be more complex. Best, Brian ___ Brian O'Meara, http://brianomeara.info, especially Calendar <http://brianomeara.info/calendar.html>, CV <http://brianomeara.info/cv.html>, and Feedback <http://brianomeara.info/feedback.html> Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville He/Him/His On Thu, Sep 5, 2019 at 10:00 AM Liam Revell wrote: > Dear Karla. > > In my opinion, it is probably correct to use the number of tips on the > tree as the sample size for AICc when estimating the Brownian rate: as > the number of independent pieces of information is n-1, just like with > an ordinary variance. For other parameters in phylogenetic comparative > analyses, the effective sample size may be different, however. > > All the best, Liam > > Liam J. Revell > Associate Professor, University of Massachusetts Boston > Profesor Asistente, Universidad Católica de la Ssma Concepción > web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org > > Academic Director UMass Boston Chile Abroad (starting 2019): > https://www.umb.edu/academics/caps/international/biology_chile > > On 9/5/2019 9:49 AM, Karla Shikev wrote: > > [EXTERNAL SENDER] > > > > Thanks so much, Liam! Just one quick follow-up question: what do you > > suggest should be the sample size for transforming AIC into AICc? the > > number of tips on the tree? > > > > Karla > > > > On Thu, Sep 5, 2019 at 10:27 AM Liam Revell wrote: > > > >> Dear Karla. > >> > >> You could try & create your own logLik method for the object class > >> "brownie.lite" as follows: > >> > >> ## method > >> logLik.brownie.lite<-function(object,...){ > >> lik<-setNames( > >> c(object$logL1,object$logL.multiple), > >> c("single-rate","multi-rate")) > >> attr(lik,"df")<-c(object$k1,object$k2) > >> lik > >> } > >> ## fit model > >> fit<-brownie.lite(tree,x) > >> ## use it > >> logLik(fit) > >> AIC(fit) > >> > >> All the best, Liam > >> > >> Liam J. Revell > >> Associate Professor, University of Massachusetts Boston > >> Profesor Asistente, Universidad Católica de la Ssma Concepción > >> web: http://faculty.umb.edu/liam.revell/, > https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.phytools.orgdata=02%7C01%7Cliam.revell%40umb.edu%7C04607945a9f74968c14e08d73207f67f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637032882107478464sdata=ofsem4h4SNk6g6QFUwD%2BJKO3TsTArNfH9%2BAyYDEjCvY%3Dreserved=0 > >> > >> Academic Director UMass Boston Chile Abroad (starting 2019): > >> https://www.umb.edu/academics/caps/international/biology_chile > >> > >> On 9/5/2019 9:13 AM, Karla Shikev wrote: > >>> [EXTERNAL SENDER] > >>> > >>> Dear all, > >>> > >>> I've been trying to use brownie.lite to implement the tutorial > available > >>> here ( > >> > https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ftreethinkers.org%2Ftutorials%2Fmorphological-evolution-in-r%2Fdata=02%7C01%7Cliam.revell%40umb.edu%7C04607945a9f74968c14e08d73207f67f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637032882107478464sdata=64k6WMtazzmyn0SLRrx2wEA%2F2wkk3%2B%2F3dBS0HtjlUT8%3Dreserved=0 > ) > >> to > >>> calculate model-averaged rates of evolution and for model selection (1 > >>> versus 2 rates). However, the current version of phytools 0.6-99 won't >
Re: [R-sig-phylo] model averaging for discrete character evolution
I think this is ok in theory: say you have 10% of the weight on the ER model, 90% on the ARD model, so your states at the nodes are based 10% on the probabilities from ER, 90% from ARD. [Worth noting, of course, that with N tips with one character each, estimating N-1 ancestral states, plus the rates, plus the model weights, is asking a bit much of the data -- it can be worth asking if there's another way to answer the biological question]. The risk of model averaging is if some of the models are truly terrible: you might get extreme parameter values because there's not enough info to estimate them well. It won't be apparent with discrete ancestral states (since there's a finite, reasonable state space), but you could have an asymmetrical rate near infinity, for example, if you fit an ARD model with not much data, and infinity times a weight for that model still means a big number. Model averaging for parameter estimates between the rates is ok, btw: symmetrical model: rate_0_to_1 = s rate_1_to_0 = s asymmetrical model: rate_0_to_1 = a rate_1_to_0 = b weighted params: rate_0_to_1 = s * weight_sym + a * weight_asym rate_1_to_0 = s * weight_sym + b * weight_asym Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville On Wed, Aug 7, 2019 at 11:33 AM Jacob Berv wrote: > Dear R-sig phylo > > I’ve been running a few discrete character Mk type models using phytool's > SIMMAP — and I had the idea that it might be useful to try model averaging > across posterior probabilities for node states. > > Might this make sense to do, to avoid problems associated with model > ranking via AIC? Ie, average the node state probabilities based on AIC > weights? Is there some fundamental problem with this? > > I could imagine generating some code to generate all possible transition > models given a set of N states, and then rather than ranking with AIC, > model averaging for parameter estimates (though now that I think about it, > not sure how one might reasonably average a symmetrical rate and an > asymmetrical rate). > > Best, > Jake > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] patristic distance doubts and meanings
Hi, Bruno. It's also possible that there are other ways of getting at the biological question you're going after, since others also examine how phylogenetic relatedness affects overlap (my apologies for the redundancy if you know about this already, just a teachable moment, as they say). An early, key review of this is Webb et al., (2002): https://doi.org/10.1146/annurev.ecolsys.33.010802.150448 and a more recent review is Cavender-Bares et al. (2009): https://doi.org/10./j.1461-0248.2009.01314.x. The literature blossoms from there, but those should give a feel for the approaches. Here's some packages used for things like overlap in species in areas [stolen from the task view <https://cloud.r-project.org/web/views/Phylogenetics.html>]: *Community/Microbial Ecology *: picante <https://cloud.r-project.org/web/packages/picante/index.html>, vegan <https://cloud.r-project.org/web/packages/vegan/index.html>, SYNCSA <https://cloud.r-project.org/web/packages/SYNCSA/index.html>, phylotools <https://cloud.r-project.org/web/packages/phylotools/index.html>, PCPS <https://cloud.r-project.org/web/packages/PCPS/index.html>, caper <https://cloud.r-project.org/web/packages/caper/index.html>, DAMOCLES <https://cloud.r-project.org/web/packages/DAMOCLES/index.html>, and cati <https://cloud.r-project.org/web/packages/cati/index.html> integrate several tools for using phylogenetics with community ecology. HMPTrees <https://cloud.r-project.org/web/packages/HMPTrees/index.html> and GUniFrac <https://cloud.r-project.org/web/packages/GUniFrac/index.html> provide tools for comparing microbial communities. betapart <https://cloud.r-project.org/web/packages/betapart/index.html> allows computing pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence and abundance based), functional and phylogenetic beta diversity. adiv <https://cloud.r-project.org/web/packages/adiv/index.html> can calculate various indices of biodiversity including species, functional and phylogenetic diversity, as well as alpha, beta, and gamma diversities. entropart <https://cloud.r-project.org/web/packages/entropart/index.html> can measure and partition diversity based on Tsallis entropy as well as calculate alpha, beta, and gamma diversities. ecospat <https://cloud.r-project.org/web/packages/ecospat/index.html> can also examine phylogenetic diversity. metacoder <https://cloud.r-project.org/web/packages/metacoder/index.html> is an R package for handling large taxonomic data sets, like those generated from modern high-throughput sequencing, like metabarcoding. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville On Wed, Feb 6, 2019 at 3:07 PM Eliot Miller wrote: > The amount of time two taxa have been evolving independently from one > another is, barring some complex introgression, independent of other > species in the tree. Pruning the tree from however many species to 700 > won't affect the distances between those 700. After pruning, use > cophenetic() to nearly instantly calculate the relevant distances between > taxa. All the values you need are in the matrix. Access the evolutionary > distance between any two taxa by pulling the relevant element of the > resulting cophenetic matrix, e.g. copheneticMatrix["sp1", "sp2"]. If you > are interested in the time since two taxa shared a common ancestor, > assuming the tree is ultrametric and in units of time, that value is the > distance between those two taxa divided by 2. > > Cheers, > Eliot > > On Mon, Feb 4, 2019 at 5:30 PM Bruno Garcia Luize > wrote: > > > Thank you Liam, > > > > For sure it is a lot of computational time. I'm only interested in a > subset > > of species, lets say close to 700 species in a regional species pool. I > did > > the computations using fastDist and it take almost 7 hs to complete in a > > windows based 16 Gb RAM. The function was realy useful for my propose. > > Another doubt that came is if I prune the bigger tree with my species > list > > and apply the ape::cophenetic the distances will remain the same or the > > prunning will change the tree edges lengths and consequently the > patristic > > distances. But, by the way I see that fastDist did the work I want. > > > > Best regards, > > > >
Re: [R-sig-phylo] Mesquite->ape node numbers and/or parsimony ancestral reconstruction
It's possible that Mesquite (and within R) that discrete traits may be interpreted by the software as continuous, depending on how they're stored -- so it could do a reconstruction of a character with tip states of 0, 1, and 2 and get ancestral values of 0.1, 1.75, etc. using squared change parsimony or the equivalent. This isn't something one should do (for one thing, if the trait is unordered, the ancestor of a clade with taxa in state 0 and 2 having a state of 1 doesn't make sense) but it is possible someone could run squared change parsimony on discrete traits if the software mistakenly treats them as continuous. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville On Thu, Jan 31, 2019 at 2:53 PM Liam Revell wrote: > Roland. > > You have not done 'squared change parsimony' in that case, but just > parsimony. The problem (among various) is that for a discrete character > there is not always a single most parsimonious set of ancestral states > and there are often many equally parsimonious reconstructions. That's > why some scientists have developed criteria (such as 'ACCTRAN' and > 'DELTRAN') for consistently selecting one or two among the usually many > equally parsimonious reconstructions. One should not be deceived by > this, however. Just because the ACCTRAN and DELTRAN criteria exist > doesn't mean that these MP solutions for ancestral states are any more > probable than any other equally parsimonious solution (and under some > circumstances all parsimonious solutions can be misleading). > > I believe parsimony ancestral state reconstruction is implemented in the > phangorn function ancestral.pars. > > All the best, Liam > > Liam J. Revell > Associate Professor, University of Massachusetts Boston > Profesor Asistente, Universidad Católica de la Ssma Concepción > web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org > > On 1/31/2019 2:18 PM, Roland Sookias wrote: > > Dear Liam, > > > > Thanks very much indeed - a really nice blog post. However, the issue is > > that I am using discrete states, which are not ordered. (and to make > things > > more exciting there are also polymorphisms and missing data - although > they > > can be worked around I guess) > > > > Is there any way to get ML to yield the same as SCP would (in Mesquite at > > least) for this kind of data? > > > > Best wishes > > > > Roland > > > > On Thu, Jan 31, 2019 at 8:03 PM Liam Revell wrote: > > > >> Dear Roland. > >> > >> I didn't really solve your problem of reading a Mesquite result into R; > >> however I just posted about the relationship between unweighted & > >> weighted square-change-parsimony & ML ancestral character estimation on > >> my blog: > >> http://blog.phytools.org/2019/01/unweighted-weighted-square-change.html > . > >> (In summary, you should be able to do what you want within R.) > >> > >> All the best, Liam > >> > >> Liam J. Revell > >> Associate Professor, University of Massachusetts Boston > >> Profesor Asistente, Universidad Católica de la Ssma Concepción > >> web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org > >> > >> On 1/31/2019 12:48 PM, Roland Sookias wrote: > >>> Dear all, > >>> > >>> I want to get squared change parsimony ancestral state reconstructions > >> for > >>> a matrix into R. > >>> > >>> I have tried the asr_max_parsimony() function from castor, but the > >> results > >>> are not the same as in Mesquite for example. To take an example, a > >>> two-taxon clade with a unique state for these taxa optimized as having > a > >>> different state (shared by near relatives) at the ancestral node. This > >>> doesn't happen in Mesquite and is *not* what I want. > >>> > >>> If there were some way to get the results from Mesquite into R that > would > >>> also be OK, but I cannot figure out the node numbering system in > >> Mesquite, > >>> to match them up with nodes as numbered by ape. > >>> > >>> Any help would be *very* greatly appreciated. > >>> > >>> Best wishes, > >>> &g
Re: [R-sig-phylo] OU for non-ultrametric trees
You can run in OUwie if you set the root.age (we should probably make that automatic, but haven't yet). See https://www.rdocumentation.org/packages/OUwie/versions/1.50/topics/OUwie. I believe the past issue with non-ultrametric trees comes from functions that handles the creation of the VCV matrix inappropriately by assuming an ultrametric tree. Using non-ultrametric trees may be possible in other OU packages (ouch, slouch, phylolm, MVmorph, etc.) but I don't know for sure. It's also worth checking for why your tree is non-ultrametric. If it has a mixture of extinct and modern taxa, that's cool. If it's because you think that the branch lengths reflect the amount of change that'd drive the continuous character change (i.e., they mean number of generations for taxa with different generation times, so they don't have the same root to tip length) that's ok, too. If it's just that those are raw branch lengths (parsimony, likelihood) without making the tree a chronogram, that's bad -- the main thing OU/BM models are doing is effectively stretching and smooshing branch lengths (for OU, also adjusting the expected means) so if the starting branch lengths don't mean something that's relevant, the parameters you get from stretching also don't mean anything. [I imagine you're doing it well, just a teachable moment for new students on the list]. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville On Thu, Dec 6, 2018 at 7:35 AM Danielle Miller wrote: > Hi all, > > I have a non-rooted non-ultrametric tree and a corresponding set of a > single trait values for each one of the tips. > I’m interested whether it follows a BM model or an OU. > > Reading previous comments in the archive, I understood that running an OU > process is inadequate in a case of non-ultrametric trees, however I did not > fully understand why. > As I use ouch package, what are the consequences of running an unrooted > tree? And a non-ultrametric one? > > What would be the best way assessing this issue? Diversitree should be > more reliable? > > I will appreciate any explanation and help, > Danielle Miller > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] understanding variance-covariance matrix
Hi, Agus. The variance-covariance matrix comes from the tree and the evolutionary model, not the data. Each entry between taxa A and B in the VCV is how much covariance I should expect between data for taxa A and B simulated up that tree using that model. I don't want to be *that guy*, but O'Meara et al. (2006) https://onlinelibrary.wiley.com/doi/10./j.0014-3820.2006.tb01171.x has a fairly accessible explanation of this (largely b/c I was just learning about VCVs when working on that paper). Hansen and Martins (1996) https://onlinelibrary.wiley.com/doi/10./j.1558-5646.1996.tb03914.x have a much more detailed description of how you get these covariance matrices from microevolutionary processes. Typically, ape::vcv() is how you get a variance covariance for a phylogeny, assuming Brownian motion and no measurement error. It just basically takes the history two taxa share to create the covariance (or variance, if the two taxa are the same taxon). A different approach, which seems to be what you're doing, would be to simulate up a tree many times, and then for each pair of taxa (including the pair of a taxon with itself, the diagonal of the VCV), calculate the covariance. These approaches should get the same results, though the shared history on the tree approach is faster. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville On Sat, Aug 25, 2018 at 1:16 PM Agus Camacho wrote: > Dear list users, > > I am trying to make an easy R demonstration to teach the > variance-covariance matrix to students. However, After consulting the > internet and books, I found myself facing three difficulties to understand > the math and code behind this important matrix. As this list is answered by > several authors of books of phylocomp methods, thought this might make an > useful general discussion. > > Here we go, > > 1) I dont know how to generate a phyloVCV matrix in R (Liams kindly > described some options here > < > http://blog.phytools.org/2013/12/three-different-ways-to-calculate-among.html > > > but I cannot tell for sure what is X made of. It would seem a dataframe of > some variables measured across species. But then, I get errors when I > write: > > tree <- pbtree(n = 10, scale = 1) > tree$tip.label <- sprintf("sp%s",seq(1:n)) > x <- fastBM(tree) > y <- fastBM(tree) > X=data.frame(x,y) > rownames(X)=tree$tip.label > ## Revell (2009) > A<-matrix(1,nrow(X),1)%*%apply(X,2,fastAnc,tree=tree)[1,] > V1<-t(X-A)%*%solve(vcv(tree))%*%(X-A)/(nrow(X)-1) >## Butler et al. (2000) >Z<-solve(t(chol(vcv(tree%*%(X-A) > V2<-t(Z)%*%Z/(nrow(X)-1) > >## pics >Y<-apply(X,2,pic,phy=tree) > V3<-t(Y)%*%Y/nrow(Y) > > 2) The phyloVCV matrix has n x n coordinates defined by the n species, and > it represents covariances among observations made across the n species, > right?. Still, I do no know whether these covariances are calculated over > a) X vs Y values for each pair of species coordinates in the matrix, across > the n species, or b) directly over the vector of n residuals of Y, after > correlating Y vs X, across all pairs of species coordinates. I think it may > be a) because, by definition, variance cannot be calculated for a single > value. I am not sure though, since it seems the whole point of PGLS is to > control phylosignal within the residuals of a regression procedure, prior > to actually making it. > > 3) If I create two perfeclty correlated variables with independent > observations and calculate a covariance or correlation matrix for them, I > do not get a diagonal matrix, with zeros at the off diagonals (ex. here > < > https://www.dropbox.com/s/y8g3tkzk509pz58/vcvexamplewithrandomvariables.xlsx?dl=0 > >), > why expect then a diagonal matrix for the case of independence among the > observations? > > Thanks in advance and sorry if I missed anything obvious here! > Agus > Dr. Agustín Camacho Guerrero. Universidade de São Paulo. > http://www.agustincamacho.com > Laboratório de Comportamento e Fisiologia Evolutiva, Departamento de > Fisiologia, > Instituto de Biociências, USP.Rua do Matão, trav. 14, nº 321, Cidade > Universitária, > São Paulo - SP, CEP: 05508-090, Brasil. > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.o
Re: [R-sig-phylo] keeping states discrete when plotting HiSSE results
Hi, Bonnie. It's not treating binary characters as continuous, but rather showing the uncertainty in what the state is: an intermediate color means it's not really sure. There's substantial uncertainty with ancestral states, and our goal was to communicate that rather than the point estimate of whatever state is best at a time lest people over interpret it [which reminds me to finish testing the justifiably controversial OUwie ancestral state recon that was asked about before on R-sig-phylo]. However, the way plot.hisse.states works is basically plotting a tree of rates and then a narrower tree of states on top of that, so I think you could just plot a third tree of states on top of all that. You can get the internal reconstructions and terminal states as binary characters by doing states.tips <- hisse:::ConvertManyToBinaryState(hisse.results, "tip.mat") states.internal <- hisse:::ConvertManyToBinaryState(hisse.results, "node.mat") and then use phytools to plot a tree with discrete traits using these on top of the hisse plot. Here's the relevant code we use for plotting: you'll want to change contMapGivenAnc to a function to give discrete change mappings. state.tree <- contMapGivenAnc(tree=tree.to.plot, x=states.tips, plot=FALSE, anc.states=states.internal, lims=state.lims, ...) state.colors <- colorRampPalette(state.colors, space="Lab")(length(rate.tree$cols)) state.tree$cols[]<- state.colors plot(state.tree, outline=FALSE, lwd=edge.width.state, legend=FALSE, type=type, fsize=fsize, ...) Hope this helps, Brian ___________ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Tue, Jul 3, 2018 at 9:44 AM Bonnie Blaimer wrote: > Hi all, > > I am using the HiSSE package and function plot.hisse.states on a list of > several HiSSE objects to generate a heatmap phylogeny of model-averaged > results. > > I'm wondering if there is a way to plot the character state reconstructions > as being discrete changes, and if so, could someone provide guidance on > how? The default seems to treat the binary characters as continuous and > shows a somewhat gradual transition between the two different colors > specified for the two states. > > Many thanks! > Bonnie > > -- > *Bonnie B Blaimer* > Assistant Professor > Department of Entomology & Plant Pathology > 3204D Gardner Hall, Campus Box 7613 > North Carolina State University > Raleigh, NC 27695-7613 > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] diversification rates with two traits present simultaneously
Rich FitzJohn has programmed something of what Will suggested in diversitree: look at details in ?make.musse.multitrait. I'm not sure if there was ever a paper published on this. We do a similar thing to Will's idea in http://rspb.royalsocietypublishing.org/content/283/1830/20152304 (with the added complication of varying over combinations of traits, since converting six binary traits to a 2^6 state musse model would have been a bit ambitious given the data size [or, frankly, given any possible sampling from the tree of life]). However, neither approach deals with hidden sources of rate heterogeneity. Jeremy Beaulieu and Daniel Caetano are always adding stuff to hisse that deals with hidden rates (see preprint on a hidden biogeography model at https://www.biorxiv.org/content/early/2018/04/17/302729), but I don't think anything has been peer-reviewed and published yet for this use case (I think the core hisse model is also in RevBayes, and that may have added options making a musse version possible). If your tree is big enough, separating it into clades and seeing if the pattern (i.e., when converting 00 -> 1, 01 -> 2, 10 -> 3, 11 -> 4 for musse, and you find that state 3 (aka binary states 10) has highest diversification rate) is consistent across the clades could be a good way of seeing if there are some kinds of heterogeneity that matter. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Tue, Jun 19, 2018 at 2:03 AM Gilles Benjamin Leduc wrote: > Hello, > > Personally, I would advise you not to use a tree, for your 2 traits times > 2 (+/-) makes 4 combinations, meaning a tree that is dichotomic cannot > reveal anything! Can you quantify your "diversification" without "rate"? > How do you obtain your "diversification rates"? If it is something you can > count, you may try an old fashion Khi-square test… you would then know if > there is a correlation… > > Have a nice day, > > Benjamin > > > Le Lundi 18 Juin 2018 23:46 GMT, Elizabeth Christina Miller < > ecmil...@email.arizona.edu> a écrit: > > > Hello, > > > > I am wondering what comparative method(s) is appropriate for testing if > > diversification rates are highest when two traits are present together, > > rather than one alone? Specifically, if I have two binary traits, let's > say > > freshwater/marine and temperate/tropical, what is the best way to test if > > diversification rates are highest in tropical+freshwater groups, as > opposed > > to tropical+marine or temperate+freshwater? I think MuSSE can do this, > but > > my tree is large so I want to avoid issues associated with rate > > heterogeneity. Are there any alternatives? > > > > > > > > -- > > Elizabeth Miller > > PhD Candidate: Wiens Lab > > Ecology and Evolutionary Biology > > University of Arizona > > > > [[alternative HTML version deleted]] > > > > ___ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] estimate ancestral state with OUwie models
Prompted by Bruno's email, and similar requests by some students at the Arnold & Felsenstein Evolutionary Quantitative Genetics course, we've started adding ancestral state reconstruction to OUwie. It still needs debugging and testing, but should be ready fairly soon. I do believe bayou does ancestral state estimation, too, but I don't think all the models you want will be in the set. Could be adequate to answering the biological question, though. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Mon, Jun 11, 2018 at 6:01 PM David Bapst wrote: > Just to follow off what Lucas said, but please note you cannot rescale > branches of a phylogeny using an OU model when the tree is > non-ultrametric (such as when it contains extinct, fossil taxa as > tips). Slater (2014, MEE) discusses this more in a brief correction to > Slater (2013). > > I don't know if anyone in this conversation has a non-ultrametric > tree, but I wanted to make that clear for anyone who stumbles on this > thread n the future using a google search. > -Dave > > > > On Sun, Jun 10, 2018 at 12:25 PM, Lucas Jardim > wrote: > > Hi Bruno, > > > > You can transform the branches of your phylogeny using the estimated > > parameters of OU models. Then, if those models describe the observed data > > adequatly, the transformed tree should model the observed data as a > > Brownian motion model. So you can use an ancestral state reconstruction > > based on Brownian motion model. However, I do not know if that is the > best > > approach as optimum values would not be included into the reconstruction > > process. > > > > Best, > > -- > > Lucas Jardim > > Doutor em Ecologia e Evolução > > Bolsista do INCT-EECBio (Ecologia, Evolução e Conservação da > > Biodiversidade) > > Instituto de Ciências Biológicas > > Laboratório de Ecologia Teórica e Síntese > > Universidade Federal de Goiás > > http://dinizfilho.wix.com/dinizfilholab > > > > [[alternative HTML version deleted]] > > > > ___ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > > > -- > David W. Bapst, PhD > Asst Research Professor, Geology & Geophysics, Texas A & M University > https://github.com/dwbapst/paleotree > Google Calendar: https://goo.gl/EpiM4J > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Problem with negative ages in OUwie.boot?
On Wed, May 2, 2018 at 2:53 PM, David Bapstwrote: > Given that your tree appears to be non-ultrametric enough to cause > branching.times to throw some nonsensical node ages, if it is supposed > to be ultrametric. I recommend checking it carefully to figure out why > the tips seem to not quite be at the same distance from the root. > > Sometimes this happens with tree import from a file -- it could be a newick tree with branch lengths precise to the hundredths but a lot of the R ultrametric tests by default use higher precision (1e-08, iirc). Best, Brian [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] grafting chronograms onto backbone phylogeny
Look at congruify.phylo in geiger (and the associated paper: Eastman JM, LJ Harmon, and DC Tank. 2013. Congruification: support for time scaling large phylogenetic trees. Methods in Ecology and Evolution). You could also look into SDM in ape (converting trees [not data] to distance matrices, first): relevant paper Criscuolo, A., Berry, V., Douzery, E. J. P. , and Gascuel, O. (2006) SDM: A fast distance-based approach for (super)tree building in phylogenomics. Systematic Biology, 55, 740–755. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Thu, Apr 26, 2018 at 5:07 PM, Chris Law <cj...@ucsc.edu> wrote: > Hi all, > > I am trying to graft 3 chronograms onto a family level backbone. Does > anybody have any suggestions on what is the best way to do this? Or is > going through the tree files in textwrangler the only way to do this. > > Thanks! > > > *Chris Law **|* > * *PhD Student > *|* > * *University of California, Santa Cruz > > > > Coastal Biology Building > > 130 McAllister Way > Santa Cruz, CA 95060 > cj...@ucsc.edu *|* research.pbsci.ucsc.edu/eeb/cjlaw/ > Small Mammal Research in the Forest Internship > <http://research.pbsci.ucsc.edu/envs/smurf> > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] quantitative state dependent diversification of another quantitative trait
I think you want SLOUCH: Hansen et al. 2008: https://onlinelibrary.wiley.com/doi/pdf/10./j.1558-5646.2008.00412.x. One trait evolves under Brownian motion, another trait follows it. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Wed, Apr 4, 2018 at 2:43 PM, Zach Culumber <zculum...@gmail.com> wrote: > Hi everyone, > > We are interested in examining whether shifts in one continuous trait have > preceded shifts in another continuous trait through evolutionary time for a > phylogeny of ~80 species. To do this we were thinking of calculating the > change in the value of each trait for each tip to its most recent ancestor > (i.e., node) and then for each node to its most recent ancestral node. We > have all the tip values and conducted ASR to get the node values. However, > we are unaware of any functions or packages that would automate calculation > of all these distances for teach tip and node to its most recent ancestor. > Does anyone have any suggestions of functions or packages that might > possibly do this? Alternatively, are there better existing options for > exploring this question? I.e., something similar to QuaSSE but for analysis > between two quantitative traits rather than traits and species > diversification? Something like Bayes Traits or PGLS would allow tests of > correlated evolution, but not the order of evolutionary shifts. > > Thank you for any insights! > > Zach > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Interpretation of standard errors of parameter estimates in OUwie models
Apparently the plot did not come through correctly. I'm attaching a PDF of it. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Wed, Apr 4, 2018 at 4:24 PM, Brian O'Meara <omeara.br...@gmail.com> wrote: > Hi, Rafael et al. > > ?OUwie has some information on the standard errors: > > The Hessian matrix is used as a means to estimate the approximate standard >> errors of the model parameters and to assess whether they are the maximum >> likelihood estimates. The variance-covariance matrix of the estimated >> values of alpha and sigma^2 are computed as the inverse of the Hessian >> matrix and the standard errors are the square roots of the diagonals of >> this matrix. The Hessian is a matrix of second-order derivatives and is >> approximated in the R package numDeriv. So, if changes in the value of a >> parameter results in sharp changes in the slope around the maximum of the >> log-likelihood function, the second-order derivative will be large, the >> standard error will be small, and the parameter estimate is considered >> stable. On the other hand, if the second-order derivative is nearly zero, >> then the change in the slope around the maximum is also nearly zero, >> indicating that the parameter value can be moved in any direction without >> greatly affecting the log-likelihood. In such situations, the standard >> error of the parameter will be large. >> >> For models that allow alpha and sigma^2 to vary (i.e., OUMV, OUMA, and >> OUMVA), the complexity of the model can often times be greater than the >> information that is contained within the data. As a result one or many >> parameters are poorly estimated, which can cause the function to return a >> log-likelihood that is suboptimal. This has great potential for poor model >> choice and incorrect biological interpretations. An eigendecomposition of >> the Hessian can provide an indication of whether the search returned the >> maximum likelihood estimates. If all the eigenvalues of the Hessian are >> positive, then the Hessian is positive definite, and all parameter >> estimates are considered reliable. However, if there are both positive and >> negative eigenvalues, then the objective function is at a saddlepoint and >> one or several parameters cannot be estimated adequately. One solution is >> to just fit a simpler model. Another is to actually identify the offending >> parameters. This can be done through the examination of the eigenvectors. >> The row order corresponds to the entries in index.matrix, the columns >> correspond to the order of values in eigval, and the larger the value of >> the row entry the greater the association between the corresponding >> parameter and the eigenvalue. Thus, the largest values in the columns >> associated with negative eigenvalues are the parameters that are causing >> the objective function to be at a saddlepoint. >> > > You can get better (by which I mean more accurate) estimates by > parametric bootstrapping (simulation). > > Doing a quick summary of your OUM results (code at end): > > Trait: contr > nonfor: 0.12 (-0.12, 0.36) > inter: 0.12 (-0.12, 0.36) > fores: 0.13 (-0.13, 0.39) > Phylogenetic half life: 0.06 > Tree height: 28.34 > > Trait: cshd > nonfor: 0.24 (-0.23, 0.72) > inter: 0.33 (-0.32, 0.98) > fores: 0.2 (-0.19, 0.6) > Phylogenetic half life: 6.35 > Tree height: 28.34 > > Trait: dors > nonfor: 0.04 (-0.04, 0.12) > inter: 0.15 (-0.14, 0.44) > fores: 0 (0, 0.01) > Phylogenetic half life: 3.68 > Tree height: 28.34 > > Trait: vent > nonfor: 0.05 (-0.05, 0.14) > inter: 0.32 (-0.3, 0.93) > fores: -0.03 (0.03, -0.08) > Phylogenetic half life: 5.43 > Tree height: 28.34 > > It does seem that you don't have a lot of ability to tell optima apart. > The alpha value is pretty low, though not tremendously (the half life is a > good way to look at that). But the raw data also suggest it's not three > very clear clumps (beeswarm plot, regimes on the x axis, trait 1 value on > the y). It's not phylogenetically corrected, but still a good gut check: > > > > I agree that for your data, it may be t
Re: [R-sig-phylo] Interpretation of standard errors of parameter estimates in OUwie models
cat(paste0("\n",regimes[regime.index], ": ", result.vector[1], " (", result.vector[2], ", ", result.vector[3], ")")) if (trait.index==1) { observations[[regime.index]] <- bla$data[which(bla$data[,1]==regime.index),2] } } cat(paste0("\nPhylogenetic half life: ", round(log(2)/data["alpha", traits[trait.index]], 2))) cat(paste0("\nTree height: ", round(max(ape::branching.times(bla$phy)),2))) } points.to.plot <- bla$data[,1:2] colnames(points.to.plot) <- c("regime", "trait1") beeswarm(trait1~regime, data=points.to.plot, col="black", pch=20) ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Wed, Apr 4, 2018 at 4:07 PM, Jacob Berv <jakeberv.r.sig.ph...@gmail.com> wrote: > Even if those models may fit the data better, they may not necessarily > help Rafael determine whether or not the parameter estimates of interest > are different across regimes (though perhaps BMS might be informative). > > Rafael, maybe you could try fixing the ancestral regimes to match most > likely states for each node from your SIMMAP summary? I wonder if that > might decrease your ‘uncertainty’ in parameter estimates. > > I don’t have a great answer for your main question though, which is a good > one. > > Jake > > > On Apr 4, 2018, at 8:59 PM, William Gearty <wgea...@stanford.edu> wrote: > > > > Hi Rafael, > > > > Have you tried running the BM1, BMS, or OU1 models? If so, how do the > AICc > > values for those compare to those of the OUM model? If not, you should > make > > sure you run those. > > If you find that the these models fit your data better, this could > indicate > > that there isn't a large different across regimes and an OUM model isn't > > necessarily suitable. > > > > Best, > > Will > > > > On Wed, Apr 4, 2018 at 12:30 PM, Rafael S Marcondes < > raf.marcon...@gmail.com > >> wrote: > > > >> Dear all, > >> > >> I'm writing (again!) to ask for help interpreting standard errors of > >> parameter estimates in OUwie models. > >> > >> I'm using OUwie to examine how the evolution of bird plumage color > varies > >> across habitat types (my selective regimes) in a tree of 229 tips. I was > >> hoping to be able to make inferences based on OUMV and OUMVA models, > but I > >> was getting nonsensical theta estimates from those. So I've basically > given > >> up on them for now. > >> > >> But even looking at theta estimates from OUM models, I'm getting really > >> large standard errors, often overlapping the estimates from other > selective > >> regimes. So I was wondering what that means exactly. How are these erros > >> calculated? How much do high errors it limit the biological inferences I > >> can make? I'm more interested in the relative thetas across regimes > than on > >> the exact values (testing the prediction that birds in darker habitats > tend > >> to adapt to darker plumages than birds in more illuminated habitats). > >> > >> I have attached a table averaging parameter estimates and errors from > >> models fitted across a posterior distribution of 100 simmaps for four > >> traits; and one exemplar fitted model from one trait in one of those > >> simmaps. > >> > >> Thanks a lot for any help, > >> > >> > >> *--* > >> *Rafael Sobral Marcondes* > >> PhD Candidate (Systematics, Ecology and Evolution/Ornithology) > >> > >> Museum of Natural Science <http://sites01.lsu.edu/wp/mns/> > >> Louisiana State University > >> 119 Foster Hall > >> Baton Rouge, LA 70803, USA > >> > >> Twitter: @brown_birds <https://twitter.com/brown_birds> > >> > >> > >> ___ > >> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >> Searchable archive at http:
Re: [R-sig-phylo] correlation of binary and multistate
You can do this with corHMM by making an appropriate model with rayDISC. For example, if character A has states 0,1 and character B has states 0,1,2, you then recode this into a single multistate character: 00 -> 0 01 -> 1 02 -> 2 10 -> 3 11 -> 4 12 -> 5 and then create a rate matrix that matches your hypotheses (i.e., force the 00->10 rate [the 0->3 rate in the new coding] to be the same as the 01->11 and 02->12 rates (so that the state of the second character doesn't affect the rate of the first character) or and compare (or model average) with a model where they are free to vary). It wouldn't be hard to make a looping thing to do this automatically (Jeremy Beaulieu has for two and three binary trait correlation in the same package with corDISC but not for multistate yet) but as far as I know no one has (pull requests welcome!). There may also be this functionality in phytools: I remember some Pagel correlation work was added, but I don't know about its limitations with multistate. Best, Brian PS: Of course, with all this stuff, I now worry about the Maddison and FitzJohn issue: https://academic.oup.com/sysbio/article/64/1/127/2847997. _______ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) On Wed, Mar 21, 2018 at 3:19 PM, John Denton <jden...@amnh.org> wrote: > Hi all, > > Does anybody know of a package that does character correlations between > binary and multistate characters (and even better with the possibility of > missing data)? > > Thanks! > > ~John > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] How to test stratification of sampling across tree?
One thing you could do is take all nonoverlapping pairs of taxa (Felsenstein's other technique in the contrasts paper): that is, for a tree (A,(B,(C,(D,E, you can look at D-E and B-C, *or* D-E and A-B, *or* D-E and A-C, *or* A-B and C-D, etc. (so, still leaving out one taxon each time, but only using each edge once) and then compare the states for pair of taxa. If state 0 has freq p, and state 1 has freq q, you should see p^2 0-0 pairs, 2pq 0-1 pairs, and q^2 1-1 pairs with truly random sampling; if it's maximally stratified, you should see only two of these pairs (i.e., if 1 is less common, you should see only 0-1 and 0-0 pairs). You could rig up a test statistic (prob comparing two multinomial models) from this. I have some code that purportedly gets all independent such pairs in R at https://r-forge.r-project.org/scm/viewvc.php/pkg/R/independentTaxa.R?view=markup=366=omearalab. However, I haven't rigorously tested it (this was in the dark ages before we all used testthat, too), so feel free to take, hack, republish, etc., but test first [anyone else should feel free to take this, too -- it could naturally go into phytools, ape, or phangorn, for example, assuming it actually works]. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) Communication Director, Society of Systematic Biologists On Tue, Mar 14, 2017 at 1:37 PM, Ross Mounce <ross.mou...@gmail.com> wrote: > So I tried a 12-taxon fully pectinate tree with Blomberg's K as calculated > by picante::Kcalc() > > library(picante) > library(ape) > aa<-"(A,(B,(C,(D,(E,(F,(G,(H,(I,(J,(K,L)));" > t1<-read.tree(text=aa) > t4 <- compute.brlen(t1,method="Grafen",1) > tipvals <- c(0,1,0,1,0,1,0,1,0,1,0,1) > Kcalc(tipvals,t4) > > K = 0.3487135 > > > There are possible 924 permutations of 6 'painted' tips from 0011 > to 1100 (each of those two extreme distributions gives the maximum > value of K for this particular tree & number [6] of painted tips: 2.37703) > > There are 336 discrete integer value results of K for this tree, and > painting 6 tips: > > Min. 1st Qu. MedianMean 3rd Qu.Max. > 0.3438 0.3676 0.4489 0.5332 0.5945 2.3770 > > > A histogram of all 924 possible values of K (for 6 painted tips) shows that > Blomberg's K in terms of value distribution is extremely positively skewed > (it has a skewness of 2.671139), which is great if you're looking to test > phylogenetic signal without false positives, but not so great if you're > trying to assess "evenness" at the other tail of the distribution. In the > case of this exact tree, because I've enumerated all possible permutations > of 6 painted tips, I can calculate the 5% significance threshold as values > of K that are 0.3480158 or less. It seems some normalisation procedure > might be needed before safely using Blomberg's K to assess the significance > of evenness, if one is not going to exhaustively examine/enumerate all > possible values (which I can't do for a 1000+ tip tree). > > Does that make sense? > > Certainly interesting... > > > Ross > > > > > > > > > On 14 March 2017 at 14:53, Ross Mounce <ross.mou...@gmail.com> wrote: > > > Thanks Dave, > > > > I'll try Blomberg's K with small simulated fully-bifurcating trees of > > simple shape (e.g. fully pectinate), where I can easily paint the tips > > myself in what I believe to be a "maximally stratified manner" e.g. > > 010101010 to see if Blomberg's K does actually reach minimum (i.e. > 0.0 > > ?) for such a distribution. If it does, great! This is the measure I > need. > > > > I still wonder though, for a complex tree structure in terms of > > balance/shape somewhere intermediate between fully balanced and fully > > pectinate; how does one arrive empirically at _the_ most optimal > > stratified/even sampling ('painting') of tips if say only 25% of tips > > are/can be 'painted'. I guess a lot depends on how one defines what 'even > > sampling' on a phylogeny actually is, does it include branch lengths et > > cetera... > > > > I'll give it a try anyway, > > > > Thanks again, > > > > Ross > >
Re: [R-sig-phylo] Use of a Phylocom tree for PGLS
There also are several empirical plant chronograms available for reuse: check TreeBase, OpenTree, and Dryad. Or you could use something like phlawd or supersmart (just out in SystBio) to make a phylogeny for your taxa. Phylomatic is convenient and will have all your taxa placed in some way, but it'd make your conclusions more robust to just download a tree, make sure the taxa match yours (look at the taxize package), prune out the taxa that don't match (treedata() in Geiger), and rerun. If the conclusions are qualitatively the same, you satisfy the reviewer; if not, you may have saved yourself from an error. Either way the paper is improved. I haven't reviewed your paper, but if the central point is using a phylogeny to look at signal (rather than have this be just one small part of a paper focused on something else), then doing even more to get a better tree might be worth it -- the tree is THE thing used in every analysis, may as well get it right (or at least, less wrong). If the paper is mostly on something else, though, this might be a good enough solution. Two notes for disclosure: I've been on papers that make empirical trees, so I suppose there's a minor COI there (I might get one more citation). I'm also part of a project, phylotastic, that is working to make getting trees easier. We're doing a soft release soon, but there's nothing I'd suggest you try using yet (we know people are going to form a first impression and then decide to come back based on that first impression -- we're not quite ready for our debut yet, though it is all open). My part is to make it easier to get dates on trees: there's an R package for this (datelife, on github) and a website, but I'm still debugging -- don't use yet for something that matters, but it should make all this easier in the future. But it does give me an incentive to say, "get a better tree", so take that into consideration. Best, Brian _______ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) Communication Director, Society of Systematic Biologists On Sat, Sep 17, 2016 at 11:38 PM, Jacob Berv <jakeberv.r.sig.ph...@gmail.com > wrote: > Personally, I don’t think I’d have a problem with this approach > (especially given the paper Liam cited) given that you are using a > phylogeny (a model) to test a hypothesis, which is, after all, all we can > ever do. You can always do more, so any threshold of phylogenetic tree > “quality” is going to be somewhat arbitrary anyway. I don’t mean to sound > defeatist — obviously you should do the best job that you can, and it > sounds like you have done that given that the goal of your research is not > to reconstruct a new phylogeny of the clade you’re studying. Perhaps you > can try to explain this in your rebuttal. > > Or alternatively, use Liam’s awesome code (below) to generate all possible > phylogenetic trees give your polytomies, and run your PGLS on all of them > to see if its sensitive to topology. You could probably loop this up > somehow without much effort. > http://blog.phytools.org/2016/08/resolving-one-or-more- > multifurcations.html?m=1 <http://blog.phytools.org/ > 2016/08/resolving-one-or-more-multifurcations.html?m=1> > > Jake > > > > > On Sep 17, 2016, at 6:46 PM, Oscar Valverde <os.valverd...@gmail.com> > wrote: > > > > Dear colleages > > > > I am working on an phylogenetic signal and PGLS analysis using a database > > with values for ~600 plant species. To construct my phylogeny I used the > > backbone Phylomatic supertree (http://phylodiversity.net/phylomatic/) > and > > added branch lengths with the based on a fossil calibration for > angiosperms > > using the bladj function in phylocom, and resolved polytomies with the > > multi2di function in phylotools. > > > > Now that I am trying to publish the paper, some reviewers indicated that > > such tree is not suitable for statistical analysis because the level of > > resolution of the tree is too low (to the family level maybe?) and the > > uncertainty is too high to get any reliable result with respect to PGLS > or > > phylogenetic signal of the traits. Instead, they suggest I should build > my > > own tree based on sequences. Of course this is a major project to > undertake > > and in my opinion far
Re: [R-sig-phylo] Standard errors of theta estimates in OU models
That's a really cool question. I don't know. I'm pretty sure that practically, the answer is no (esp when you consider all the issues -- any feasible OU is too simple a model to match reality, tree errors, data errors, etc. -- that also come into an analysis). However, I could imagine that in a perfect world you could look at expected standard errors created using parametric bootstrapping from the MLE of the parameters and see if the actual standard errors are different -- maybe it could reflect plasticity or some other factor. But I really don't know. You might look at some of the methods that can deal with intraspecific and interspecific processes to get at this sort of biological question (the Felsenstein 2008 Contrasts revisited paper might be interesting as a start, though it's not OU and comes at this from a different direction), but it's fun to think about ways to go from what is treated as annoying noise to a meaningful signal. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) Communication Director, Society of Systematic Biologists On Thu, Sep 15, 2016 at 5:37 PM, Xiaojing Wei <weixx...@umn.edu> wrote: > Dear R-sig-phylo users, > > I wonder if the standard errors of the theta estimates in OU models have > any biologically meaningful interpretations. For instance, could it give > some indication of plasticity in traits, or the optimal level of plasticity > in traits? Or does it simply reflect estimation uncertainty of the > phylogeny or the OU model? > > Thanks a lot! > > -- > Xiaojing Wei > PhD student > Dept. of Ecology, Evolution, and Behavior > University of Minnesota > Rm. 211 Ecology Bld., 1987 Upper Buford Cir. > St. Paul, 55108 > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Query about Phylogenetics in R
A similar solution is to create a pdf: pdf(file="AwesomeTree.pdf", width=10, height=10) plot(phy) dev.off() As you increase the width and height, the labels become proportionally smaller. You can probably mess around with cex options, too, but this is the approach I normally use. Best, Brian _______ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) Communication Director, Society of Systematic Biologists On Wed, Aug 31, 2016 at 5:25 AM, Vojtěch Zeisek <vo...@trapa.cz> wrote: > Hello, > if onlythe readibility of the labels is the issue, what about using svg() > function? It produces a SVG file which You can open in almost any vector > editor (like Inkscape) andmove the labels around. Chech ?svg for possible > settings and play with them little bit. They vary among operating systems. > So You can do something like > svg(...) # Any parameters inside, at least output file name > # All the tree ploting commands... > dev.off() # To save the SVG file to the disc > Check also parameters of function tip.labels() - You can specify font size > or replace the text with some symbol. > (Sorry for being so brief, I don't have exact examples handy) > Yours, > V. > > > 31. srpna 2016 11:11:33 CEST, "Bhuller, Ravneet" < > ravneet.bhulle...@imperial.ac.uk> napsal: > >Dear All, > > > >I am trying to construct a phylogenetic tree (neighbour joining) using > >either APE or PHANGORN in R. But, since I have got 220 strains of > >bacteria in my data, the resulted tree is not very clear. The branch > >labels are so much overlapping that they cannot be read at all. Is > >there any way, I can get a neat tree with clearly read labels? Any > >guidance will be very much appreciated. > > > >Regards, > > > >Rav > -- > Vojtěch Zeisek > http://trapa.cz/cs > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Measurement error for tips with only 1 measured specimen (OUwie)
Another possibility is to do a regression to predict variance for species with a single observation. Or even do a phylogenetic regression so species nearer the ones with missing data matter more. But all this stuff is minor tweaks: it's great you're incorporating measurement error at all, and I hope your results are robust to any of these suggestions for tweaks. Best, Brian ___ Brian O'Meara, http://www.brianomeara.info, especially Calendar <http://brianomeara.info/calendars/omeara/>, CV <http://brianomeara.info/cv/>, and Feedback <http://brianomeara.info/teaching/feedback/> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville Associate Director for Postdoctoral Activities, National Institute for Mathematical & Biological Synthesis <http://www.nimbios.org> (NIMBioS) Communication Director, Society of Systematic Biologists On Tue, Aug 16, 2016 at 11:53 PM, Liam J. Revell <liam.rev...@umb.edu> wrote: > Hi Santiago. > > This is identical to my suggestion, except that the pooled variance is a > weighted mean in which weights (for better or worse) are proportional to > the sample size of each species. If the variances are indeed homogeneous, > this should be preferred because it gives greater weight to species whose > variance we should know well. If not, then it risks giving high weight to a > species with a well-estimated, but peculiarly high or low variance. > Computing the straight mean, as you suggest, comes with exactly the > opposite set of shortcomings since species with relatively small sample > sizes may have very poorly estimated variances. > > All the best, Liam > > Liam J. Revell, Associate Professor of Biology > University of Massachusetts Boston > web: http://faculty.umb.edu/liam.revell/ > email: liam.rev...@umb.edu > blog: http://blog.phytools.org > > On 8/16/2016 10:46 PM, Santiago Claramunt wrote: > >> Hi Rafael, >> >> Your method would underestimate the error associated with values derived >> from single specimens because those values would have the highest errors, >> not average errors. >> What I have done in such cases is to estimate an average standard >> deviation across species and use that average standard deviation as the >> standard error of the species with single specimens. >> I don't know of a formal description of this solution but it is mentioned >> in one of my papers: http://rspb.royalsocietypublis >> hing.org/content/279/1733/1567 >> >> Best, >> >> Santiago >> >> Research Associate >> Department of Ornithology >> American Museum of Natural History >> https://sites.google.com/site/sclaramuntuy/ >> >> >> On Aug 16, 2016, at 4:34 PM, Rafael S. Marcondes <raf.marcon...@gmail.com> >>> wrote: >>> >>> Hi all, >>> >>> I’m using OUwie to fit multi-optima OU models and I have a question about >>> incorporating measurement error into my analyses. >>> >>> I’m running my models with known measurement error (mserr=‘known’) and >>> using the standard error (std.error()) as an estimate of it, as >>> recommended >>> by Ives et al (2007). However, for some (a minority) of my tips, I was >>> only >>> able to measure 1 specimen, so I have no standard error for them. So I’m >>> not sure about how to deal with those. At first I thought about just >>> setting their measurement error as 0, but then I figured that would >>> introduce false confidence. So what I’m doing now is I’m setting >>> measurement error for those tips as the mean of the errors of all the >>> tips >>> for which I did measure more than one specimen. I got that idea also from >>> Ives et al when they mention averaging the error across species (jn the >>> third-to-last paragraph), but that was in a different context. I can’t >>> find >>> any references that report dealing with the same problem, even though I >>> assume it must not be an uncommon one. So I’m wondering if mine is really >>> the best way to do it and, or if anyone has alternative suggestions? >>> >>> i hope I’ve made my problem clear, and thanks in advance for any >>> suggestions. >>> >>> >>> *--* >>> *Rafael Sobral Marcondes* >>> >>> PhD Candidate (Systematics, Ecology and Evolution/Ornithology) >>> Museum of Natural Science <http://sites01.lsu.edu/wp/mns/> >>> Louisiana State University >>> 119 Foster Hall >>
Re: [R-sig-phylo] Chronos function in ape
Geiger has congruify functions that can use pathd8 to make a tree ultrametric (perhaps using an external chronogram to date it). There's some code in there to use r8s, as well, but it takes some digging. Best, Brian ___ Brian O'Meara Associate Professor Dept. of Ecology & Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info On Wed, Jul 6, 2016 at 1:02 PM, Marco Fracassetti < marco.fracasse...@unibas.ch> wrote: > Hi to all of you, > > I want to calibrated a relationship tree done with Treemix (Pickrell & > Pritchard 2012) done with SNPs frequency. The branch length of the tree > reflect the amount of genetic drift. > The chronos function from the package ape want tree whose branch lengths > are in number of substitution per sites. > Is It correct to t use the chronos function ? > There is another R function that could transform a relationship tree in > ultrametric tree? > > Thanks in advance for the help, > > Marco > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] [function like ratebystate, but for discrete characters?]
Under the hood, the methods basically treat (0,0), (0,1), (1,1), and (1,0) as states 1, 2, 3, 4. The difference is that the rate matrix is (often) constrained so that you can't change both traits instantaneously: the 00 <-> 11 rates are forced to be zero, as are the 01 <-> 10 rates. You can change this assumption. I typically think it makes sense to keep this for most traits: even if biologically you think 11 is unstable, if you think it is an intermediate, then I'd keep it. There could be traits where 11 just can't exist: then I might do a three state model of 00, 01, and 10. But I agree with Liam; for your questions, these sorts of approaches are the natural ones to use. However, if you haven't already, read Maddison & FitzJohn 2015: http://sysbio.oxfordjournals.org/content/64/1/127.long . It has some important things to worry about regarding all these tests. My personal rule of thumb at this point is that if you see traits changing many times on the tree, it's ok to use methods like Pagel (1994), but still be worried; if you have few changes, back away slowly and think about other potential questions. This might change as we get new approaches, but I doubt we'll ever get to the point where we can make strong conclusions from a few changes. Best, Brian _______ Brian O'Meara Associate Professor Dept. of Ecology & Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info On Tue, May 3, 2016 at 8:49 PM, Michael Foisy < michael.fo...@mail.utoronto.ca> wrote: > > Hi Liam, > > Thank you for your quick reply! Been playing around with > fitPagel{phytools} and corDISC{corHMM}. That makes sense; state-dependent > model, so just look at the transition rates between characters! > > This might be a silly question, but I'll ask it anyways. For the traits > that I'm looking at, it doesn't make biological sense to keep Trait A, if > you gain Trait B. Given this, one might expect the distribution of traits > at the tips to be A or B, but rarely both A and B. And so, what I'm really > interested in is something that might look (in reality) more like (1, 0) > --> (0, 1), and less like (1, 0) --> (1, 1) --> (0, 1), because the middle > state is not evolutionarily stable. So, I guess what I'm wondering, is: > if Trait A facilitates the evolution of Trait B (1,0) -> (1,1), and Trait A > is always lost quickly after the evolution of Trait B (1,1) -> (0,1), then > would this be an issue for fitPagel and corDISC models? I know they > estimate all these transition rates, so maybe I'm just overthinking this. > Or, might it be better to treat the two characters as one, multistate > character? (they, along with trait C, serve the same function; they're just > phenotypically distinct). J! > ust want to make sure I'm using the right models for the traits I'm > looking at! > > > I hope that was clear... Thank you very much! > , Michael Foisy > > MSc. Student > University of Toronto > michael.fo...@mail.utoronto.ca > > From: Liam J. Revell <liam.rev...@umb.edu> > Sent: Tuesday, May 3, 2016 8:11:39 PM > To: Michael Foisy; r-sig-phylo@r-project.org > Subject: Re: [R-sig-phylo] [function like ratebystate, but for discrete > characters?] > > Hi Michael. > > This is essentially the method of Pagel (1994). Though often interpreted > as a test for correlated evolution of two binary characters, the model > that is actually being fit is one in which the rate of transition 0->1 > (or vice versa) in character A is affected by the state of binary > character B; or, conversely, if the rate of transition 0->1 (or vice > versa) in character B is affected by the state of character A. > > This is implemented in fitPagel of phytools and I believe that this & > other more sophisticated flavor are available as part of the package > corHMM. > > All the best, Liam > > Liam J. Revell, Associate Professor of Biology > University of Massachusetts Boston > web: http://faculty.umb.edu/liam.revell/ > email: liam.rev...@umb.edu > blog: http://blog.phytools.org > > On 5/3/2016 8:00 PM, Michael Foisy wrote: > > Hello, > > > > Quick question. I'm interested in whether the presence/absence of > character 'A' affects the rate of evolution in character 'B'. Is there a > function that will do this for two or more discrete characters? > > > > I've seen ratebystate {phytools} for continuous data; but my data are > all binary! > > > > > > Thank you! > > , Michael Foisy > > > > MSc. Student > > University of Toronto > > michael.fo...@mail.utoronto.ca > > > > > > [[alternative HTML version deleted]] > > > > __
Re: [R-sig-phylo] estimating the evolutionary rate of a continous trait
Hi, Belinda. One thing to watch is over-intepretation: BM is consistent with drift, OU is consistent with selection, but various kinds of selection can also lead to BM. [1]. A lot of the people involved in these methods (including me) are guilty of sloppiness in language in this area, leading to confusion. Also, another issue could be branch lengths: are they proportional to time (or, arguably, something like number of generations)? All these methods are basically stretching the tree in various ways (and, for OU, adjusting expected means), so if branch lengths are arbitrary, so are the results. I ask due to the lambda fit. Other things that can cause this are unincorporated measurement error (b/c it looks like a lot of evolution right at the tips, meaning little evolution along the branches). I'd suggest incorporating this (you can give known estimates of measurement error; some programs allow this to be estimated as well (I forget whether we have the estimation bit exposed in OUwie, but known error should be there at least)). Surface can estimate different regimes on the tree, but IIRC it does not estimate different rates, just OU means. I believe auteur (in geiger) and BAMM can model different BM rates on different branches, which sounds like your question. We have some code to try different OU models (including perhaps ones with different rates) on different branches buried somewhere in OUwie (we call it OUwie-dredge so people know to be cautious) but it hasn't been tested or published yet. One other caution: if you have a BM model, and are just modeling different rates, it can be done well. Trying to estimate different rates with OU models is harder to do well: there's interaction between alpha and sigma that can make them difficult to distinguish (not formally nonidentifiable [unlike OU means sometimes], just difficult). For me, I'd probably lean towards one of the BM rate heterogeneity methods for your question. Hope this helps, Brian [1] Hansen, T. F. and E. P. Martins. 1996. Translating between microevolutionary process and macroevolutionary patterns: the correlation structure of interspecific data. Evolution 50:1404-1417. ___ Brian O'Meara Associate Professor Dept. of Ecology & Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info On Fri, Apr 15, 2016 at 10:20 AM, Belinda Kahnt <belind...@gmx.de> wrote: > Dear all, > I am a newbie to this mailing list and phylogenetic analyses in R and hope > you can comment on a question I have. I would like to estimate the rate of > evolution of a continous trait and check if the trait evolves faster along > some branches of the topology. I already checked with Pagels lambda if the > trait evolves according to a Brownian motion model,(i.e. pure drift), which > it did not (lambda ~ 0). Modelling the trait evolution under the > Ornstein-Uhlenbeck (OU) model provided a significant better fit (p<- 0.01) > with a very high alpha parameter of 8.7 (indicating a role of selection in > trait evolution). In order to infer now the rate of trait evolution I am > searching for a R package that is able to do this inference based on the OU > model. However, so far I either just found methods based on the Brownian > motion model (e.g. brownie.lite function in phytools, Motmot) or methods > that also rely on the reconstruction of ancestral selective regimes > beforehand (like mvMORPH or OUwie). I want to keep the model as simple as > possible and therefore don't know if it is necessary to also model > ancestral selective regimes if I just want to know if the evolutionary rate > of the trait varies along branches and which branches show an accelerated > rate?! Which programm would you recommend? I would be very thankful for > your help and recommendations. > Best wishes, > Belinda > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data
There's also corDISC in corHMM (disclosure: I'm a coauthor on the package). It does Pagel 1994 with two or three binary characters, allows for user set root states (I believe the canonical Pagel (1994) assumes equal freq at the root, I don't know if phytools uses this same assumption [a lot of programs assume equilibrium freq]), and allows additional control over the rate matrix. However, it requires you to specify what sort of rate matrix you want (has equal rates, all rate different, etc., but you would have to specify the dependent model matrix): the phytools code seems to loop over the named Pagel matrices, which are a subset of the possible matrices. So, I'd summarize it as phytools being more turnkey, corHMM having more control but less easy to use. Oh, and in either case, read this and be afraid before starting analyses: Maddison & FitzJohn, 2014: http://sysbio.oxfordjournals.org/content/64/1/127 . Best, Brian ___ Brian O'Meara Associate Professor Dept. of Ecology & Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Wed, Feb 10, 2016 at 12:41 PM, Heath Blackmon <coleo...@gmail.com> wrote: > Grace, > > The package phytools includes a function fitPagel if that is the method > that you want to use. He has a post on his blog that discusses his > implementation so it won't be a black box for you: > > > http://blog.phytools.org/2014/12/r-function-for-pagels-1994-correlation.html > > cheers > > Heath > > On Wed, Feb 10, 2016 at 11:14 AM, Joe Felsenstein <j...@gs.washington.edu> > wrote: > > > I do not know offhand whether there is an R implementation, but how about > > Mark Pagel's 1994 method for testing whether two 0/1 characters changing > > along a ohylogeny are changing independently? > > > > J.F. > > - > > j...@gs.washington.edu > > Joe Felsenstein, Department of Genome Sciences and Department of Biology > > Box 355065, University of Washington, Seattle, WA 98195-5065 > > > > [[alternative HTML version deleted]] > > > > ___ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > Searchable archive at > > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > > > > > -- > *Heath Blackmon, Ph.D.* > > *Postdoctoral Researcher* > *University of Minnesota* > *coleoguy.github.io <http://coleoguy.github.io>* > *@coleoguy* <https://twitter.com/coleoguy> > *Phone 682-444-0538* > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Fitting Brownian evolution with bounds
There's alpha code to do this in TreEvo, but we're still readying this for publication (and it's now supported by an NSF grant, so expect much more development), and so I wouldn't advise using it yet. I don't know of anything else at the moment. Best, Brian ___ Brian O'Meara Associate Professor Dept. of Ecology & Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Nov 30, 2015 at 4:18 AM, saurabh <saurab...@ncbs.res.in> wrote: > Hello, > > does anyone know if its possible and how to fit Brownian evolution with > either reflecting or absorbing boundaries? > > And many thanks for creating this excellent resource! > Saurabh > > -- > Saurabh Mahajan > Research Scholar/Graduate Student > Deepa Agashe's Lab > National Center for Biological Sciences > Bengaluru 560065 > INDIA > > Ph: +91 80 2366 6504 > Mobile: +918050001886 > Email: saurabh...@gmail.com > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] simulate Ne changes on a given phylogeny
Dick Hudson's ms software can simulate gene trees along a species tree or network with migration, changing population size, etc. The package phyclust can call ms. You could then just simulate nucleotides on these gene trees. Best, Brian ___ Brian O'Meara Associate Professor Dept. of Ecology & Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Sat, Oct 17, 2015 at 10:12 PM, Jacob Berv <jakeberv.r.sig.ph...@gmail.com > wrote: > Dear R-sig-phylo, > > I have a somewhat general simulation question and I was hoping someone on > here might have some insight. > > I’m trying to figure out if it’s possible to simulate nucleotide sequence > data (an arbitrary number of neutral loci under a multi species coalescent > model), on an ultrametric input topology (where tips represent species), > with user defined changes in effective population size at the start and end > of a particular internal branch. In my searching I’ve come across some > software by Deren Eaton (https://github.com/dereneaton/simLoci < > https://github.com/dereneaton/simLoci>) that looks like it might do what > I want - but I’m not sure. It looks like I can specify migration events > between taxa, but perhaps not population size changes on internal branches. > There are many other applications for simulating sequence data but I am not > familiar with any of them. Any thoughts would be appreciated! > > Thanks, > > Jacob Berv > > Ph.D. Student > Lovette Lab > Cornell Laboratory of Ornithology > > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Ancestral Reconstruction with Polymorphic States
Well, if you want to have polymorphic ancestors, one way would be to treat it as an ordered character: 0 - 0/1 - 1, so that any movements from 0 to 1 have to go through a polymorphic ancestor. This would entail fixing the rate matrix to have the 0-1 rates forced to be zero (which I *think* is allowed by phytools). Then you could use make.simmap() or similar. If you just want to have ancestors be 0 or 1 only, but treat 01 as a mixture of two states at the tips, you could use corHMM; note, though, that it'll give you estimated states at nodes but not a full stochastic mapping. Another possibility for treating them as polymorphic, and allowing polymorphic ancestors, is to use a biogeographic model (like in biogeoBEARS). It'll require some futzing with the matrices, though, to make all the changes happen along branches rather than at nodes -- unless you think that's how speciation happens (i.e., a polymorphic species with 01 speciates by separating into a 0 monomorphic species and a 1 monomorphic species). Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Jul 6, 2015 at 1:19 PM, Jackson, Laura Marie laura.jack...@coyotes.usd.edu wrote: Hi all, Is it possible to generate an ancestral reconstruction (i.e., make.simmap in {phytools} using data with polymorphic states (01)? It seems that {phytools} thinks that 01 is a third state, which would be incorrect in the final mapping. When I try this on an example file and tree, it does not accurately color the branches to represent the polymorphic states. Any help with this would be greatly appreciated! simmap - make.simmap(tree, matrix, nsim=100) make.simmap is sampling character histories conditioned on the transition matrix Q = 0 0/1 1 0 -3.715682 2.376705 1.338976 01 2.376705 -2.376705 0.00 11.338976 0.00 -1.338976 (estimated using likelihood); and (mean) root node prior probabilities pi = 0 0/1 1 0.333 0.333 0.333 Done. --- Laura M. Jackson PhD Student Department of Biology 185 Churchill Haines University of South Dakota 414 E. Clark St. Vermillion SD 57069-1746 Email: laura.jack...@usd.edumailto:laura.jack...@usd.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] corDISC binary coding problems
Hi, Franz. Sorry for the delay in answering. Much as I [selfishly] like people to use corDISC, I'm not sure if it's appropriate to your question. It sounds like you have some (unmentioned) binary state, let's call it foo, that you're trying to correlate with a proportion, and so you convert the proportion into two binary states -- is that correct? And you can't observe 00? Having an unobserved state can be problematic for rate estimates (the MLE for 00-01 or 00-10 will be INF, the MLE for 10-00 and 01-00 will be 0). Why not use something like Ives and Garland (2010) to allow correlation between a binary dependent variable and one or more independent variables (implemented in packages such as phylolm)? If you do want do do a Pagel 94 like test, you could discretize the proportion as a multistate character: i.e., 0, 1, 2, 3, 4 = 0:100 gymno:angio, 25:75 g:a, 50:50 g:a, 75:25 g:a, 100:0 g:a. You would then have 10 overall states: foo=0 with proportion 0, foo 0 w prop 1, foo 0 w prop 2, ... foo 1 w prop 5. Convert this into a rate matrix and constrain it (i.e., if you want to use the traditional Pagel 94 assumption of not instantaneous changes in two char at once, set f0p0 - f1p5 to have a rate of zero, etc.; also, do a limited set of rates). If it were me, though, I'd do Ives and Garland or similar (GEE, PGLS, etc.: others might weigh in on the most appropriate way to correlate binary with proportion). As to whether the different thresholds matter, it would depend (partially) on how sensitive the coding is. If all the proportions are 0-10% or 90-100%, a range of cutoffs wouldn't matter. If they're more evenly distributed, you're changing the data a lot, so it should cause different results. It's great you're checking sensitivity to thresholds. If it were me, I'd probably focus on the threshold I thought was most biologically meaningful, but also say (in the pub, not bury in the supplement) that results were very sensitive to the threshold used, and describe the range of results that come from changing the threshold. It's possible that the values you get back vary, but the main result is the same: maybe foo 1 is always positively correlated with gymnosperm substrates, though the magnitude of this varies with the threshold, for example. You're right that you can't model average over different data, and the threshold is effectively changing your data. corDISC should be generally fine for uneven state frequencies, other than the caveat that rate estimates can get very extreme if some states are very low frequency under some models (if 00 is missing, but all rates are equal, the rates will probably be ok, but in a model with free states, the rates will be extreme). It will be less fine for biased sampling: that is, if state 1 and state 0 are occur in frequency A:B in nature, but you have happened to sample very different C:D frequency, you'd get biased estimates. corHMM doesn't have a way to deal with this. You could simulate under your true frequency, subsample down to the frequency in your data, and run the analyses to get an idea of how your results might be affected. If this were a diversification question both diversitree and hisse can deal with uneven sampling that you specify; diversitree does have trait only models, but I'm not sure if they incorporate sampling frequency. Hope this helps. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Fri, Jun 26, 2015 at 3:45 AM, f.k...@mailbox.org f.k...@mailbox.org wrote: Hello, I have used the Pagels 94’ extended version in corDISC with 3 binary traits and 1300 species. To be able to use it, I had to code two binary character out of a continuous. My continuous variable is substrate proportion data. So 100% is gymnosperm exclusivity, 0% is gymnosperm exclusivity. I made two binary character out of it, using different thresholds 10%, 20%, 30%. 40%, 50%. The resulting two character were XY, where X= angiosperm occurrence, Y= gymnosperm occurrence. Therefore a 11 would be occurence on both and e.g. 10 is occurrence primarily on a given substrate, where “primarily” is defined by the threshold. The third binary doesn’t matter in this case. So my first question is: Is it valid to have a state that is coded by two binaries, like 11 that means “occurrence on both” because the meaning of the coding depends on both binaries and is therefore not “independent”? I have a bad feeling about coding that has a “meaning together”... The other question is about the results. I ran corDISC with all 5 different thresholds and the results are super different. Almost all of them range between 10 and 100 (100 was the upper bound). So the general question is: Is the method not reliable for different codings
Re: [R-sig-phylo] Missing States in ML Reconstruction of Discrete Ancestral States (phytools, ape)
corHMM should be able to do this as long as the states are defined in the transition matrix (rate.mat.maker() to start, then modify). BioGeoBEARS could do this (since it has to for biogeography: not all combos of ancestral areas are observed at the tips), and one of its models might be equivalent to what you want. I imagine diversitree can, too. Of course, this all might work best with a model with somewhat constrained rates. If rates are all free, and the model is unordered, the MLE for rate of moving into the unseen state will be zero, and rate for moving out of it will be Inf. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Wed, Jun 17, 2015 at 3:06 PM, David Bapst dwba...@gmail.com wrote: Liam, Ah, I see, rerootingMethod(), unlike ace, will accept a matrix representation of discrete trait values. However, it doesn't appear that one can define a model matrix still? Outside of molecular data, the missing state issue is often for ordered data, hence the need to define a model matrix. library(ape) library(phytools) data(bird.orders) #make a matrix for a 4 step trait model-matrix(0,4,4) for(i in 1:4){for(j in 1:4){ if(abs(i-j)2){ model[i,j]-0.1 } }} rownames(model)-colnames(model)-0:3 #simulate data where you have only observed endmembers trait - factor(c(rep(0, 5), rep(3, 18))) names(trait)-bird.orders$tip.label trait-to.matrix(trait,seq=0:3) rerootingMethod(tree=bird.orders, x=trait) #works rerootingMethod(tree=bird.orders, x=trait, model=model) #fails # Cheers, -Dave On Wed, Jun 17, 2015 at 12:06 PM, Liam J. Revell liam.rev...@umb.edu wrote: Hi David. Actually, if I understand correctly, this does work in phytools. I suspect it can be set up in phangorn as well - Klaus can probably explain how. So, if I understand correctly, you want to get marginal ancestral states for a character that can assume, say, states A, C, G, T, but for which you have only observed states A, C, G (for instance) in your tip taxa? The way to do this in rerootingMethod is as follows: library(phytools) ## simulate some data to work with: tree-pbtree(n=26,tip.label=letters,scale=1) Q-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3) rownames(Q)-colnames(Q)-c(A,C,G) x-sim.history(tree,Q)$states x ## our data ## now convert to a matrix representation: X-to.matrix(x,seq=c(A,C,G,T)) ## reconstruct ancestral states fitER-rerootingMethod(tree,X,model=ER) fitER Note that as we have not observed any evidence that changes to T occur for this character, if we use a symmetric model instead, we will find that the marginal likelihoods at all internal nodes for state T are zero. This is because the maximum likelihood q[i,T] q[T,i] for all i will be zero (sensibly). For instance: fitSYM-rerootingMethod(tree,X,model=SYM) fitSYM Let me know if this is what you had in mind. rerootingMethod just uses modified code from ace internally to do the calculations - so if ace does not permit this presently, it could easily be modified to allow it. phangorn too must allow this because for many nucleotide sites we want to reconstruct ancestral states - but we will have observed only a subset of the four nucleotides at that site. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 6/17/2015 1:46 PM, David Bapst wrote: Hello all, (I'm a troublemaker today.) Sometimes, in ordered discrete data, there are states we know might exist as intermediary between observed states but aren't observed themselves. I suppose this is probably common for meristic data. At least to me, it seems like it should be possible to reconstruct node states in a scenario with 'missing' states in a ML approach by defining the corre Anyway, in a recent conversation with a friend, he noted that he couldn't get the function for ML ancestral reconstruction of discrete characters, specifically ace (in ape) or rerootingMethod (in phytools) to accept data with missing states. I was a little shocked by this, and I didn't believe him in my foolhardiness and investigated myself. # library(ape) data(bird.orders) #make an ordered matrix for a 4 state trait model-matrix(0,4,4) for(i in 1:4){for(j in 1:4){ if(abs(i-j)2){ model[i,j]-0.1 } }} rownames(model)-colnames(model)-0:3 #simulate data where you have only observed endmembers trait - factor(c(rep(0, 5), rep(3, 18
Re: [R-sig-phylo] Bug in reorder.phylo() (related to cleaning phylo objects)
A kludgy solution to achieve Dave's original use case of fixing not-quite-right phylo objects is to do: phy - read.tree(text=write.tree(phy)) It converts the tree to Newick and reads it back in. It's not as good a solution as using a validator (great addition, Emanuel!) to find and fix the general problem, but it can help in a pinch. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Jun 15, 2015 at 11:29 AM, Emmanuel Paradis emmanuel.para...@ird.fr wrote: Hi David, collapse.singles() seems to work correctly if the phylo object is correctly conformed. Some features of this class may seem annoying (and Klaus and I alredy discussed about this), but these help for other things. As a reminder the class phylo is defined here: http://ape-package.ird.fr/misc/FormatTreeR_24Oct2012.pdf Recently, I put a function on github to help code writters: https://github.com/emmanuelparadis/checkValidPhylo This could help you to detect problems that would be tough to find otherwise. Cheers, Emmanuel Le 12/06/2015 22:41, David Bapst a écrit : Hi Klaus (and others), Ah, I see! The real bug then appears to be in collapse.singles, as it does not reorder the ID numbers in $edge. Here's a quick function to return the root ID: getRootID-function(tree){ uniqueNode-unique(tree$edge[,1]) whichRoot-sapply(uniqueNode,function(x) (sum(x==tree$edge[,2])==0)) rootID-uniqueNode[whichRoot] return(rootID) } And if we run it... getRootID(tree) #original tree [1] 151 getRootID(tree1) #after collapse.singles [1] 151 getRootID(tree2) #after reorder.phylo [1] 131 And we can see that although the number of nodes certainly changed when collapse.singles was run, it didn't change the root node's ID. So I guess this is really a collapse.singles bug report. Thanks for the insight, Klaus! -Dave On Fri, Jun 12, 2015 at 2:27 PM, Klaus Schliep klaus.schl...@gmail.com wrote: Hi David, I found the bug. Somehow ape assumes on the one hand that the root is number of tips + 1. On the other hand the root of an phylo object is the node which is in tree$edge[,1], but not in tree$edge[,2], this is the case even in an unrooted tree. This annoys me for a long time. The root of tree1 with the definition above is actually node 151 and not 131. Changing these solves your problem: Here is a small function to do this for you: switch.nodes- function(tree, a, b){ ntips = length(tree$tip.label) tree$edge[tree$edge==a] = 0L tree$edge[tree$edge==b] = as.integer(a) tree$edge[tree$edge==0L] = as.integer(b) if(!is.null(tree$node.label)){ tmp-tree$node.label[a-ntips] tree$node.label[a-ntips] = tree$node.label[b-ntips] tree$node.label[b-ntips] = tmp } tree } library(phangorn) attr(tree1, order) = NULL getRoot(tree1) tree2 = switch.nodes(tree1, 131, 151) plot(tree2) # now works for me Cheers and have a nice weekend, Klaus On Fri, Jun 12, 2015 at 2:53 PM, David Bapst dwba...@gmail.com wrote: Hello all, As those of you who directly manipulate the guts of phylo objects in your code (or construct new phylo objects whole cloth from un-phylo-like data structures) have probably experienced, it is sometimes easy to create $edge matrices that aren't accepted by ape functions (I often use plot.phylo as my litmus for this). When this occurs, various things can be done; I usually do the following sequence: collapse.singles() reorder.phylo(,cladewise) read.tree(text=write.tree(,file=NULL)) ...or something along those lines. However, I've run into an issue where reorder.phylo() returns what is essentially a scrap of the original $edge matrix, without warning. Very worrisome! I'm using R version 3.2.0 and ape 3.3. Here's, my script, including a call to a saved object on my website, so that the error can be reproduced: (Why do I have an .Rdata object listed with a .txt extension, you might wonder? Because my school apparently sanitizes file extensions on our hosted websites that it thinks are executables, archives or doesn't recognize, but doesn't bother to check file innards when it does recognize the extensions. Its easily hacked, at least.) # library(ape) load(url(http://webpages.sdsmt.edu/~dbapst/weirdTree_06-12-15.txt;)) #can I plot it? plot(tree) #nope tree1-collapse.singles(tree) #any single-child nodes left? sum(sapply(unique(tree1$edge[,1]),function(x) sum(x==tree1$edge[,1])==1)) #nope #can I plot it? plot(tree1) #nope #now reorder tree2-reorder.phylo(tree1,cladewise) tree2$edge #now reorder with postorder tree2-reorder.phylo(tree1,postorder) tree2$edge
Re: [R-sig-phylo] clade-specific release and radiate model?
Hi, Daniel. It's a bit arguable whether as alpha - 0, OU - BM: I think it should, but IIRC in OUCH this doesn't happen, and that's a deliberate choice. That said, I think that an OU with alpha near zero would be ok for your question, though you might want to think about how to penalize parameters (that is, for that clade there'd be an OU mean parameter that is unidentifiable (alpha of zero, so no pull, so no evidence for it): should you count this as a parameter when doing model comparison? I'd argue no: you're doing OU with alpha of zero as a kludgy hack to treat it as BM). Another approach would be to resuscitate the censored model of O'Meara et al. 2006. Slice your tree on the edge leading to the released clade (I guess this truly releases the clade to roam free of its relatives) so you have the paraphyletic non-released set and the released clade as separate trees. Then you can try fitting the same or different models to the two trees. The downside of this is that you must use an additional ancestral state (at the MRCA of the released clade); the upside is that any weird changes happening on the edge leading to the released clade aren't in the model and so don't affect the fit (you could imagine that whatever led the clade to be ecologically released happened somewhere on the stem edge, but you don't know where, and it could be associated with a major single shift in your continuous trait, too). You could try an OU on the non-released tree (let's call this A) and an OU on the released clade (B), OU on A and BM on B, etc. The only practical difficulty with this is constraining the cases where A and B are supposed to have the same model: by default, optimization will happen separately in different trees, but you can create a wrapper function that calls OUwie.fixed() separately on A and B but with the same parameters and adds the likelihood and then optimize the parameters in this wrapper function. Hope this helps, Brian On Fri, May 29, 2015 at 2:02 AM, Daniel Fulop dfulop@gmail.com wrote: Hi All, Do you know if there are any methods out there for fitting ecological release and release and radiate models that are clade-specific? That is, in which the change in mode (OU to BM) happens at the root of a clade instead of at specific time for the whole phylogeny (as in Slater 2013). As far as I know the closest out there are models in OUwie, say with a clade-specific second OU process with a very low alpha. However, I don't think that biologically OU is a good model for the trait and clade in question (though it is for the rest of the tree). I know that at the limit as alpha - 0 OU converts to BM, but I would ideally still like compare standard models' fits (including OUwie models) to the fits of clade-specific release models. Any leads or suggestions would be much appreciated, especially about how to implement these clade-specific models with current tools or about how to roll my own. Thanks! Dan. -- Daniel Fulop, Ph.D. Postdoctoral Scholar Dept. Plant Biology, UC Davis Maloof Lab, Rm. 2220 Life Sciences Addition, One Shields Ave. Davis, CA 95616 ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ancestor vs. change plots
In Paleo, you can (well, arguably) see the ancestor and descendent pair, so you can assess the amount of change and the ancestral state. Using just info at the tips, you are inferring from N taxa N-1 ancestral states and 2N-2 rates of change. That seems a lot to ask from data, especially since the rates and states affect each other and so aren't independent. You're also using a model that assumes an even rate of change over the tree, so an ancestor will tend to be between its descendants (exactly between, if you have a two taxon tree with coeval taxa). You could try more exotic models that allow multiple Brownian rates or even multiple Ornstein-Uhlenbeck rates, but there's a limit there, as well. There are Bayesian approaches that can paint a smear (estimated from post-burnin samples) of rates over multiple branches, but I'd be careful interpreting them as robust estimates of rate that are independent of the states. If you have separate traits, you could ask does trait X lead to higher rates (or OU mean, or OU alpha) of evolution in trait Y. Basically this involves painting a tree with discrete trait X (generalist vs specialist predator, say) and estimating different rates for the other trait (i.e., mouth volume) on the branches painted in different states (and seeing if these rates are biologically and statistically significantly different). Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Fri, Apr 24, 2015 at 3:14 PM, Milton Tan mzt0...@tigermail.auburn.edu wrote: Hello all, I have a question that is perhaps esoteric, since it's on a method I don't see used often. I am looking at the dynamics of body size evolution, and have come upon ancestor-vs-change plots described in Alroy 2000 (Understanding the dynamics of evolutionary trends, Paleobiology). This is interesting because it will allow me to see if rate of body size change depends on body size. I haven't seen this method widely used, so for anyone unaware how this works: for each branch, you plot the ancestral state vs. the amount/rate of change along the branch. In theory, by looking at a scatterplot of ancestral size vs. change, I can answer questions such as Are smaller taxa more likely to evolve to a larger size? I don't see too many people using this method, so I thought I'd ask why. Is there a particular reason for it not being used? Are there more powerful methods to answer this same question that have supplanted it? I also have a more specific question, assuming that ancestor-vs-change plots are valid. For my dataset, I have reconstructed ancestral states for my clade from tip data and pic, and then plotted ancestral states versus the rate of change along every branch (attached). While there are a couple outliers, you can see the distribution roughly hangs around 0 along the entire graph, but I'd like to fit a line so I can present if the slope significantly differs from 0 (ie. body size evolutionary rate truly does not depend on body size of ancestors). Alroy (2000) describes that the shape of the plot can be informative on different dynamics of evolutionary change, implying fitting of linear and polynomial lines, but doesn't really discuss how to test this statistically. Alroy seems to use this method and fits a polynomial line in Alroy 1998 (Cope's Rule and the Dynamics of Body Mass Evolution in North American Fossil Mammals, Science), but he doesn't really describe his statistics ! in depth. So my question is how can I fit a line to this data? Problematically, since each branch has a sister branch that shares the same ancestral node, each ancestral state is represented twice. I think this makes the observations non-independent. If they are, this makes me think that a linear regression is inappropriate, but I'm not sure. I could average the amount of change along each branch for every ancestral node, but I'm not sure if that's the best way. Does anyone have any insight on an appropriate way to determine a best fit line statistically? Thanks in advance, Milton Tan Auburn University Department of Biological Sciences PhD Candidate [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Unsual values of alpha in OU models
You can select between models using AICc, but then you can look at the reliability, more robustly than using eigenvalues, by using OUwie.boot() to do parametric bootstrapping. Another very useful thing done by Hansen, Bartoszek, and colleagues is to do a contour plot around the maximum likelihood estimate to see how much a parameter like alpha can vary without changing the likelihood much. As Aaron indicated, this can be a horrifyingly large region. It's especially tough with OUMA and OUMVA where there are multiple alphas. One thing to consider is what kind of model is most likely to give you results you trust and that are relevant for your question. It could be that you care about OUM-type models, but don't really care if it's OUMA, OUM, or OUMVA. You should certainly report that, say, OUMVA is the best fitting under AICc, but perhaps if your biological question is about the optima only you focus on OUM due to OUMVA seeming to have issues estimating any parameters well (though of course also indicate if the results under OUMVA generally agree or disagree with the conclusions from OUM -- you need to show that it's not that OUMVA fit better but gave you an answer you didn't like, so you cherry picked a different model to get the result you want). Hope this helps, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Fri, Mar 27, 2015 at 12:57 PM, Diego Salazar Tortosa dsala...@ugr.es wrote: Thank you for your answer Aaron. Clear to me that this situation is common in the OU models, but I don't know what criteria I should use when select between different OU models, AICc or reliability of parameters? Thanks in advance Regards Diego Salazar 2015-03-27 11:08 GMT+00:00 Aaron King kin...@umich.edu: In many situations, the OU model parameter alpha is not well identified. This has been pointed out before, but it's quite common for the parameter values (and not just alpha, though that's typically the worst) to be poorly identified, even if the model selection is unambiguous. OU model parameters can be well identified under some circumstances (detailed in the paper) but when they are not, estimates can be *very* unreliable. On Fri, Mar 27, 2015 at 4:07 AM, Diego Salazar Tortosa dsala...@ugr.es wrote: Hi, I am trying to analyze whether the evolution of a continuous trait is state-dependent of discrete trait with three levels, through OU models. My phylogeny has 113 species and 112 internal nodes. Clearly the most parsimonious models according AICc are OUMA and OUMVA but have alpha values between 2.5 and 4, are these values usual or too high? Also some eigenvuales obtained with diagn=T are negative for both models, althought standard errors are very small. Are these models valid? Thanks in advance Diego Salazar -- Diego Francisco Salazar Tortosa Ph student Departamento de Ecología Facultad de Ciencias Universidad de Granada Av. Fuente Nueva s/n 18071 Granada Telefono: +34 958241000 ext 20007 Movil: +34 634851132 email: dsala...@ugr.es die...@correo.ugr.es dftort...@gmail.com --- Este mensaje se dirige exclusivamente a su destinatario y puede contener información privilegiada o confidencial. Si no es Ud. el destinatario indicado, queda notificado de que la utilización, divulgación o copia sin autorización está prohibida en virtud de la legislación vigente. Si ha recibido este mensaje por error, se ruega lo comunique inmediatamente por esta misma vía y proceda a su destrucción. This message is intended exclusively for its addressee and may contain information that is CONFIDENTIAL and protected by professional privilege. If you are not the intended recipient you are hereby notified that any dissemination, copy or disclosure of this communication is strictly prohibited by law. If this message has been received in error, please immediately notify us via e-mail and delete it. -- [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- Aaron A. King, Ph.D. Ecology Evolutionary Biology Mathematics Center for the Study of Complex Systems University of Michigan GPG Public Key: 0x15780975 -- Diego Francisco Salazar Tortosa Ph student Department of Ecology University of Granada Av. Fuente Nueva s/n 18071 Granada Telefono: +34 958241000 ext 20007 Movil
Re: [R-sig-phylo] assign multiple discrete states to single tip
There's also the rayDISC function in the corHMM package. It can take in polymorphic data (so you can have some taxa red, some white, some red white). Note that along the tree, it assumes that there's one state. If you wanted ancestral nodes to be validly polymorphic (not just equal chances of red and white, but a character red+white), you'd need to do as Liam suggests and create a polymorphic trait and an ordered state matrix. This is possible in ace() in ape, rayDISC() in corHMM, and probably others. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Wed, Mar 25, 2015 at 10:49 AM, Liam J. Revell liam.rev...@umb.edu wrote: Hi Matt. What is the purpose of this? To visualize the states on the tree? Or to conduct some kind of analysis (modeling evolution, reconstructing ancestral states, etc.) on the tree? If the former, then you can use tiplabels in the ape package, and supply a matrix to the argument pie in which each row is a tip taxon and each column is a state. For non-polymorphic species you can just put a 1 in the column for the state of that taxon, and zero otherwise; and for polymorphic taxa you can put 1/(number of states in that species) - or the state frequencies, if you have them - for each state found in that species, and zero otherwise. If you're interested in, for example, reconstructing ancestral states, then it may in fact make the most sense to treat polymorphism as another state. In that case, you must decide if you will, for instance, require that transitions from A - B first pass through state AB, etc. This kind of constraint is fairly straightforward to implement in ape's ace function. Let us know I will see if I can give you any more concrete suggestions. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 3/25/2015 7:38 AM, Koski, Matthew H wrote: Hi, I'd like to assign multiple discrete character states to tips that are polymorphic. E.g., color in a given species is polymorphic (pink or white) so it is assigned two states for color. Is there a way to do this without creating an extra 'polymorphic' category? There is a similar question on stackoverflow with no responses yet: http://stackoverflow.com/questions/27413070/how-to- assign-multiple-states-for-taxa-in-discrete-character-data-set? ? ?Thanks in advance, Matt Matthew Koski University of Pittsburgh Biological Sciences Ashman Lab koskimatt.wix.com/matthewhkoskihttp://koskimatt.wix.com/matthewhkoski [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r- sig-ph...@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r- sig-ph...@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] fitDiscrete across multiple datasets
You can calculate the likelihood of the data under a given transformation: transform the tree with a delta of 0.3 or whatever, then calculate the likelihood of the data under a Brownian motion model using this transformed tree. This is the same likelihood as calculating the likelihood of the data under a delta model directly (assuming the same delta). However, what you can do is apply the same transformation to all your datasets and add the likelihood. This becomes a new likelihood function. You can then optimize this (optim, nloptr, etc.). I can rig up a working example later tonight -- off to teach now. Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Oct 27, 2014 at 9:34 AM, Luke Matthews lmatth...@activatenetworks.net wrote: HI Ben, I too am curious if anyone has an R answer to this question you pose. One non-R way I can think of is to put the data into MrBayes, which has a number of different models available. You could probably effectively test some models not baked in MrBayes by systematically manipulating the branchlengths you submit to it. MrBayes provides various options for how tightly coupled the parameter estimates should be across the different genes. It would seem something similar might exist within R but I'm not aware of any. Best Luke Luke J. Matthews | Senior Scientific Director | Activate Networks, Inc. -- Message: 2 Date: Fri, 24 Oct 2014 16:23:58 -0400 From: Benjamin Furman benjamin.ls.fur...@gmail.com To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] fitDiscrete across multiple datasets Message-ID: CACyKK5XATvW36iW_MfAR5x7LZzZjG+AAN+rtY2V7jO5VPBY5= g...@mail.gmail.com Content-Type: text/plain; charset=UTF-8 Hello Everyone, I have a tree and discrete data (number of gene copies, for many genes) and would like to use the fitDiscrete function in geiger, or something similar. However, I would like to estimate the parameters given all of the datasets, not just with the data for each gene. For instance, if I was using the delta model to vary rates across the tree, I would like this delta value to reflect some sort of summary value across all datasets. Does anyone have an idea as to how this could be accomplished or perhaps point me in the right direction? Thank you for any guidance, Ben -- Benjamin Furman, B.Sc. Specialization Ph.D. Candidate, Evans Lab http://benevanslab.wordpress.com McMaster University Twitter: @Xen_Ben Email: benjamin.ls.fur...@gmail.com, furma...@mcmaster.ca website: http://benjaminfurman.wordpress.com [[alternative HTML version deleted]] -- ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo End of R-sig-phylo Digest, Vol 81, Issue 12 ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] fitDiscrete across multiple datasets
Well, a model with different rates (but same tree scaling) for different characters is nested in a model with same rates (and same tree scaling) for different characters, so you could do a likelihood ratio test to see which model is better (though of course AIC et al. are also possible). See the appendix of this article (http://www.genetics.org/content/177/3/1395.long) for some discussion of this in the context of comparative methods (though the method hasn't been used much, due to difficult implementation). Of course, the field has been been doing partitioned and nonpartitioned models for tree inference for years, just not for comparative methods as much. In general, I like Ben's idea to use multiple characters to get at shared rates; we should see more of this. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Oct 27, 2014 at 3:16 PM, Liam J. Revell liam.rev...@umb.edu wrote: Luke's points are correct. However, if we wanted to estimate a common transition matrix across characters we could just substitute optim.pml for a phyDat object of type=USER for ace internally; and if we wanted to estimate different parameter values across different data columns, we could just run fitTransform separately on each column of X and sum the log-likelihoods. I don't know whether either of these things (or the main exercise) are a good idea, but they are all certainly possible. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 10/27/2014 2:57 PM, Benjamin Furman wrote: Hello All, I would also like to echo a thank you for the responses. They have been helpful. As for your summary Luke, I'll have to let the experts comment on that. Ben On Mon, Oct 27, 2014 at 1:49 PM, Luke Matthews lmatth...@activatenetworks.net mailto:lmatth...@activatenetworks.net wrote: Hi all, Thanks Brian and Liam for these very cool responses. It does make sense that one should be able to iterate through the characters and add the likelihoods, which is elegant and seems appropriate. It seems to me what this process accomplishes is: 1. Completely couples the estimation of delta across the characters - thus we are now picking the value for delta that maximizes the likelihood across all characters not allowing any variation in delta across them. 2. Leaves completely uncoupled the transition rate estimates across the different characters. Thus, the different characters are freely estimating rates of transition from 2 to 3 gene copies, etc. such that the rate for one gene has no influence on the rate for another. Do I have that right? Depending on Ben's hypothesized processes those may be reasonable assumptions. I can also imagine processes where somewhat decoupling the delta estimation or somewhat coupling the rate estimations would be reasonable. Best Luke -Original Message- From: Liam J. Revell [mailto:liam.rev...@umb.edu mailto:liam.rev...@umb.edu] Sent: Monday, October 27, 2014 1:26 PM To: omeara.br...@gmail.com mailto:omeara.br...@gmail.com; Luke Matthews Cc: r-sig-phylo@r-project.org mailto:r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] fitDiscrete across multiple datasets Hi all. Here is a link to code that does what Brian suggests: http://blog.phytools.org/2014/10/optimizing-tree- transformations-for.html Note that (as noted in the blog post) I have arbitrarily used ace internally to compute the discrete character log-likelihood, because it is fast. You could instead change the code in minor ways and use fitDiscrete, which is slower but should be more robust. The demo is fairly explicit, but let us know if it works or if I have made any errors. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu blog: http://blog.phytools.org On 10/27/2014 7:54 AM, Brian O'Meara wrote: You can calculate the likelihood of the data under a given transformation: transform the tree with a delta of 0.3 or whatever, then calculate the likelihood of the data under a Brownian motion model using this transformed tree. This is the same likelihood as calculating the likelihood of the data under a delta model directly (assuming the same delta). However, what you can do is apply the same
Re: [R-sig-phylo] tip order in phylogeny - How does it work?
Look at reorder.phylo() in ape and reorder() in phylo4: your desired tip order may be there. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Wed, Oct 8, 2014 at 4:45 AM, Juan Antonio Balbuena j.a.balbu...@uv.es wrote: Hello I am working on a simulation that requires adding a number of random tips to multiple phylogenies: treeP - read.nexus(file= my file.tre) # 5K trees used to build a consensus tree (treeP is thus a multiPhylo object) NT = 30 # number of taxa in phylogeny N.dup = 3 # number of tips (duplications) I want to add to the trees sites - sort(sample(1:NT, N.dup), decreasing=TRUE) # Random tips to duplicate dup.labels - treeP[[1]]$tip.label[sites] treeP - .uncompressTipLabel(treeP) # # Add duplicates to branches following a Liam Revell recipe published in his blog: # for (n in 1:length(treeP)) { for (i in 1:N.dup ) { pos - runif(n=1)* treeP[[n]]$edge.length[which(treeP[[n]]$edge[,2]==sites[i])] edl - runif(n=1)* treeP[[n]]$edge.length[which(treeP[[n]]$edge[,2]==sites[i])] treeP[[n]] - bind.tip(treeP[[n]], tip.label=paste(dup.labels[i], 1, sep=.), edge.length=edl, where=sites[i], position=pos) } } # treeP- .compressTipLabel(treeP) # Check tip labels: treeP$tip.label $tip.label [1] S24 S25 S9S15 S3S21 S12 S17 S19 S30 S22 S4S16 S27 S10 S23 S1 S2S14 [20] S6S8S26 S29 S13 S20 S7S28 S18 S5S11 S18.1 S20.1 S16.1 When I plot any tree in treeP, S18.1, S20.1 and S16.1 appear together with S18, S20 and S16, as intended. So why are they at the end of the array above? This behaviour makes makes my life difficult because it forces me to rearrange a number of matrices to make them match this tip label order. I would appreciate if a kind soul can explain why it is that way and whether is possible to reorder the tip labels, placing each added tip together with its respective counterpart? Thanks in advance for your help. Juan Antonio -- Dr. Juan A. Balbuena Cavanilles Institute of Biodiversity and Evolutionary Biology University of Valencia http://www.uv.es/~balbuena P.O. Box 22085 http://www.uv.es/cophylpaco http://www.uv.es/cavanilles/zoomarin/index.htm 46071 Valencia, Spain e-mail: j.a.balbu...@uv.estel. +34 963 543 658fax +34 963 543 733 *NOTE!* For shipments by EXPRESS COURIER use the following street address: C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain. ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Tree-for-all hackathon
This might be of interest for various R phylogenetics programmers: there's currently no package to access this info. Title Tree-for-All: A hackathon to access OpenTreeâs global phylogeny resources Content A global âtree of lifeâ will transform biological research in a broad range of disciplines from ecology to bioengineering. To help facilitate that transformation, the OpenTree project [1] now provides online access to 4000 published phylogenies, and a newly generated tree covering more than 2.5 million species. The next step is to build tools to enable the community to use these resources. To meet this aim, OpenTree, Arbor [2] and NESCentâs HIP working groups [3] are staging a week-long hackathon September 15 to 19 at U. Michigan, Ann Arbor. Participants in this âTree-for-allâ will work in small teams to develop tools that use OpenTreeâs web services to extract, annotate, or add data in ways useful to the community. Teams also may focus on testing, expanding and documenting the web services. How could a global phylogeny be useful in your research or teaching? What other data from OpenTree would be valuable? How could OpenTree web services be integrated into familiar workflows and analysis tools? How could we add to the database of published trees, or enrich it with annotations? If you can imagine using these resources, and you have the skills to work collaboratively to turn those ideas into products (as a coder, or working side-by-side with coders), we invite you to apply for the hackathon. The full call for participation (http://bit.ly/1ioPPMc) provides instructions for how to apply, and how to share your ideas with potential teammates (strongly encouraged prior to applying). Applications are due July 8th. Travel support is provided. Women and underrepresented minorities are especially encouraged to apply. If you have questions, contact Karen Cranston (karen.crans...@nescent.org, @kcranstn, OpenTree), Arlin Stoltzfus (ar...@umd.edu, HIP), Julie Allen ( juli...@illinois.edu, HIP), or Luke Harmon (lu...@uidaho.edu, Arbor). [1] http://www.opentreeoflife.org [2] http://www.arborworkflows.com/ [3] http://www.evoio.org/wiki/HIP (Hackathons, Interoperability, Phylogenies) ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Phylogenetic signal, variation in rates and sample size
For rate estimation between temperate and tropical, I'd suggest using OUwie, as it's made for exactly this question: map tropical/temperate on the tree, try the BMS model to estimate the rate of evolution in temperate and tropical taxa (note that I'm one of the authors, so take this bias into account). There are other models, too, to allow you to test things like whether temperate and tropical taxa have different evolutionary means, and you can compare these models to Brownian motion models, depending on your question. Note that in theory, these methods should be robust to random sampling of taxa. For your first question, you could estimate the amount of phylogenetic signal, create a much larger tree under some branching process, transform the branch lengths to match your phylogenetic signal, simulate traits on this transformed tree, then subsample back and estimate the phylogenetic signal on the untransformed but pruned simulated tree. Ideally, the parameter estimate you get from the simulated data matches the true parameter, suggesting that sampling does not matter for phylogenetic signal. Frankly, I'd be surprised if random sampling does matter for signal, though some biased sampling (even from taxonomic bias, of not separating two populations into different species until they are different enough) could have an effect. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Mon, May 5, 2014 at 11:33 AM, dga...@huskers.unl.edu dga...@huskers.unl.edu wrote: Hi Imran, I'm by no means an expert on phylogenetic signal with incomplete phylogenies but I would check out the GEIGER package, specifically the mecca function. I think mecca is set up to do estimates based on incomplete tip sampling. Cheers, -Dan Gates PhD candidate Smith/Pilson Lab University of Colorado/Nebraska http://ec2-54-186-114-109.us-west-2.compute.amazonaws.com:8080/DanielJGates From: r-sig-phylo-boun...@r-project.org r-sig-phylo-boun...@r-project.org on behalf of imran khaliq imrankhal...@hotmail.com Sent: Monday, May 5, 2014 5:15 AM To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] Phylogenetic signal, variation in rates and sample size Dear All, I am trying to understand the effects of incomplete phylogenies on the phylogenetic signal. I am using different trees and I have at least 100 species in each tree but still these are only 2 % of total species. So how can I be sure that results are generalize enough to apply on the whole group? Secondly, if I want to compare varaition in trait's rates in two groups e.g. tropical and temperate, what methods are appropriate to estiamte the evolutionary rates. I tried Brownian rate parameter estimation by tranforming the branch lengths according to the lambda values of trait in each group. It gives me lower varaince for the group that has higher lambda value and that is confusing. In fact with the same trait values, if I transform the branch lengths of the tree with different lamda values it gives me higher varaince for higher lambda value. Can anyone help me? Best regards, Imran [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Fit BM, OU (1 theta), OU (2 theta) with the same function
There are at least three packages to do this: OUCH, SLOUCH, and OUwie. OUCH and OUwie are on CRAN; for SLOUCH, go to http://freshpond.org/software/SLOUCH/ . Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Jan. 1, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Feb 11, 2014 at 3:05 PM, Santiago Sánchez santiago.snc...@gmail.com wrote: Hello list, Does someone know if there is a function out there that can fit (similar to fitContinuous() in geiger and compar.ou() in ape) OU models with multiple optima (theta) as well as a BM model. I'm not sure if fitContinuous() allows OU with multiple theta, and, as far as I know there is no compar.bm in ape. Any help is greatly appreciated. Cheers, Santiago -- Santiago Sánchez-Ramírez Department of Ecology and Evolutionary Biology, University of Toronto Department of Natural History (Mycology), Royal Ontario Museum 100 Queen's Park Toronto, ON M5S 2C6 Canada Other email: santiago.sanc...@mail.utoronto.ca Tel. 416-586-8025 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Fit BM, OU (1 theta), OU (2 theta) with the same function
Make sure you're using version 1.39. Version 1.38 introduced this error; it was fixed about a week later (just a few days ago), but that version may not have made it out to all the cran mirrors yet. http://cran.rstudio.comis the mirror I've been directing people to most recently; it's updated frequently, plus for developers it lets you see how often your packages are downloaded. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Jan. 1, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Feb 11, 2014 at 9:09 PM, Santiago Sánchez santiago.snc...@gmail.com wrote: Hi Brian, Thank you for pointing these out. I've been playing around with OUwie and I ran into an error which I couldn't solve. This comes out when fitting any OU model implemented in OUwie, and it doesn't happen with BM models (they run fine): test - OUwie(phy, data, model=OU1) Initializing... Finished. Begin thorough search... Finished. Summarizing results. Error in svd(m) : infinite or missing values in 'x' I would appreciate if you could give me some suggestions. Cheers, Santiago 2014-02-11 15:09 GMT-05:00 Brian O'Meara bome...@utk.edu: There are at least three packages to do this: OUCH, SLOUCH, and OUwie. OUCH and OUwie are on CRAN; for SLOUCH, go to http://freshpond.org/software/SLOUCH/ . Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Jan. 1, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Feb 11, 2014 at 3:05 PM, Santiago Sánchez santiago.snc...@gmail.com wrote: Hello list, Does someone know if there is a function out there that can fit (similar to fitContinuous() in geiger and compar.ou() in ape) OU models with multiple optima (theta) as well as a BM model. I'm not sure if fitContinuous() allows OU with multiple theta, and, as far as I know there is no compar.bm in ape. Any help is greatly appreciated. Cheers, Santiago -- Santiago Sánchez-Ramírez Department of Ecology and Evolutionary Biology, University of Toronto Department of Natural History (Mycology), Royal Ontario Museum 100 Queen's Park Toronto, ON M5S 2C6 Canada Other email: santiago.sanc...@mail.utoronto.ca Tel. 416-586-8025 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- Santiago Sánchez-Ramírez Department of Ecology and Evolutionary Biology, University of Toronto Department of Natural History (Mycology), Royal Ontario Museum 100 Queen's Park Toronto, ON M5S 2C6 Canada Other email: santiago.sanc...@mail.utoronto.ca Tel. 416-586-8025 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] OUwie diagn=T Error in dim(edges)
It's likely that something is wrong with your tree, especially mapping of regimes. However, another possibility is that there is a problem computing the Hessian matrix. Without your tree and data, it would be hard to tell. You might also try doing contour plots. They take more time, but they're a much better estimate of uncertainty than the Hessian approach. We used to have code to do this in OUwie until CRAN maintainers got upset (the code required akima, which has a more restrictive license than OUwie's). You (or anyone else) can still get the contour plotting code at https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/OUwie.contour.R?revision=133root=ouwie. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Jan. 1, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Fri, Jan 24, 2014 at 9:00 AM, Maria Ghazali m_ghaz...@ukr.net wrote: Hi, Please, explain me what I am doing wrong in calculating OUwie (OUwie version 1.37). I want to run some additional diagnostics (with diagn=T) to estimate standard errors of alpha and sigma (the element solution.se). But an error notification appears. Example: data(tworegime) model.tworegime=OUwie(tree,trait,model=c(OUMV),root.station=TRUE,diagn=T) #Initializing... #Finished. Begin thorough search... #Finished. Summarizing results. #Error in dim(edges) : 'edges' is missing Thank you! [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] compar.ou
Often if you have a trait constrained to be positive, it's appropriate to log-transform it, which has the happy effect of making the theta for the trait when converted back to a non-log scale constrained to be positive (as well as probably being more appropriate for how your trait evolves). You can have theta outside the range of observed values, but this should only happen in something like OU1 if your tree is not ultrametric (ultrametric = equal root to tip length, as with a chronogram of extant species), and even then is unlikely (basically, it would only happen if you have something like a trend). You could do a constrained search by wrapping OUwie.fixed() in an optimizer where you return a really bad, but finite, likelihood if you exceed one of your bounds (or use an optimizer that allows setting bounds), or, perhaps easier, modify the OUwie() function to return a bad likelihood for particular values of theta. But my first suggestion is to think about whether your data are better log-transformed (i.e, under Brownian motion, is an increase or decrease of 10% of the present value equally likely regardless of trait value (in which case, log transformation makes sense)). You also may want to check that the branch lengths you are using make sense for your problem (OU, BM, etc. models are largely (not entirely) about transforming branch lengths on trees, so you want to make sure they are sensible for your question to start with: do you care about rates with time units or amount of molecular change?). Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Nov 12, 2013 at 6:39 AM, sandra goutte gou...@mnhn.fr wrote: Dear all, Following up this conversation; Using OUwie, i get reasonable values for theta under several models for all my traits, but one. In this case, this is a trait which can only take positive values and i get negative theta values for all the different models, including BM and OU1. the standard error is not big enough to consider my theta value to be around zero. (for example: theta = -2.279, se= 0.209). Any idea of what could be causing this and if it is possible to force OUwie to estimate theta within certain boundaries? Thank you, Sandra. 2013/10/29 Marguerite Butler mbutler...@gmail.com Hi Sandra and others, You can also assess confidence using parametric bootstrap, a procedure which we generally recommend for all users. ouch has built-in facilities to do so (the bootstrap() and simulate() functions in addition to update() ). I think there are examples in my tutorial. If not, let me know and I can send you another. Marguerite On Oct 29, 2013, at 6:08 AM, Cecile Ane a...@stat.wisc.edu wrote: FYI, we have some theory to explain why alpha has large standard errors and in which conditions. As Brian says, it comes with a flat likelihood with respect with alpha. http://dx.doi.org/10.1214/13-AOS1105 or http://www.stat.wisc.edu/~ane/publis/2013HoAne_AoS.pdf On 10/29/2013 09:46 AM, Brian O'Meara wrote: In at least the OUwie paper we spent a lot of time doing simulations to determine this empirically (this may have been examined in other papers, too, though none come to mind). Alpha can be estimated, but sometimes with scarily large standard errors (but not always). This property should hold for related methods (it's a property of the likelihood surface for these models). You can just plot the alpha values vs likelihood to get a sense of this surface for your dataset (ideally, estimating the other parameters for each fixed alpha, which is possible, though the kludge of holding the other parameters at their MLE and just varying alpha will give you a sense of the surface, though it's a bit non-conservative). We had code in OUwie to do a contour plot of the likelihood surface but took it out (a small licensing difference between akima (university license) and OUwie (GPL) led to OUwie being pulled from CRAN by the CRAN maintainers until we resolved the difference). Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Oct 29, 2013 at 10:21 AM, sandra goutte gou...@mnhn.fr wrote: Thank you Marguerite. Looking at OUwie and OUCH/SLOUCH, i see that alpha is estimated along the other parameters, whereas in Hansen 1997 and other papers
Re: [R-sig-phylo] compar.ou
In at least the OUwie paper we spent a lot of time doing simulations to determine this empirically (this may have been examined in other papers, too, though none come to mind). Alpha can be estimated, but sometimes with scarily large standard errors (but not always). This property should hold for related methods (it's a property of the likelihood surface for these models). You can just plot the alpha values vs likelihood to get a sense of this surface for your dataset (ideally, estimating the other parameters for each fixed alpha, which is possible, though the kludge of holding the other parameters at their MLE and just varying alpha will give you a sense of the surface, though it's a bit non-conservative). We had code in OUwie to do a contour plot of the likelihood surface but took it out (a small licensing difference between akima (university license) and OUwie (GPL) led to OUwie being pulled from CRAN by the CRAN maintainers until we resolved the difference). Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Oct 29, 2013 at 10:21 AM, sandra goutte gou...@mnhn.fr wrote: Thank you Marguerite. Looking at OUwie and OUCH/SLOUCH, i see that alpha is estimated along the other parameters, whereas in Hansen 1997 and other papers it is suggested that this would lead to very large standard errors. Is that problem resolved in these functions? Best, Sandra. 2013/10/26 Marguerite Butler mbutler...@gmail.com Dear Sandra, You might also want to look at the papers that go along with slouch, ouch, and ouwie. Here are some pdfs, along with some tutorials. On my Rcompstart, the ouch stuff starts around page 165 Let me know if you need help with ouch. Marguerite On Oct 24, 2013, at 7:03 AM, sandra goutte gou...@mnhn.fr wrote: Dear list, My aim is to compare the fit of models for which *theta* is allowed to change at different nodes (different combinations of 1 ,2 or 3 nodes). I don't really understand the calculation of the deviance, but if i'm not mistaken the difference between the deviances of 2 models follows a *chi*square distribution with the difference of parameters in the models as degree of freedom. Now, how to compare two models with the same number of optimum changes but at different nodes? I am having a hard time understanding the details of the function with regards to Hansen's (1997) paper. What is the relationship between the deviance and the RSS from Hansen (1997)? Is it possible to calculate the RSS from the output of compar.ou ? I would appreciate any advice or insight on the subject! Sandra. -- PhD Student Muséum National d'Histoire Naturelle Département Systématique et Évolution USM 601 / UMR 7205 Origine, Structure et Évolution de la Biodiversité Reptiles Amphibiens - Case Postale 30 25 rue Cuvier F-75005 Paris Tel : +33 (0) 1 40 79 34 90 Mobile: +33 (0) 6 79 20 29 99 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ Marguerite A. Butler Associate Professor Department of Biology University of Hawaii 2538 McCarthy Mall, Edmondson Hall 216 Honolulu HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-9812 http://www.hawaii.edu/zoology/faculty/butler.html http://www2.hawaii.edu/~mbutler http://www.hawaii.edu/zoology/ -- PhD Student Muséum National d'Histoire Naturelle Département Systématique et Évolution USM 601 / UMR 7205 Origine, Structure et Évolution de la Biodiversité Reptiles Amphibiens - Case Postale 30 25 rue Cuvier F-75005 Paris Tel : +33 (0) 1 40 79 34 90 Mobile: +33 (0) 6 79 20 29 99 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Error in OUwie
The error happens after it does the calculation for the point estimates, so it's likely happening when it starts doing the calculations to look at the curvature at the solution (to see if it's a maximum rather than a saddle point and to get an estimate of standard errors). Doing OU1 - OUwie(tree, data, model = OU1, diagn=FALSE) will turn off these extra diagnostics for now. Note there's a dedicated OUwie support list at https://groups.google.com/forum/#!forum/ouwie-discuss , too. People often post to dedicated lists first before R-sig-phylo, but such lists can be hard to find (I think that's true in this case) or are inactive. One other advantage of R-sig-phylo posts for help is that you're more likely to hear about other package alternatives (i.e., Have you tried mvSLOUCH or OUCH instead?). Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Oct 29, 2013 at 4:20 PM, Christopher D. Muir cdm...@biodiversity.ubc.ca wrote: All, This my first time posting to R-sig-anything, so apologies for any breach in etiquette. I searched through the mailing list and couldnt find anything about this problem. A brief description of my data: I am running OUwie on a 528-taxon megatree generated using phylomatic. I made the tree dichotomously branching using multi2di, then assigned short branch lengths to these so that I could estimate ancestral character states (regimes in the OU model) to each node. OUwie works fine with my dataset for the BM1 and BMS models, but for all of the OU models, I get the same error with the singular-value decomposition function: OU1 - OUwie(tree, data, model = OU1) Initializing... Finished. Begin thorough search... Finished. Summarizing results. Error in svd(m) : infinite or missing values in 'x' Im using R v. 2.15.1 and the latest OUwie v 1.33. I think this has something to do with the structure of my tree and/or the frequency of regime shifts. When I replace my data with Brownian Motion data simulated with my tree, I generally get the same error. The most salient feature I can think of is that regime shifts are quite frequent within the tree and there are many short branches because of resolving polytomies. There are no NAs or anything unusual in the trait data. Chris --- Christopher D. Muir, Biodiversity PDF Office: BioDiv 237 E-mail: cdm...@biodiversity.ubc.ca Biodiversity Research Centre University of British Columbia 6270 University Blvd. Vancouver, BC, Canada V6T 1Z4 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] compar.ou (sandra goutte)
You can also use OUwie for a variety of models (OU with different means and/or different variances and/or different attraction values), but it returns AIC and lnL scores but not RSS. It sounds, though, that mvSLOUCH might be the best option in your case. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Fri, Oct 25, 2013 at 8:47 AM, Krzysztof Bartoszek krz...@chalmers.sewrote: Hi Sandra, I don't know about compar.ou but my mvSLOUCH package (returns RSS and allows you to have measurement error and missing data) and Butler King's ouch package return you a lot of stastics that you can use for model comparison. Maybe in your case the AIC.C would be more appropriate? Best wishes Krzysztof Bartoszek ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] trouble estimating lambda with fitDiscrete
I don't necessarily see it as a bad idea. The discussion of lambda mostly concerns continuous traits, but it's just a way to stretch the tree to see if transformed branchlengths provide a better fit. You could see if discrete character change is based on observed branch lengths (lambda = 1) or if there's no phylogenetic signal (lambda = 0). Phylogenetic signal itself may be of dubious interest, but I don't see how discrete traits are worse for this than continuous traits. For a discussion of phylogenetic signal and why it might not be a good measure for some processes, see Revell et al. 2008: http://sysbio.oxfordjournals.org/content/57/4/591.full For a discussion of tree stretching as a general class of methods that can be used for various kinds of data, see O'Meara 2012 http://www.annualreviews.org/doi/abs/10.1146/annurev-ecolsys-110411-160331[yes, self-promotion]. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Sep 30, 2013 at 4:18 PM, Liam J. Revell liam.rev...@umb.edu wrote: I'm not sure of the wisdom (or interpretation) of fitting a lambda model to discrete traits; however it looks like your main problem is that you are using geiger 1.99-x. To get the old function output you will need to use the CRAN archive install from source. You should be able to fit the lambda model in geiger =1.99 if you think this is a good idea - but the argument/value has changed from treeTransform=lambda to transform=lambda. Good luck. - Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.**revell/http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 9/30/2013 12:15 PM, Katherine Brooks wrote: Hello, I am trying to estimate lambda using the fitDiscrete function so that I can obtain a measure of phylogenetic signal in a trait. The trait has 5 character states and is coded as letters A, B, C, D, E. When I run fitDiscrete, I do not get the appropriate output, as suggested by many phylogenetics wikis I've read. I should get output that looks like this: Finding the maximum likelihood solution [050 100] [] $Trait1 $Trait1$lnl [1] -86.76405 $Trait1$q [,1] [1,] -0.03717188 $Trait1$treeParam [1] 0.9943398 $Trait1$message [1] R thinks that this is the right answer. Perhaps I am making a mistake with the data input? Here is what I enter and the output I get: newphy=read.nexus(S1078.wo.**perotensis.fig.tre) newphy Phylogenetic tree with 55 tips and 54 internal nodes. Tip labels: Ammospermophilus_harrisii, Ammospermophilus_interpres, Ammospermophilus_leucurus, Callospermophilus_lateralis, Callospermophilus_madrensis, Callospermophilus_saturatus, ... Rooted; includes branch lengths. SocGrCat=read.csv(file.choose(**),row.names=1) SocGrCat SocGrade Ammospermophilus_harrisiiA Ammospermophilus_interpres A Ammospermophilus_leucurusA Callospermophilus_lateralis A Callospermophilus_madrensis A Callospermophilus_saturatus A Cynomys_gunnisoniD Cynomys_leucurus C Cynomys_ludovicianus E Cynomys_mexicanusD Cynomys_parvidensD Ictidomys_mexicanus A Ictidomys_tridecemlineatus B Ictidomys_parvidens A Marmota_baibacinaE Marmota_bobakE Marmota_broweri E Marmota_caligata E Marmota_camtschatica E Marmota_caudata E Marmota_flaviventris D Marmota_himalayana E Marmota_marmota E Marmota_menzbieriE Marmota_monaxA Marmota_olympus E Marmota_sibirica E Marmota_vancouverensis E Neotamias_townsendii A Otospermophilus_beecheyi B Otospermophilus_variegatus C Poliocitellus_franklinii A Spermophilus_citellusB Spermophilus_dauricusB Spermophilus_erythrogenysB Spermophilus_fulvus C Spermophilus_major B Spermophilus_pygmaeusB Spermophilus_suslicusB Spermophilus_xanthoprymnus B Urocitellus_armatus B Urocitellus_beldingi B Urocitellus_brunneus A Urocitellus_canus
Re: [R-sig-phylo] Pairwise tree distance matrix
You may look into treedist in phangorn (but just does pairs of trees, I believe) or dist.multiphylo in distory (polynomial time, but I don't think it's RF distance), or dist.topo in ape (but again, pairwise and not RF). There's also RMesquite (http://rmesquite.r-forge.r-project.org/) to have R just call Mesquite functions. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Thu, Sep 19, 2013 at 6:41 PM, Rob Lanfear rob.lanf...@gmail.com wrote: Hi All, I'm looking for a method to calculate a pairwise distance matrix of RF distances between a set of trees. Specifically, I know it's possible to do this in linear time (relative to the number of taxa), using an algorithm proposed in 1985[1]. This algorithm is implemented in various places (e.g. TreeSetVis in Mesquite), but I couldn't find an implementation in R. If anyone knows of an implementation, or has ideas on where best to start building one, please let me know. Cheers, Rob [1] Day,W. H. E. 1985. Optimal algorithms for comparing trees with labeled leaves. J. Classiï¬cation 2:7â28. -- Rob Lanfear Research Fellow, Ecology, Evolution, and Genetics, Research School of Biology, Australian National University phone: +61 (0)2 6125 3611 www.robertlanfear.com [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Count Deep Coal in R?
I think the package phybase has functions for looking at gene trees in species trees that could be modified for this. The package has been purged from CRAN (well, archived), but you can still install from source. I'm CCing the package author, Liang Liu, to see if he has any ideas. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Jul 8, 2013 at 7:27 AM, Melisa Olave melizz...@hotmail.com wrote: just wondering if you found the way to count deep coal in R?? I'm trying to do the same... with no success! thank you! Melisa ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] obtaining reasonable values from OUCH
You're at best on the border of how many parameters you can estimate given the size of your dataset. One thing you might try is running OUwie, which also implements that model but uses a different optimizer and starting values, though I wouldn't be surprised if you get similar answers. Trying a grid of values for a contour plot can tell you something about the likelihood surface you're trying to optimize on; it could be very flat. We had this function in OUwie until a license conflict with akima made us have to cut this functionality, but it's easy enough to do. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Sat, Jun 29, 2013 at 9:03 AM, Pascal Title pti...@umich.edu wrote: Hello, Given a phylogeny with 29 tips, and 5 traits that each fit a OU model better than a BM model of trait evolution (according to fitContinuous in GEIGER), I would like to fit a multivariate OU model to these 5 traits using the hansen() function in the OUCH package in R. The goal is to get multivariate rate values. I am using R v3.0.1 and version 2.8-2 of the OUCH package. Under default settings, I get an unsuccessful convergence error. By changing various settings, I stop getting a convergence error message. But the alpha and sigma2 values seem to be unrealistically large: *hansen(data = otd[colnames(data)], tree = ot, regimes = otd[regimes], * *sqrt.alpha = priors, sigma = priors, method = L-BFGS-B, * *maxit = 5000)* * * * optim diagnostic message: CONVERGENCE: REL_REDUCTION_OF_F = FACTR*EPSMCH* *alpha:* * [,1] [,2][,3] [,4] [,5]* *[1,] 61.385945 17.5563426 -33.7110759 3.697183 50.622310* *[2,] 17.556343 45.9034684 -0.3705461 27.995283 -6.633346* *[3,] -33.711076 -0.3705461 42.8040546 8.072352 -40.675454* *[4,] 3.697183 27.9952829 8.0723524 25.435097 -12.115062* *[5,] 50.622310 -6.6333460 -40.6754540 -12.115062 59.536768* * * *sigma squared:* * [,1] [,2] [,3] [,4] [,5]* *[1,] 89.262795 32.572924 9.099638 28.769264 24.711740* *[2,] 32.572924 40.506308 13.271956 19.279625 -9.633395* *[3,] 9.099638 13.271956 4.387822 5.986198 -3.965904* *[4,] 28.769264 19.279625 5.986198 11.966763 2.241925* *[5,] 24.711740 -9.633395 -3.965904 2.241925 18.995603* I've been told that sigma2/(2*alpha) should be approximately equal to the variance of the trait. If I try this for the first trait, sigma2/(2*alpha) = 0.727, and the actual variance of the first trait is 1.304. So that doesn't match, and values for alpha and sigma2 are extremely large. My question is, where do I go from here? What can I do to get more realistic values? I've heard that you can give different starting values to the sqrt.alpha and sigma arguments of the hansen function, but here these are lower triangular matrices, so I don't know what to adjust, or what other starting values to give, or how to know when I've found the most realistic values I'm going to find. From the following link, you can download the tree, data and script to see for yourself: https://dl.dropboxusercontent.com/u/34644229/Pascal_OUCH.zip Any advice or suggestions would be greatly appreciated! Thank you, -Pascal -- Pascal Title, MSc. PhD student, Rabosky Lab http://www-personal.umich.edu/~drabosky/Home.html Dept of Ecology and Evolutionary Biology University of Michigan pti...@umich.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Difficulty getting posterior probabilities into ape?
Phyloch has never been on CRAN as far as I know, but you can get it from http://www.christophheibl.de/Rpackages.html . Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Wed, Jun 19, 2013 at 11:14 PM, Tom Diggs jtdi...@mail.ad.uab.edu wrote: Thank you very much for the answer. Unfortunately, that command is not working for me. It apparently reads the file just like the read.nexus command does in my case. The tree comes in fine, with the branch lengths, but no posterior values. So I guess that the read.nexus command ignores anything inside square brackets? I imagine that's what the read.annotated.nexus was designed to deal with, but unless I'm doing something incredibly wrong, it's not working. I also read that phyloch was a way to get around this problem, but it's not in CRAN anymore either. Otherwise, should I go through and just eliminate the square brackets manually? Replace them with regular parentheses? Leave them out altogether? The phylogeny I'm currently dealing with isn't THAT big, and while it'd be a pain to do it manually, it wouldn't be prohibitive. Has anyone done that before? Thanks, Tom - Original Message - From: Jombart, Thibaut t.jomb...@imperial.ac.uk To: Tom Diggs jtdi...@mail.ad.uab.edu; r-sig-phylo@r-project.org Sent: Wednesday, June 19, 2013 12:39 PM Subject: RE: [R-sig-phylo] Difficulty getting posterior probabilities into ape? Hello, Marc Suchard has recently implemented this in the package epibase - check the function read.annotated.nexus. All the best Thibaut. __**__ From: r-sig-phylo-bounces@r-project.**orgr-sig-phylo-boun...@r-project.org[ r-sig-phylo-bounces@r-**project.org r-sig-phylo-boun...@r-project.org] on behalf of Tom Diggs [jtdi...@mail.ad.uab.edu] Sent: 19 June 2013 18:33 To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] Difficulty getting posterior probabilities into ape? Hi all, I was wondering if anyone else has had this problem, and how they addressed it. I've been using BEAST to generate trees, and TreeAnnotator to summarize them. In the output nexus file, the tree plainly has posterior probabilities for each node listed, but when I try to import the trees into R, my tree has no node labels (mytree$node.labels = NULL). I searched the archives for this listserve, and found one instance of this problem cropping up with MrBayes output. http://www.mail-archive.com/r-** sig-ph...@r-project.org/**msg02227.htmlhttp://www.mail-archive.com/r-sig-phylo@r-project.org/msg02227.html Unfortunately, the package phybase, listed as a potential solution to the problem, is no longer in CRAN. Am I missing something obvious? Is ape having trouble reading the node labels for some reason? Any help would be much appreciated. Thanks so much. Tom [[alternative HTML version deleted]] __**_ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/**listinfo/r-sig-phylohttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-** sig-ph...@r-project.org/http://www.mail-archive.com/r-sig-phylo@r-project.org/ __**_ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/**listinfo/r-sig-phylohttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-** sig-ph...@r-project.org/http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] R postdoc with O'Meara and Carstens
This should be coming in via EvolDir soon, but may be of interest to many of the readers of this list. You can find out more about the project, including looking at the R code, at http://www.brianomeara.info/phrapl . Please forward to any good candidates, and let me know if you have any questions (or suggestions). Thanks, Brian Phylogeography often compares a select handful of models to answer an empirical question. We have just been funded to continue development of Phrapl, a way to examine thousands of potential phylogeographic models for fit to data. It is a bit like ModelTest, but for models of population subdivision and migration rather than substitution rates. The aim is for empiricists to be able to let their data tell them which processes are important in the phylogeographic history of a group. A postdoctoral position is available to work on improving, testing, and applying this approach. The current code is mostly in R but using other languages to speed up the computation internally is quite possible. The position is in the lab of PI Brian OMeara at U. of Tennessee, Knoxville, but coordination with Co-PI Bryan Carstens and occasional travel to Ohio State is expected. The OMeara lab currently hosts five postdocs (including four co-mentored through NIMBioS) as well as a co-advised grad student. The EEB department features ten research groups whose work includes phylogenetics/phylogeography, and the presence of the National Institute for Mathematical and Biological Synthesis further creates a robust ecosystem of colleagues. Knoxville, Tennessee, is just a half hour away from the US most visited national park, Great Smoky Mountains NP. Cost of living is low, and the Knoxville area can cater to interests ranging from farmers markets, local music, and handmade crafts to SEC football and NASCAR. I strive to create a lab group that is welcoming and open. We have good diversity in terms of gender (57% female), life stage (i.e., 43% with young children) but (so far) poor diversity on various other metrics. Mentoring to allow postdocs to reach their career goals is a focus. Research in the lab ranges from empirical work on plant taxonomy to quantitative trait evolution models to models of protein evolution. Applications should include 1) A cover letter (include expected completion date of PhD, if appropriate, as well as relevant skills) 2) A CV 3) A short research statement 4) Contact information for two references 5) Link(s) to repositories with examples of code you have written or attachments including such code. Experience in programming is strongly preferred; experience in R is desired, but not required. If in doubt, apply: please do not self-select yourself out from what might be a mutually beneficial position. Review of applications will begin on May 1 and continue until filled (start dates are flexible). Presubmission inquiries are encouraged (I can give you a copy of the proposal, point you to the code, and answer any questions you may have). All qualified applicants will receive equal consideration for employment and admissions without regard to race, color, national origin, religion, sex, pregnancy, marital status, sexual orientation, gender identity, age, physical or mental disability, or covered veteran status. Eligibility and other terms and conditions of employment benefits at The University of Tennessee are governed by laws and regulations of the State of Tennessee, and this non-discrimination statement is intended to be consistent with those laws and regulations. In accordance with the requirements of Title VI of the Civil Rights Act of 1964, Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and the Americans with Disabilities Act of 1990, The University of Tennessee affirmatively states that it does not discriminate on the basis of race, sex, or disability in its education programs and activities, and this policy extends to employment by the University. Inquiries and charges of violation of Title VI (race, color, national origin), Title IX (sex), Section 504 (disability), ADA (disability), Age Discrimination in Employment Act (age), sexualorientation, or veteran status should be directed to the Office of Equity and Diversity (OED), 1840 Melrose Avenue, Knoxville, TN 37996-3560, telephone (865) 974-2498 (V/TTY available) or 974-2440. Requests for accommodation of a disability should be directed to the ADA Coordinator at the Office of Equity and Diversity. ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https
Re: [R-sig-phylo] question about analysing trait differences
The problem is reconstructing overlap down the tree (it's possible to do, but whether it's possible to do well is another question). One thing you could do to avoid this is to use the *other* method from Felsenstein's 1985 independent contrasts paper, taking pairs of species independent of other pairs. A way to do this: 1) Get a clade of size two (aka a cherry sensu Semple and Steel). Every tree has at least one. 2) Difference in overlap and diference in mass for the pair of taxa making up this clade becomes one point (perhaps standardize for time, and positivize it such that you subtract them in the order that leaves the mass difference positive). 3) Prune these two taxa off the tree. 4) Go back to step 1. When you're done, assuming no polytomies, you'll have zero or one species left. [it's possible to do with polytomies, too: if assumed to be hard, then just do a pair of taxa from a terminal polytomy and leave the rest for the next step. If assumed to be soft, then just take two from a terminal polytomy and discard the remaining taxa] Contrasts gives you Ntax - 1 contrasts. This gives you floor(Ntax/2) but avoids having to do reconstructions. I have code around to do this but can't find it at the moment, but it's not hard to write -- post again if you need help with it and no better ideas have been proposed. Hope this helps, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Sun, Apr 21, 2013 at 10:26 AM, Langevelde, Frank van frank.vanlangeve...@wur.nl wrote: Dear all I would like to test whether there is a relationship between two pairwise trait differences, or to be more specific: overlap in geographical ranges between pairs of species (as dependent variable in regression) related to the body mass differences of the pairs (as independent variable). As I expect that these traits are not phylogeneticaly independent, I would like to add phylogeny in the analysis. I thought about using independent contrasts (as these are also calculated pairwise), but I do not know how to analyse. One suggestion would be to use the distance between the pairs of species (e.g. distance to the closest shared node in the tree) and include these as the second independent variable. Is this possible or does anyone has a better suggestion? Thanks for your help! Kind regards Frank van Langevelde Resource Ecology group Wageningen University The Netherlands http://www.resecol.wur.nl/staff/frank/index_FrankvL.htm ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Estimating ancestral states of a trait under selection
We have a method in R, TreEvo, that can allow you to do things like have a model where the optimum shifts through time according to some factor (like external data on climate) or where there are constraints that change through time: it's up on R-forge, and you can install the package, but it hasn't been published yet and so could have problems that won't appear until peers have looked at it more thoroughly. If you're fine waiting to publish with it until it's been vetted and is in press, feel free to play with it. Of course, another thing you could do is throw a prior on the root state based on your knowledge of the climate then and then do a Bayesian reconstruction. I don't know of any packages that do this at the moment, but it wouldn't be hard to write. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Fri, Apr 19, 2013 at 4:00 PM, Eliot Miller eliotmil...@umsl.edu wrote: Hello all, I've read some past posts and tried to figure this one out myself. It appears that the short answer is if there are not some extinct lineages in the tree with known trait values, it's difficult to do. I have a data set of the current climatic niches of a continental radiation of taxa. If I reconstruct the ancestral state of the root using ace() with REML, I get a semi-believable answer, though obviously with wide CIs. My gut instinct is that the absolute value that is returned is slightly less than it should be. The issue is that I know the climate of the continent has changed radically (but at least consistently in one direction!) during the course of the radiation of the clade. An educated guess would put the magnitude of the change at 1200 to 550 mm rain per year over 60 million years. I'd guess that most lineages have felt this aridification as a slow and steady selective pressure to figure it out in increasingly dry conditions (there's probably also been a lot of extinction but let's skip that point for now). So, while I don't have extinct lineages with known trait values, I do have what I'd be comfortable modeling as a continuous selective pressure throughout the course of the radiation. I'm interested to see how doing this would change the actual absolute value returned by the ancestral state reconstruction. Any tips on how this might be best effected? Final note, my tree is from uncorrected molecular distances, so it's not ultrametric. The analysis I'm trying to run here is tangential to my main focus, so I'm ok with making the tree ultrametric with something like chronopl() just out of curiosity to see what comes out. Someday I'll have a time calibrated tree and then I can re-run this analysis. Best wishes, Eliot [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] R Dendrograms for Subclones in a Somatic Cell Population: Time Interval Sampling.
An old parsimony-based approach to this is known as stratocladistics. There are no R implementations, as far as I know, but you could wrap phangorn to do this, I imagine, though it does require writing a new function. Pseudocode: #my.taxa.vector is character vector of tips best.phy - rtree(length(my.taxa.vector), tip.label=my.taxa.vector) best.trees - c(best.phy) best.score - parsimony(best.phy, data) + strato(best.phy, times) for (i in sequence(nsteps)) { new.phy - rSPR(best.trees[[1]]) new.score - pars(new.phy, data) + strato(new.phy, times) if (new.score == best.score) { best.trees - c((new.phy), best.trees) } if (new.score best.score) { best.trees - c(new.phy) best.score - new.score } } this would do a greedy spr search: you'd want to restart from different trees and such. The only tricky thing is figuring out the new strato function: try a set of branch lengths and take as the best score the one that implies the least amount of stratographic debt (ape's node.height function could be useful for this; one thing that makes these easy is that times and node heights must be integers (number of time points from root), so even searching exhaustively is feasible, though almost surely unnecessary). Some of these branch lengths could be zero, indicating, in the case of a terminal branch length, a sample that is a direct ancestor of another sample. The tree isn't quite rooted but it is polarized, with nodes sampled further back in time pushed down the tree. Writing the strato function really wouldn't be that bad to do. Another approach is to do a likelihood search assuming a clock, but with tips constrained to occur at time of sampling rather than being coeval. Heibl and Cusimano's Lagopus package (not on CRAN, go to http://www.christophheibl.de/mdt/mdtinr.html) calls PAML and multidivtime to estimate a tree with age constraints. Multidivtime can use constraints such that the tips are not coeval, but I'm not certain that Lagopus can pass this information (it can certainly do node constraints, just not sure about tip constraints). If it can, or could be modified to do so, this would give you a tree with samples constrained to be at the right times and with possibly zero length branches for truly ancestral samples. You could then collapse these using di2multi in ape. Someone else may know of other ways to attack this problem. Hope this helps, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Thu, Feb 28, 2013 at 3:21 PM, sa...@math.berkeley.edu wrote: Estimating Ongoing Evolution by Repeated Sampling with Long Time Intervals. Is there a way to construct dendrograms similar to those used in phylogenetics but with 2 main differences: (1) Instead of observing at one time, small samples from a very large population are taken at regular intervals, so that some observed cells could easily correspond to an internal node rather than a leaf. (2) There is no obvious outgroup; root should if possible be estimated by presuming that observations from later time points are on average farther from root. More specifically, consider a large, heterogeneous, unstably evolving in vitro cell culture apparently not subject to a Hayflick limit. In our feasibility study, a sample of 20 cells were tested at t=0 for about 100 different numerical aspects of their karyotype (for each cell an ordered vector of 100 numbers is measured from the genome; the individual numbers all have the same order of magnitude). About 15 cell generations later the observation is repeated and similarly four more times for a total of 120 cells over a time span of about 60 cell generations. I would like to estimate the behavior of the major subclones Are some spinning off new karyotypes? Which ones, if any, are in the process of taking over? Are some being outcompeted? And so on. Various difference matrices and binary dendrograms with the cells as leaves are easily constructed and are suggestive. For example at timepoint 5 one karyotype which was prominent, with lots of duplicates, for timepoints 1-4 disappears from the samples. But the dendrograms themselves dont really use the fact that observations were made at six consective times rather than simultaneously; and they require me to make a guess about where root is. There must be a better way to use the data. I assume people who work, say, on development of drug-resistant bacterial lineages have thought this through in some detail and developed R software for it but I wasnt able to locate anything. Thanks in Advance, Ray Sachs, Dept. Math, UCB ___ R-sig-phylo mailing list - R-sig
Re: [R-sig-phylo] Analysis with Multiple cores on Mac Workstation
I agree with Daniel that going in parallel is probably overkill in this case. However, if you do want to get into parallelization with R, a good place to start is the CRAN task view on high performance and parallel computing: http://cran.r-project.org/web/views/HighPerformanceComputing.html. Like all task views, it provides an overview of the packages in that domain and an easy way to install all of them at once (see http://cran.r-project.org/web/views/ ). The built-in parallel package (since R 2.14) makes it easy to use all your cores using mclapply(). If you tend to think in for loops (as most of us do) rather than apply functions, the foreach package combined with doMC is another easy way: there's a good vignette for this at http://cran.r-project.org/web/packages/doMC/vignettes/gettingstartedMC.pdf. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Sat, Dec 8, 2012 at 7:01 AM, Daniel Barker d...@st-andrews.ac.uk wrote: Dear Jose, Is this a problem in practice? The calculation of branch lengths Brian describes do not sound very time-consuming to run. If you're dealing with an enormous tree or enormous sample, they would take some time - but presumably, the greater bottleneck would be obtaining the sample of trees in the first place. Anything can become time-consuming if repeated, e.g. if you're dealing with thousands of separate samples of trees. If things are taking too long in that situation, the best solution would be to run serial processes, not multithreaded - but many of them, submitted to a batch queuing system running on the computer. Examples include SLURM, GridEngine, LSF, Unix 'batch' (comes as part of OS X but I've never managed to set up the load threshold appropriately on OS X), or perhaps Xgrid (I don't know it). Except in very unusual situations, this kind of 'task farming', if properly set up, will always be at least as efficient as 'true' parallelisation like multithreading - often faster. It also doesn't require any special programming, and parallel programming can be fairly painful. An even simpler approach is to divide the task into n approximately equally time-consuming scripts (if you have n CPU cores) and launch them all at the same time - which uses all your cores, and doesn't even require a batch queuing system. Best wishes, Daniel On 08/12/2012 02:43, José Hidasi hidasin...@gmail.com wrote: Dear all, I want to create a consensus tree with branch lengths (Brian O'Meara's function on post *[R-sig-phylo] Why no branch lengths on consensus trees?*) using a Mac Workstation. However, if I only type the function on it, R will not use all cores for running the analysis. I would like to know if there is a function or any way to divide the analysis within the cores, or to use all cores for running the program. I know this is not the best forum to ask something like that (using multiple cores), but I imagined that someone might have the solution for that as some of you work with large databases. Best, José Hidasi * This is Brian O'Meara's function, that i am using to create the consensus tree, on the post *[R-sig-phylo] Why no branch lengths on consensus trees?:* I have a function to create a consensus tree with branch lengths. You feed it a given topology (often a consensus topology, made with ape), then a list of trees, and tell it what you want the branch lengths to represent. It could be the proportion of input trees with that edge (good for summarizing bootstrap or Bayes proportions) or the mean, median, or sd of branch lengths for those trees that have that edge. Consensus branch lengths in units of proportion of matching trees has obvious utility. As Daniel says, the average branch lengths across a set of trees is more difficult to see a use case for, but you could imagine doing something like taking the ratogram output from r8s on a set of trees and summarizing the rate average and rate sd on a given, best, tree as two sets of branch lengths on that tree. I've put the function source at https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrl en.R?revision=110root=omearalab . You can source the file for the function (consensusBrlen() ) and other functions it needs. It also uses phylobase. Note that this is alpha-quality code -- it's been checked a bit, but verify it's doing what you want. Here's an example of how to use it library(ape) library(phylobase) phy.a-rcoal(15) phy.b-phy.a phy.b$edge.length-phy.b$edge.length+runif(length(phy.b$edge.length), 0, 0.1) phy.c-rcoal(15) phy.list-list(phy.a, phy.b, phy.c) phy.consensus-consensusBrlen(phy.a, list
Re: [R-sig-phylo] Why no branch lengths on consensus trees?
I have a function to create a consensus tree with branch lengths. You feed it a given topology (often a consensus topology, made with ape), then a list of trees, and tell it what you want the branch lengths to represent. It could be the proportion of input trees with that edge (good for summarizing bootstrap or Bayes proportions) or the mean, median, or sd of branch lengths for those trees that have that edge. Consensus branch lengths in units of proportion of matching trees has obvious utility. As Daniel says, the average branch lengths across a set of trees is more difficult to see a use case for, but you could imagine doing something like taking the ratogram output from r8s on a set of trees and summarizing the rate average and rate sd on a given, best, tree as two sets of branch lengths on that tree. I've put the function source at https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrlen.R?revision=110root=omearalab. You can source the file for the function (consensusBrlen() ) and other functions it needs. It also uses phylobase. Note that this is alpha-quality code -- it's been checked a bit, but verify it's doing what you want. Here's an example of how to use it library(ape) library(phylobase) phy.a-rcoal(15) phy.b-phy.a phy.b$edge.length-phy.b$edge.length+runif(length(phy.b$edge.length), 0, 0.1) phy.c-rcoal(15) phy.list-list(phy.a, phy.b, phy.c) phy.consensus-consensusBrlen(phy.a, list(phy.a, phy.b, phy.c), type=mean_brlen) Best, Brian PS: Note that I am actively looking for grad students: info at http://www.brianomeara.info/lab . Guaranteed five years support, subject to decent performance. ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Wed, Nov 21, 2012 at 11:09 AM, Daniel Barker d...@st-andrews.ac.ukwrote: Dear Scott, What should branch lengths on a consensus tree represent? They cannot be expected substitutions per residue. This would imply no evolution at points where uncertain branching patterns have been reduced to a multi-furcation - which is not what the multi-furcation is meant to imply. (Rather: there was evolution, but we aren't very certain about the branching pattern.) But, MrBayes does provide average lengths of some kind. Best wishes, Daniel On 21/11/2012 15:13, Scott Chamberlain myrmecocys...@gmail.com wrote: When making a consensus tree using ape::consensus the branch lengths are lost. Is there a way to not lose the branch lengths? Or to add them somehow to the consensus tree after making it. library(ape) cat(owls(((Strix_aluco:4.2,Asio_otus:4.1):4.1,Athene_noctua:7.3):6.3,Tyt o_alba:13.5);, file = ex1.tre, sep = \n) cat(owls(((Strix_aluco:1.2,Asio_otus:4.5):3.1,Athene_noctua:7.3):6.3,Tyt o_alba:13.5);, file = ex2.tre, sep = \n) cat(owls(((Strix_aluco:3.2,Asio_otus:4.7):8.1,Athene_noctua:7.3):6.3,Tyt o_alba:13.5);, file = ex3.tre, sep = \n) tree1 - read.tree(ex1.tre) tree2 - read.tree(ex2.tre) tree3 - read.tree(ex3.tre) trees - c(tree1, tree2, tree3) trees_con - consensus(trees) trees_con Phylogenetic tree with 4 tips and 3 internal nodes. Tip labels:[1] Strix_aluco Asio_otus Athene_noctua Tyto_alba Rooted; no branch lengths. Thanks, Scott Chamberlain [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] estimating mutation rates
If you want the branch-specific substitution rate, you could estimate the branch lengths on a topology constrained to match the chronogram, then divide each of the molecular tree's branch lengths by the corresponding chronogram branch lengths to get these rates. If you want a global rate, you might constrain the molecular tree to have the same proportional branch lengths as the chronogram. Look at pml() in phangorn for calculating the likelihoods. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Thu, Nov 8, 2012 at 12:49 PM, john d dobzhan...@gmail.com wrote: Hi there, I have a well-supported phylogeny (with branch lengths proportional to absolute time) and I'm interested in estimating the mutation rate of an independently-obtained molecular dataset. Any suggestions? thanks! John ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] variation in rates over time
I agree with the suggestions so far. I just wanted to point out a few more alternatives: You could use the geiger package to estimate the best scaling for the tworatetree transformation to do this (should be equivalent to the earlier solutions, though it would require running optimization). You could also use the OUwie package with a tree that has been given simmap mappings using phytools. The advantage of this is that you could evaluate Brownian models but you could also look at various Ornstein-Uhlenbeck models (though note that while there is information about different Brownian rates before and after a time slice, info about different alphas (strength of pull parameter) and thetas (the attractor, aka optimal value) is rapidly (but not immediately) lost under an OU process). For completeness, especially for citations, note that O'Meara et al. (2006) and Thomas et al. (2006) independently arrived at essentially the same method, so it is worth reading both papers. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Mon, Sep 17, 2012 at 9:11 PM, Jason S jas2...@yahoo.com wrote: Thanks, guys. That's exactly what I needed. From: Liam J. Revell liam.rev...@umb.edu To: Matt Pennell mwpenn...@gmail.com -project.org Sent: Monday, September 17, 2012 9:22 PM Subject: Re: [R-sig-phylo] variation in rates over time Hi Jason. Matt is absolutely correct. You can do this with phytools. Say, for instance, you have an ultrametric phylogeny with branches in millions of years (tree) and data vector containing the trait values for species (x) and you want to test the hypothesis that the last 3.4 my has a different rate of evolution than the rest of the tree, you could do this as follows: library(phytools) # load phytools tree-make.era.map(tree,c(0,max(nodeHeights(tree))-3.4)) plotSimmap(tree,pts=F,lwd=3) # visualize fit-brownie.lite(tree,x) # fit model That's it. Good luck. Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 9/17/2012 7:46 PM, Matt Pennell wrote: Jason, I think the best way to do this is with the approach of O'Meara et al. 2006 Evolution Brownie. Liam Revell has implemented this in R in his package phytools. You can modify the steps taken in this tutorial here http://phytools.blogspot.com/2011/07/running-brownielite-for-arbitrarily.html perhaps in conjunction with the function make.era.map() http://phytools.blogspot.com/2011/10/new-function-to-plot-eras-on-tree.html though I admittedly have not tried this myself. Perhaps Liam or someone else has a better explanation but I hope this is at least somewhat helpful. cheers, matt Hello, I see that there are several interesting alternatives to test for different rates among clades. However, I was wondering if there is a method to test for varying rates over time. I'm aware of Pagel's delta and the EB model, but I was thinking more in terms of testing if there is a different rate for the entire tree after a specified point in time. For instance, if a snail predator colonizes an island 3.4 Mya, is there evidence for an increased rate of evolution in the prey after that point in time? Something like two lambdas, one for before and one for after that point in time. Thanks! Jason [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Estimating MRCA dates
Ape can do this, function chronopl. In general, one good way to find out which package can do something is by looking at the task view for phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html . This will also allow you to install all the phylogenetics packages (at least those on CRAN) with just a couple of commands. An updated version should be coming out later today (and if someone notices other things missing, please let me know). Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Funding wanted: Want to collaborate on a grant? On Thu, Feb 2, 2012 at 8:45 AM, Sean Downey s...@codexdata.com wrote: Greetings! I'm new to the list, thanks to Emanuel for bringing it to my attention. I have a general question for the group. Several years ago I used r8s for estimating mrca dates for a tree using an internal node date as a prior. I believe BEAST also does this. But is there an R package for this kind of relaxed-clock rate smoothing? I'm beginning a new project and want to make sure I'm using up-to-date methods. Thanks, Sean -- Sean S. Downey Institute of Archaeology University College London 31-34 Gordon Square London WC1H 0PY s.dow...@ucl.ac.uk mailto:s.dow...@ucl.ac.uk http://www.homepages.ucl.ac.**uk/~tcrnssd/http://www.homepages.ucl.ac.uk/~tcrnssd/ http://www.homepages.ucl.ac.**uk/%7Etcrnssd/http://www.homepages.ucl.ac.uk/%7Etcrnssd/ UK Cell: +44 (0)7980 284 574 __**_ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/**listinfo/r-sig-phylohttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Convert instantaneous transition rate to time.
Another way (influenced by some of Elchanan Mossel's work, though it also relates to Cecile's idea of looking for saturation) is to look at information about state at the root. If the data are very informative about it, one state will have most of the relative likelihood. If the rate * time is high, there is less information about the state at the root, and the proportion of relative likelihoods for different states will converge to the equilibrium state frequencies (assuming a time-symmetric model). For example, for a short tree for DNA data, the ratio of relative likelihoods for A, T, G, and C at the root might be c(0.99, 0.01, 0, 0) [most weight on A] but on a longer tree with less information the reconstruction might be c(0.27, 0.25, 0.24, 0.24). If the equilibrium frequency is c(0.25, 0.25, 0.25, 0.25), this suggests the second tree has less signal about the root state than the first. What I'm not sure about is how to actually measure this distance. Let the relative likelihoods (normalized to sum to one) of the reconstructed root state be the vector root.empirical, and let the equilibrium frequencies be root.equilibrium. You could do: 1) Shannon_entropy( root.empirical ): the flatter the distribution, the more entropy, the less we know about the root state. The thing I worry about here is what if the equilibrium frequency isn't flat? Then a root.empirical that is far from root.equilibrium (indicating some information) but is flat is judged as less informative than a root.empirical that exactly equals the root.equilibrium vector, which doesn't seem right. [note that Shannon_entropy() is pseudocode, though I'm sure somewhere on CRAN there's a function of this sort] 2) Euclidean distance between root.empirical and root.equilibrium: but does this put too much weight on what is happening at suboptimal states? 3) Distance between max(root.empirical) and root.equilibrium[which.max(root.empirical)]: but then something that should be maximally informative (max(root.empirical)==1) might not be depending on root.equilibrium. I.e., if root.equilibrium=c(0.4, 0.1, 0.25, 0.25), the information about the root from root.empirical.1=c(1,0,0,0) is 1 - 0.4 = 0.6, whereas the information from root.empirical.2=c(0, 1, 0, 0) is 1 - 0.1 = 0.9, even though for each the weight for the root state is as strong as can be. Perhaps someone else has an idea about a better way to measure this. In practice, I bet all these measures are pretty highly correlated and so which one is used might not have much practical importance, though this would be worth investigating if actual conclusions depend on the results. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Funding wanted: Want to collaborate on a grant? Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Nov 22, 2011 at 1:09 PM, Cecile Ane a...@stat.wisc.edu wrote: Hi Simon, One option could be to look at the expected number of substitutions over time t, and find the smallest time t for which you expect at least 1 substitution. The idea here is that the first substitution is the one that most disrupts the signal. Technically, if Q is your rate matrix and p is the stationary distribution of states associated with Q, then the average number of substitutions over time t is t * r where r is the average jumping rate: r = sum over all states p(i) Q(i,i) in absolute value. So the time to the first substitution is 1/r. Cecile. On 11/22/11 09:01, Simon Greenhill wrote: Hi all, What is the best way to convert an instantaneous transition rate (such as that given by geiger's `fitDiscrete` method) into a measure of stability over time? So, I have a set of traits with a small number of states. I want to fit these onto a set of trees with branches proportional to substitutions. The instantaneous transition rate for these traits give me the relative stability, but I want to be able to quantify this over time e.g. what proportion of signal is remaining after time t? how could I represent this as a decay curve of signal over time? Any ideas/hints would be much appreciated! Simon __**_ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/**listinfo/r-sig-phylohttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Unexpected behavior in birth death simulations
As a general approach, when this sort of thing happens, I'll modify and load a copy of the function that has been changed to produce a lot of debugging info (lots of dput and print statements, for example). Then what happens at each simulation step to identify where the problem occurs. [Better developers would use R's built-in debugging functions like browser(), debug(), and trace()]. Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Funding wanted: Want to collaborate on a grant? On Wed, May 18, 2011 at 3:59 PM, Carl Boettiger cboet...@gmail.com wrote: Hi Tanja,list Yes, everything is going extinct, but as you say, that shouldn't happen when mu = 0. sim.bd.age(age=2,numbsim=3,lambda=2,mu=0,frac,mrca=FALSE,complete=FALSE) [[1]] [1] 0 [[2]] [1] 0 [[3]] [1] 0 It's a rather malicious error in that, as you point out, it's a possible outcome when mu=0, so at first it appears nothing is wrong, but it happens every time that everything is extinct even for mu lambda, or mu=0. It's curious that this same problem effects both TreeSim and Geiger, and that this problem appears isolated to certain architectures of computers. My guess is that it may be related to 32 bit architecture and the way ape is handling the tree, but really not sure how to debug this one. -Carl On Wed, May 18, 2011 at 12:26 PM, Stadler Tanja tanja.stad...@env.ethz.chwrote: Hi Carl, the TreeSim return (arrays of 0s) could be because TreeSim returns 0 when the tree goes extinct: treearray Array of numbsim trees with the time since origin / most recent common ancestor being age. If tree goes extinct (only possible when mrca = FALSE), returns 0. If only one extant and no extinct tips, returns 1. Maybe set seed and compare runs with complete=FALSE and complete=TRUE? Cheers Tanja I get a similar problem when running the equivalent command in TreeSim: age-2 lambda - 2.0 mu - 0.5 frac -0.6 numbsim-3 sim.bd.age(age,numbsim,lambda,mu,frac,mrca=FALSE,complete=FALSE) [[1]] [1] 0 [[2]] [1] 0 [[3]] [1] 0 -- Tanja Stadler (nee Gernhard) ETH Zürich Institute of Integrative Biology Universitätsstrasse 16 8092 Zürich eMail: tanja.stad...@env.ethz.chmailto:tanja.stad...@env.ethz.ch Phone: +41 44 632 45 48 Office: CHN H76.1 http://www.tb.ethz.ch/people/tstadler -- Carl Boettiger UC Davis http://www.carlboettiger.info/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well
I think considering model adequacy is something that would be useful to do and is not done much now. One general way to do this is to simulate under your chosen model and see if the real data look very different from the simulated data. For example, I might try a one rate vs. a two rate Brownian motion model and find the latter fits better. If the actual true model is an OU model with two very distinct peaks and strong selection, which is not in my model set, I'll find that my simulated data under the two rate Brownian model may look very different from my actual data, which will be fairly bimodal. Aha, my model is inadequate. [but then what -- keep adding new models, just report that your model is inadequate, ...?] Of course, you need a method for evaluating how similar the data look. There's been some work on this in models for tree inference using posterior predictive performance (work by Jonathan Bollback and Jeremy Brown come to mind) or using other approaches (some of Peter Waddell's work), but it hasn't really taken off yet. It'd be easy to implement such approaches in R for comparative methods given capabilities in ape, Geiger, and other packages. Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Funding wanted: Want to collaborate on a grant? On Thu, Jan 20, 2011 at 12:27 PM, David Bapst dwba...@uchicago.edu wrote: Hello all, I'd like to pose a question to this group, as a bit of topical discussion. I apologize in advance if I should mangle a concept. In many model-based PCMs and some other analyses (such as paleoTS), we fit models to data by finding the ML estimates of the parameters associated, calculate the maximum support of each model and than compare between models with differing parameters using an information criterion (AICc being probably the most used). Akaike weights can be calculated if we want to consider the relative fit between our models. This is contrary to traditional statistics, where alternative hypotheses are tested against some null hypothesis. Obviously, the later approach has proven to be thorny because rejecting some null hypotheses are very difficult (such as a random walk) and some situations truly lack a clear null model. Recently, I have heard the opinion expressed from workers of disparate fields (philosophy, ecology, etc.) that model-choosing methods may choose the best model, but with no idea of whether any of the models considered fit well to the data or not. In other words, we may have fit models A-D, and the best model may have been model C, but none of the models compared could describe the 'true' process underlying the data at all. This view gives me mixed feelings. Certainly, if we are using a model-selection approach, we should attempt the range of models that make sense for our data, and should particularly include that set of simplistic models that we may accept as the most observed process (Brownian motion and Ornstein-Uhlenbeck with one optima, perhaps, in analyses of trait evolution). Of course, we cannot include models that we haven't even considered or are analytically intractable. That's a fundamental limitation of science, however, not model-selection based analyses. This counter-argument did not seem to satisfy the others, who still wanted a measure of absolute fit, like an R-squared. Now, perhaps I'm confused, but isn't R-squared technically a relative measure of fit between a linear model and a random scatter? I suppose the maximum support for a model is a measure of absolute fit, but it's not useful or interpretable unless I'm comparing it to the support for some other model. So, it seems like the desire for a measure of absolute fit is not well-founded, but maybe I'm wrong. Is there something more we can do to show how that the models we've picked aren't arbitrary? Opinions? -Dave Bapst, UChicago Geosci ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Fst - DNA sequence data
You should look at the CRAN task view for genetics: http://cran.r-project.org/web/views/Genetics.html . I haven't used R much in that domain, but there seems to be a lot of functionality. Also look into Bioconductor. Best, Brian On Mon, Aug 23, 2010 at 8:18 AM, blue jay bluejay...@gmail.com wrote: Hi, (sorry for cross posting!) I want to analyse DNA sequence data (mtDNA) in R as in calculate Fst, Heterozygosity and such summary statistics. Package Adagenet converts the DNA sequence into retaining only retaining the polymorphic sites and then calcuates Fst.. but is there any other way to do this? I mean analyse the DNA sequence as it is.. and calculate the statistics? Thanks! [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] estimate likelihood of data for given model and parameters
I have code for this in my TreEvo package (on R-forge -- see brownieneglnL in http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/R/brownie.R?rev=2root=treevoview=markup ) though note that the package itself is not all cleaned up into install.packages()-ready form. You could also dig into geiger's fitContinuous function and use that code (look for foo in fitContinuous.R) -- OUCH must have similar code. Odd if there's nothing more direct. Best, Brian On Mar 19, 2010, at 9:47 PM, Carl Boettiger wrote: Dear r-sig-phylo, I'd like to be able to estimate the likelihood of a model with a given Brownian rate parameter under a given data set. For OU models, this can be accomplished through the ouch package with the flag fit=FALSE. However there does not seem to be a corresponding option for Brownian motion in ouch/geiger/ape algorithms for fitting a Brownian motion model? Perhaps I've overlooked something? -Carl -- Carl Boettiger Population Biology, UC Davis http://two.ucdavis.edu/~cboettig [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Brian C. O'Meara Asst. Prof., Dept. of Ecology and Evolutionary Biology University of Tennessee, Knoxville http://www.brianomeara.info 865-408-TREE (8733) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] plot.phylo leftwards in ape
I'm trying to plot two trees opposite each other using ape (essentially as in Fig. 4.18 in Paradis' Analysis of Phylogenetics and Evolution with R, with some modifications). I was having trouble seeing the rootmost line on the plot on the right, so decided to try using x.lim as an argument for plot.phylo. Apparently, this doesn't work properly with direction=leftwards (it seems to override the direction, so it plots it rightwards with bad label placement). A quick glance through the code didn't suggest to me how to fix it. This will reproduce the behavior: library(ape) data(bird.orders) plot.phylo(bird.orders,direction=r) #looks okay plot.phylo(bird.orders,direction=r, x.lim=38) #still okay plot.phylo(bird.orders,direction=l) #still okay, points left plot.phylo(bird.orders,direction=l, x.lim=38) #problem It's possible the solution to my original potting issue lies elsewhere (I'll keep poking around at it), but I wanted to point out the potential x.lim+direction=l bug in any case. I'm using ape 2.5. Thanks, Brian -- Brian O'Meara http://www.brianomeara.info Assistant Prof. Dept. Ecology Evolutionary Biology U. of Tennessee, Knoxville [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Bayesian dating in R vs. Multidivtime
Emmanuel, you should check Lagopus (http://www.christophheibl.de/mdt/mdtinr.html ), which seems to connect R with multidivtime (based on documentation -- I haven't used it). Best, Brian On Jan 26, 2010, at 11:19 AM, Emmanuel Paradis wrote: Dear all, I'm planning to do some molecular dating using Bayesian methods, ie, what people usually do with Multidivtime. After reading the doc of Multidivtime, I'm thinking to do my own implementation in R since it is straightforward to specify priors, define and run the Markov chain, and calculate posteriors. I have two questions: 1. Has someone attempted a similar implementation? 2. Has someone written an interface between R and Multidivtime? Many thanks in advance for any reply. Best, Emmanuel -- Emmanuel Paradis IRD, Montpellier, France ph: +33 (0)4 67 16 64 47 fax: +33 (0)4 67 16 64 40 http://ape.mpl.ird.fr/ ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Brian O'Meara http://www.brianomeara.info Assistant Prof. Dept. Ecology Evolutionary Biology U. of Tennessee, Knoxville ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] Phylogenetic reference trees
There's not yet a good single source. TreeBase.org has some topologies, but not all studies go in there yet. The largest one I know of there has 1569 taxa . The PhyLoTA browser (http://loco.biosci.arizona.edu/pb/ ) has trees for many groups and also allows you to download aligned data to make your own. You can download NCBI's taxonomy, which is a sort of tree, though not highly-resolved or with branch lengths, but at least pretty comprehensive. You can also download the tree used by the Tree of Life project (http://tolweb.org/tree/home.pages/downloadtree.html ). There may be a tree available from http://www.timetree.org/, but I haven't been able to find it in a form other than a pdf image. Having a place to download the single best current estimate of the full tree of life would be great, and there are a few research groups working towards this goal, but it's not ready yet, as far as I know. Hope this helps, Brian Brian C. O'Meara Asst. Prof., Dept. of Ecology and Evolutionary Biology University of Tennessee, Knoxville http://www.brianomeara.info 865-408-TREE (8733) On Aug 20, 2009, at 7:31 AM, saikari keitele wrote: Hi, I'm trying to use the picante r package. My aim is to use it to construct phylogenetic trees and calculate diversity statistics from species occurrence data in different geographic regions. So, I have a list of species names with abundance information (number of occurrences) and would like to use Phylomatic to construct a tree. The Phylomatic documentation mentions that a reference tree is needed and different examples of megatrees ( http://svn.phylodiversity.net/tot/megatrees/) are given for this. However, they apply only (I think) to trees and plants. My data includes other kinds of organisms. Do you know of any other downloadable reference trees that include the whole taxonomy? Thank you very much. Saikari [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] Fwd: Other: Software.GLMM_R_package
This may be of interest to some. Brian Begin forwarded message: From: evol...@evol.biology.mcmaster.ca Date: January 29, 2009 4:59:54 AM EST To: omeara.br...@gmail.com Subject: Other: Software.GLMM_R_package Reply-To: br...@helix.biology.mcmaster.ca Dear Evoldir members, I've written a GLMM package for R that is now downloadable (http://cran.r-project.org/web/packages/MCMCglmm/index.html ). It fits standard mixed models to several types of distribution (Gaussian, Poisson, Exponential, Binary, Binomial, Multinomial, Zero- inflated Poisson) More than one response variable can be modeled at the same time and these can come from different distributions. Pedigrees and Phylogenies can be fitted as random effects as in animal models and the comparative method. Meta-analysis models can be fitted (even in conjunction with other terms, e.g. a phylogeny) It uses MCMC, as REML/ML procedures are not very robust for fitting GLMM's. However, I've kept it simple to use and made it fast. There are some other functions in the package for matrix comparisons, matrix visualisation, tensor analysis, random number generators on pedigrees/phylogenies + more. Thanks, Jarrod Jarrod Hadfield j.hadfi...@ed.ac.uk Brian O'Meara NESCent Durham, NC http://www.brianomeara.info [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] tree thievery
as a PNG or GIF OR adjust the PDF window so the figure fills it and take a snapshot of the Window (on Ubuntu: alt-printscreen), save as PNG or GIF open g3data click on two points on the X and Y axis, fill in values click on points if you need to compress the display so that you can see the output actions, use the View menu or function keys to toggle display of zoom area (F5), axis settings (F6), or output properties (F7) for multiple series, either click on points in order (e.g. work left- to-right for each series), then edit your output to put tags on increasing series, or output each series to a separate data file note that by default g3data will save your data to a file named after your graphics file, e.g. mydata.png.dat -- which means that it will show up in Windows as a file called mydata.png, with a DAT file type -- which may be confusing. reading into excel: use Data menu to separate into columns Wish list for g3data: csv format output? series tagging? keyboard shortcuts for Save (Ctrl-S), Save As (Ctrl-A)? built-in documentation? plot(newtree2) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Brian O'Meara NESCent Durham, NC http://www.brianomeara.info ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo