Re: [R-sig-phylo] Function to Extend Tips?
I think this'll work, depending on how you store your data. # assumes a tree 'phy' and a dataframe, 'df', with first column of tip names, and second column of values for (i in 1:dim(df)[1]) { # find index of edge length idx <- which(phy$edge[,2] == which(phy$tip.label == df[1,i])); # apply it. this assume you want to extend it, not just set it phy$edge.length[idx] <- phy$edge.length[idx] + as.numeric(df[i,2]); } Probably a nicer way to do it without a loop... JWB ____ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 26 Jul, 2017, at 11:57, William Gearty <wgea...@stanford.edu> wrote: > > That's definitely helpful, Joseph, but I'll need to extend the tips to > varying amounts. > > Basically, I performed a tip-dating analysis using constraints based on the > FADs of a bunch of fossils. > However, now some of the analyses I want to perform require that the tips > extend to the species' extinctions, so I need to extend them to the LADs (or > farther, I suppose). > How would I, given a vector of LAD ages for the tips, extend the tips to > those ages? > > On Wed, Jul 26, 2017 at 8:50 AM, Joseph W. Brown <josep...@umich.edu > <mailto:josep...@umich.edu>> wrote: > If you want to just extend all tips by a constant amount you can do this: > > # extend terminal edges by arbitrary amount (here: 13) > idx <- which(phy$edge[,2] < (phy$Nnode + 1)); > phy$edge.length[idx] <- phy$edge.length[idx] + 13; > > HTH. > Joseph. > > Joseph W. Brown > Post-doctoral Researcher, Smith Laboratory > University of Michigan > Department of Ecology & Evolutionary Biology > Room 2071, Kraus Natural Sciences Building > Ann Arbor MI 48109-1079 > josep...@umich.edu <mailto:josep...@umich.edu> > > > >> On 26 Jul, 2017, at 11:39, David Bapst <dwba...@gmail.com >> <mailto:dwba...@gmail.com>> wrote: >> >> Not sure what you mean by extending tips, Will. Could you describe a >> small example? >> -Dave >> >> On Fri, Jul 21, 2017 at 5:15 PM, William Gearty <wgea...@stanford.edu >> <mailto:wgea...@stanford.edu>> wrote: >>> Hi all, >>> >>> Before I go ahead and wrote a whole script, I was wondering if there is a >>> function or script out there for extending tips (or setting the ages of >>> tips) given a list of taxa and ages? >>> I haven't found anything, but perhaps I'm searching the wrong phrase(s). >>> >>> Thanks, >>> Will >>> >>> -- >>> William Gearty >>> PhD Candidate, Paleobiology >>> Department of Geological Sciences >>> Stanford School of Earth, Energy & Environmental Sciences >>> williamgearty.com <http://williamgearty.com/> >>> >>>[[alternative HTML version deleted]] >>> >>> ___ >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org >>> <mailto:R-sig-phylo@r-project.org> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> >>> Searchable archive at >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/> >> >> >> >> -- >> David W. Bapst, PhD >> https://github.com/dwbapst/paleotree <https://github.com/dwbapst/paleotree> >> >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> <mailto:R-sig-phylo@r-project.org> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> >> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ >> <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > > > > -- > William Gearty > PhD Candidate, Paleobiology > Department of Geological Sciences > Stanford School of Earth, Energy & Environmental Sciences > williamgearty.com <http://williamgearty.com/> [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Function to Extend Tips?
If you want to just extend all tips by a constant amount you can do this: # extend terminal edges by arbitrary amount (here: 13) idx <- which(phy$edge[,2] < (phy$Nnode + 1)); phy$edge.length[idx] <- phy$edge.length[idx] + 13; HTH. Joseph. ____ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 26 Jul, 2017, at 11:39, David Bapst <dwba...@gmail.com> wrote: > > Not sure what you mean by extending tips, Will. Could you describe a > small example? > -Dave > > On Fri, Jul 21, 2017 at 5:15 PM, William Gearty <wgea...@stanford.edu> wrote: >> Hi all, >> >> Before I go ahead and wrote a whole script, I was wondering if there is a >> function or script out there for extending tips (or setting the ages of >> tips) given a list of taxa and ages? >> I haven't found anything, but perhaps I'm searching the wrong phrase(s). >> >> Thanks, >> Will >> >> -- >> William Gearty >> PhD Candidate, Paleobiology >> Department of Geological Sciences >> Stanford School of Earth, Energy & Environmental Sciences >> williamgearty.com >> >>[[alternative HTML version deleted]] >> >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > > > -- > David W. Bapst, PhD > https://github.com/dwbapst/paleotree > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] A possible alternate MRCA function to APE's getMRCA
Emmanuel and Klaus, After Klaus reminded me that `which` was so inefficient I realized using it in finding the match in the first lineage is unnecessary. Instead, we can capitalize on the fact that shallower nodes have smaller nodeids. So I threw out the vector index `mrcaind` altogether: getMRCA_new <- function(phy, tip) { if (!inherits(phy, "phylo")) stop('object "phy" is not of class "phylo"') if (length(tip) < 2) return(NULL) Ntip <- length(phy$tip.label) rootnd <- Ntip + 1L pars <- integer(phy$Nnode) # worst case assignment, usually far too long tnd <- if (is.character(tip)) match(tip, phy$tip.label) else tip ## build a lookup table to get parents faster pvec <- integer(Ntip + phy$Nnode) pvec[phy$edge[, 2]] <- phy$edge[, 1] ## get entire lineage for first tip nd <- tnd[1] for (k in 1:phy$Nnode) { nd <- pvec[nd] pars[k] <- nd if (nd == rootnd) break } pars <- pars[1:k] # delete the rest mrcand <- pars[1]; # keep track of actual node rather than index in pars ## traverse lineages for remaining tips, stop if hit common ancestor for (i in 2:length(tnd)) { done <- FALSE cnd <- tnd[i] while (!done) { cpar <- pvec[cnd] # get immediate parent if (cpar %in% pars) { if (cpar == rootnd) return(rootnd) # early exit mrcand <- min(mrcand, cpar); # shallowest node has lowest nodeid done <- TRUE } cnd <- cpar # keep going! } } mrcand } I also played around with compiling the function: library(compiler); getMRCA_em_cmp <- cmpfun(getMRCA_em); getMRCA_new_cmp <- cmpfun(getMRCA_new); Here are some timings (*_em is your code, *_new is the code above). For my original example (100K tips, 500 tips in MRCA query) there is not a whole lot of difference: microbenchmark(getMRCA_em(phy, tips), getMRCA_new(phy, tips), getMRCA_em_cmp(phy, tips), getMRCA_new_cmp(phy, tips)); Unit: milliseconds expr min lq mean median uq max neval getMRCA_em(phy, tips) 23.99879 29.81964 33.67437 32.39149 35.61888 136.34521 100 getMRCA_new(phy, tips) 23.76941 28.02500 31.84285 31.37001 34.89640 44.66380 100 getMRCA_em_cmp(phy, tips) 20.81924 23.60118 28.23957 28.25202 30.18535 59.72102 100 getMRCA_new_cmp(phy, tips) 20.16588 23.19571 27.19610 26.65783 29.82149 42.61702 100 However, for the worst-case scenario that Klaus devised the results are very impressive: microbenchmark(getMRCA_em(tree, c(1,2)), getMRCA_new(tree, c(1,2)), getMRCA_em_cmp(tree, c(1,2)), getMRCA_new_cmp(tree, c(1,2))); Unit: milliseconds expr minlq meanmedian uq max neval getMRCA_em(tree, c(1, 2)) 124.79040 140.44473 146.98292 145.39224 150.96044 286.78335 100 getMRCA_new(tree, c(1, 2)) 123.39924 139.41609 145.21311 143.92062 148.34205 297.73846 100 getMRCA_em_cmp(tree, c(1, 2)) 23.72105 26.84944 29.94950 29.87935 32.26490 41.99264 100 getMRCA_new_cmp(tree, c(1, 2)) 23.15223 25.74686 29.21526 29.25370 30.80915 69.68522 100 So it might be worth compiling. HTH. JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 10 Jun, 2017, at 05:26, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote: > > Joseph, Klaus, > > This is great. I slightly edited the code and renamed the argument 'tips' to > 'tip' (as in the original definition). I run on a bunch of random trees and > the results are exactly identical with the current version of getMRCA. I'm > changing the code in ape now. > > Cheers, > > Emmanuel > > getMRCA <- function(phy, tip) > { >if (!inherits(phy, "phylo")) >stop('object "phy" is not of class "phylo"') >if (length(tip) < 2) return(NULL) >Ntip <- length(phy$tip.label) >rootnd <- Ntip + 1L > >pars <- integer(phy$Nnode) # worst case assignment, usually far too long >mrcaind <- 1L # index in pars of the mrca node. will return highest one > traversed by other lineages >tnd <- if (is.character(tip)) match(tip, phy$tip.label) else tip > >## build a lookup table to get parents faster >pvec <- integer(Ntip + phy$Nnode) >pvec[phy$edge[, 2]] <- phy$edge[, 1] > >## get entire lineage for first tip >nd <- tnd[1] >for (k in 1:phy$Nnode) { >nd <- pvec[nd] >pars[k] <- nd >if (nd == rootnd) break >
Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA
Sweet! I considered preallocation, but figured it was not worth it. Guess I was wrong! JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 9 Jun, 2017, at 18:28, Klaus Schliep <klaus.schl...@gmail.com> wrote: > > Hi Joseph > > I guess I figured out where some room for improvement was. > You grow the vector pars <- c(pars, nd) > I preallocated pars, but you could even just grow pars using pars[k] <- nd > without pre-allocation. This operation is now (with R >= 3.4) far less costly > as mentioned in the NEWS: > "Assigning to an element of a vector beyond the current length now > over-allocates by a small fraction. The new vector is marked internally as > growable, and the true length of the new vector is stored in the truelength > field. This makes building up a vector result by assigning to the next > element beyond the current length more efficient, though pre-allocating is > still preferred. The implementation is subject to change and not intended to > be used in packages at this time." > > One of the many little improvements recently making R quite a bit faster and > easier. > > So the worst case behavior was pretty bad > > tree <- stree(10, "right") > system.time(get_mrca_fast(tree, c(1,2))) >user system elapsed > 13.424 0.012 13.440 > mrca > [1] 19 > > with the changes it is much better: > > system.time(mrca <- get_mrca_fast(tree, c(1,2))) >user system elapsed > 0.012 0.000 0.012 > > mrca > [1] 19 > > And here is the code: > > get_mrca_fast <- function (phy, tips) { > rootnd <- length(phy$tip.label) + 1; > pars <- integer(phy$Nnode); # worst case assignment, usually far too > long > mrcaind <- 1; # index in pars of the mrca node. will return highest > one traversed by other lineages > tnd <- NULL; # node ids of tips > if ("character" %in% class(tips)) { > tnd <- match(tips, phy$tip.label); > } else { > tnd <- tips; > } > > # build a lookup table to get parents faster > pvec <- integer(max(phy$edge)); > pvec[phy$edge[,2]] <- phy$edge[,1]; > > # get entire lineage for first tip > nd <- tnd[1]; > for(k in 1:phy$Nnode){ > nd <- pvec[nd] > pars[k] <- nd > if (nd == rootnd) break > } > pars <- pars[1:k] #delete the rest > > # traverse lineages for remaining tips, stop if hit common ancestor > for (i in 2:length(tnd)) { > done <- FALSE; > cnd <- tnd[i]; > while (!done) { > cpar <- pvec[cnd]; # get immediate parent > if (cpar %in% pars) { > if (cpar == rootnd) { > # early exit! if the mrca of any set of taxa is the root > then we are done > return(rootnd); > } > idx <- which(pars == cpar); > if (idx > mrcaind) { > mrcaind <- idx; > } > done <- TRUE; > } else { > # keep going! > cnd <- cpar; > } > } > } > return(pars[mrcaind]); > } > > > Have a nice weekend, > Klaus > > On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown <josep...@umich.edu > <mailto:josep...@umich.edu>> wrote: > Liam, > > I put the (updated) code up as a gist > <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813 > <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813>>. Feel > free to use/improve/whatever. Emmanuel sees room for improvement but frankly > taking 0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is > good enough for my purposes. > > JWB > > Joseph W. Brown > Post-doctoral Researcher, Smith Laboratory > University of Michigan > Department of Ecology & Evolutionary Biology > Room 2071, Kraus Natural Sciences Building > Ann Arbor MI 48109-1079 > josep...@umich.edu <mailto:josep...@umich.edu> > > > > > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu > > <mailto:liam.rev...@umb.edu>> wrote: > > > > On the other hand, phytools does have a function - the somewhat imprecisely > > named fastMRCA - which can find
Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA
Liam, I put the (updated) code up as a gist <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813>. Feel free to use/improve/whatever. Emmanuel sees room for improvement but frankly taking 0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is good enough for my purposes. JWB ____ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu> wrote: > > On the other hand, phytools does have a function - the somewhat imprecisely > named fastMRCA - which can find the MRCA of just a pair of species much > faster than getMRCA (however still slower than or only about as fast as > Joseph & Klaus's solutions). > > Liam J. Revell, Associate Professor of Biology > University of Massachusetts Boston > web: http://faculty.umb.edu/liam.revell/ > email: liam.rev...@umb.edu > blog: http://blog.phytools.org > > On 6/9/2017 5:22 AM, Liam J. Revell wrote: >> >> Juan. findMRCA was written before getMRCA existed, but the latter was >> faster so now it just calls getMRCA internally. All the best, Liam >> >> Liam J. Revell, Associate Professor of Biology >> University of Massachusetts Boston >> web: http://faculty.umb.edu/liam.revell/ >> email: liam.rev...@umb.edu >> blog: http://blog.phytools.org >> >> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote: >>> Package phytools includes a function, findMRCA, that is supposed to work >>> very efficiently with large trees. you may wish to compare it with your >>> function. >>> >>> Cheers >>> >>> Juan >>> >>> -- >>> >>> Dr. Juan A. Balbuena >>> Cavanilles Institute of Biodiversity and Evolutionary Biology >>> University of Valencia >>> http://www.uv.es/~balbuena <http://www.uv.es/%7Ebalbuena> >>> P.O. Box 22085 >>> http://www.uv.es/cophylpaco >>> <http://www.uv.es/cavanilles/zoomarin/index.htm> >>> 46071 Valencia, Spain >>> e-mail: j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es>tel. +34 963 >>> 543 658fax +34 963 543 733 >>> >>> *NOTE!*For shipments by EXPRESS COURIER use the following street address: >>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain. >>> >>> >>> >>> <https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=emailclient> >>> >>> Libre de virus. www.avast.com >>> <https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=emailclient> >>> >>> >>> >>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>> >>> >>> ___ >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >>> Searchable archive at >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >>> >> >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] A possible alternate MRCA function to APE's getMRCA
w00t! I am aware that `which` is inefficient, but wasn't sure how to get around it in this instance. Thanks! It would be great to have this in the new version! Joseph. Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 7 Jun, 2017, at 19:06, Klaus Schliep <klaus.schl...@gmail.com> wrote: > > Hi Joseph, > > just a few changes on your code and it is actually really fast. Having > multiple calls to which() is often slow. > Replacing which() with a simple lookup table will speed your code up quite a > bit as you can see below. > This is actually how the function Ancestors() in phangorn works. > But let's start with the output from profiling your code and you can see that > alt_mrca() spends most of the time in which(). > > Rprof(tmp <- tempfile()) > nd.alt <- alt_mrca(phy, tips) > Rprof() > summaryRprof(tmp) > unlink(tmp) > > $by.self > self.time self.pct total.time total.pct > "which" 1.56 100 1.56 100 > > $by.total >total.time total.pct self.time self.pct > "which" 1.56 100 1.56 100 > "alt_mrca" 1.56 100 0.000 > > $sample.interval > [1] 0.02 > > $sampling.time > [1] 1.56 > > Now here are some timings in comparison this version with one where I > replaced calls to which() with a lookup table: > > # ALT > system.time(nd.alt <- alt_mrca(phy, tips)); nd.alt; >user system elapsed > 1.552 0.004 1.555 > [1] 13 > > > # ALT FAST > system.time(nd.alt <- alt_mrca_fast(phy, tips)); nd.alt; >user system elapsed > 0.004 0.000 0.004 > [1] 13 > > Not too bad for some pure R code. > And now you really need to start thinking of another excuse for a coffee > break. > > Cheers, > Klaus > > PS: Emmanuel, just another function which could get into next version. > PPS: The plot should better have a logarithmic scale for time. > PPPS: Here is the improved code, just 2 additional lines. > > > alt_mrca_fast <- function (phy, tips) { > rootnd <- length(phy$tip.label) + 1; > pars <- numeric(); # parents in lineage of first queried tip > mrcaind <- 1; # index in pars of the mrca node > tnd <- NULL; # node ids of tips > if ("character" %in% class(tips)) { > tnd <- match(tips, phy$tip.label); > } else { > tnd <- tips; > } > > # build a lookup table to get parents faster > pvec <- integer(max(phy$edge)) > pvec[phy$edge[,2]] = phy$edge[,1] > > # get entire lineage for first tip > nd <- tnd[1]; > repeat { > #nd <- phy$edge[which(phy$edge[,2] == nd),1]; # get immediate parent > nd <- pvec[nd] > pars <- c(pars, nd); > if (nd == rootnd) { > break; > } > } > > > # traverse lineages for remaining tips, stop if hit common ancestor > for (i in 2:length(tnd)) { > done <- FALSE; > cnd <- tnd[i]; > while (!done) { > #cpar <- phy$edge[which(phy$edge[,2] == cnd),1]; # get immediate > parent > cpar <- pvec[cnd] > if (cpar %in% pars) { > if (cpar == rootnd) { > # early exit! if the mrca of any set of taxa is the root > then we are done > return(rootnd); > } > idx <- which(pars == cpar); > if (idx > mrcaind) { > mrcaind <- idx; > } > done <- TRUE; > } else { > # keep going! > cnd <- cpar; > } > } > } > return(pars[mrcaind]); > } > > > > > On Wed, Jun 7, 2017 at 4:24 PM, Joseph W. Brown <josep...@umich.edu > <mailto:josep...@umich.edu>> wrote: > I've been working with some very large trees (tens to hundreds of thousands > of tips) where the speed of APE's getMRCA function has been prohibitively > slow. I therefore R-ified an MRCA function I developed in Java for use with > the Open Tree of Life tree (~3 million tips). I don't pretend to be an > algorithm guy, but this seems to work. > > Given a tree `phy` and a set of query taxa `tips`, here is the algorithm > (note that the terminology has the root at the top of th
[R-sig-phylo] A possible alternate MRCA function to APE's getMRCA
I've been working with some very large trees (tens to hundreds of thousands of tips) where the speed of APE's getMRCA function has been prohibitively slow. I therefore R-ified an MRCA function I developed in Java for use with the Open Tree of Life tree (~3 million tips). I don't pretend to be an algorithm guy, but this seems to work. Given a tree `phy` and a set of query taxa `tips`, here is the algorithm (note that the terminology has the root at the top of the tree): 1) For the first tip, drill up through the tree, recording its entire lineage (parental nodes) to the root in a vector. This seems wasteful (i.e., if the MRCA is not the root) but in practice is actually fast. Call this vector `pars`. 2) For each subsequent tip, traverse its lineage up in the tree until encountering an ancestor in `pars`. 3) Return the highest indexed node in `pars` that was traversed by the other lineages. A convenient "early exit" of this approach is that if any of the MRCAs are the root then we can stop immediately. Another benefit with this is that the query nodes do not necessarily have to be terminal taxa: MRCAs can be found among terminal and internal nodes. So, how does it fare? Pretty good! Here are some results on a 100K tip tree (the tree I am actually working with has 350K tips). # set up example set.seed(13); phy <- rtree(10); node <- 13; # make sure MRCA is not the root tt <- unlist(Descendants(phy, 13)); set.seed(13); tips <- sample(tt, 50); # APE system.time(nd.ape <- getMRCA(phy, tips)); nd.ape; user system elapsed 14.459 0.066 14.482 [1] 13 # ALT system.time(nd.alt <- alt_mrca(phy, tips)); nd.alt; user system elapsed 2.557 0.534 3.063 [1] 13 Ok, that was for a shallow MRCA node (i.e. close to the root). How does it fare for recent MRCAs? Here is a timing for sister tips: # APE system.time(nd.ape <- getMRCA(phy, c(13,14))); nd.ape; user system elapsed 14.715 0.063 14.727 [1] 100027 # ALT system.time(nd.alt <- alt_mrca(phy, c(13,14))); nd.alt; user system elapsed 0.047 0.015 0.061 [1] 100027 Way faster! Getting the entire lineage of the first tip does not seem to hurt us. We can also look how the timing scales as the size of the query set changes (I've made sure that the MRCA is identical in every case): (APE is blue, alt_mrca is red). So the APE version seems insensitive to the number of tips query. I suppose there must be a point where the timings cross and the alt_mrca function takes more time, but I have not found it. Besides, if the query set becomes large enough then the MRCA is likely the root itself, and in that case we may exit early: # APE with just 2 tips system.time(nd.ape <- getMRCA(phy, c(1,10))); nd.ape; user system elapsed 14.627 0.055 14.642 [1] 11 # ALT with a random 500 tips system.time(nd.alt <- alt_mrca(phy, sample(1:10, 500))); nd.alt; user system elapsed 0.292 0.058 0.346 [1] 11 Not bad. The code for the alternate MRCA is below. I've considered lapply-ing this (i.e. across query nodes), but that would negate the potential early exit (as all tip-to-root lineages would have to be traversed). Note that in all of these timings above the APE getMRCA function uses C code but the alternative is pure R. I imagine the alt_mrca could be made even speedier if coded in C. HTH. Joseph. alt_mrca <- function (phy, tips) { rootnd <- length(phy$tip.label) + 1; pars <- numeric(); # parents in lineage of first queried tip mrcaind <- 1; # index in pars of the mrca node tnd <- NULL; # node ids of tips if ("character" %in% class(tips)) { tnd <- match(tips, phy$tip.label); } else { tnd <- tips; } # get entire lineage for first tip nd <- tnd[1]; repeat { nd <- phy$edge[which(phy$edge[,2] == nd),1]; # get immediate parent pars <- c(pars, nd); if (nd == rootnd) { break; } } # traverse lineages for remaining tips, stop if hit common ancestor for (i in 2:length(tnd)) { done <- FALSE; cnd <- tnd[i]; while (!done) { cpar <- phy$edge[which(phy$edge[,2] == cnd),1]; # get immediate parent if (cpar %in% pars) { if (cpar == rootnd) { # early exit! if the mrca of any set of taxa is the root then we are done return(rootnd); } idx <- which(pars == cpar); if (idx > mrcaind) { mrcaind <- idx; } done <- TRUE; } else { # keep going! cnd <- cpar; } } } return(pars[mrcaind]); } Joseph W. Brown Post-doctoral Researcher, Smith Laboratory Univer
Re: [R-sig-phylo] dating cladograms in R
Fabio you might have luck with Brian O'Meara's datelife: https://github.com/phylotastic/datelife <https://github.com/phylotastic/datelife> Essentially it grabs dates from published trees and applies it to your own. Brian can provide more details. (^_<)〜☆ HTH. JWB ____ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 17 May, 2017, at 08:12, Fábio Machado <macfa...@gmail.com> wrote: > > Dear community, > > I was wondering if there is any methodology implemented in R to date > cladograms, either using only the tree topology or using additional molecular > data. If not, what would be the best solution to obtain a dated phylogeny, > given that I have only a tree topology to work with. > > Best regards, > > Fabio Andrade Machado > Becario Postdoctoral - CONICET > Museo Argentino de Ciencias Naturales “Bernardino Rivadavia” > División de Mastozoología, > Buenos Aires, Av. Ángel Gallardo 470 (C1405DJR) > Argentina > f.mach...@usp.br <mailto:f.mach...@usp.br> ; macfa...@gmail.com > <mailto:macfa...@gmail.com> > +55 11 982631029 > skype: fabio_a_machado > > Lattes: http://lattes.cnpq.br/3673327633303737 > <http://lattes.cnpq.br/3673327633303737> > Google Scholar: http://scholar.google.com/citations?hl=en=2l6-VrQJ > <http://scholar.google.com/citations?hl=en=2l6-VrQJ> > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Extracting a clade for LTT
Hi David. Sounds like you want drop.tip from ape to get rid of the outgroup. You need only pass in the tree and the outgroup(s) name(s). extract.clade will so something similar, but you need to calculate the mrca for each tree first to pass in the node id. HHT. JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 8 May, 2017, at 07:31, David Buckley <dbuck...@mncn.csic.es> wrote: > > Probably a very easy question, but I could’t find a straightforward answer > for it… > I have a posterior distribution of trees all rooted with a > not-so-closely-related outgroup. I’d like to perform some diversification > analyses (LTT, etc.) just for the ingroup, obviating the outgroup. Is there > an easy way to extract the ingroup (or remove the outgroup) from all the > trees before performing the analyses? I have tried to do it ‘manually’, > deleting the outgroup taxa in Mesquite, but it looks to me that the LTT plots > are still considering some branch lengths from the ingroup to the outgroup > clade (kind of considering a stem-origin, not a crown-origin for the ingroup > clade…). > > Sorry if the question is too naïve… > > best > > david > > > > David Buckley > Dpt. Biodiversity and Evolutionary Biology > Museo Nacional de Ciencias Naturales, MNCN-CSIC > c/José Gutiérrez Abascal 2 > 28006-Madrid > Spain > Phone: +34 91 411 13 28 ext. 1126 > dbuck...@mncn.csic.es > https://www.researchgate.net/profile/David_Buckley4 > <https://www.researchgate.net/profile/David_Buckley4> > http://scholar.google.com/citations?user=qEFTmfkJ=en > <http://scholar.google.com/citations?user=qEFTmfkJ=en> > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Extract all possible clades from a tree
Hmm. Maybe something wonky with your tree? I simulated a 17-tip tree and tried things out: phy <- rtree(17); rootID <- length(phy$tip.label) + 1; counter <- 1; for (i in rootID:max(phy$edge[,1])) { clade <- extract.clade(phy, i); # do something. just printing clade properties here print(paste0("clade ", counter, " (node ", i, ") has ", length(clade$tip.label), " tips.")); counter <- counter + 1; } [1] "clade 1 (node 18) has 17 tips." [1] "clade 2 (node 19) has 11 tips." [1] "clade 3 (node 20) has 10 tips." [1] "clade 4 (node 21) has 4 tips." [1] "clade 5 (node 22) has 3 tips." [1] "clade 6 (node 23) has 2 tips." [1] "clade 7 (node 24) has 6 tips." [1] "clade 8 (node 25) has 5 tips." [1] "clade 9 (node 26) has 3 tips." [1] "clade 10 (node 27) has 2 tips." [1] "clade 11 (node 28) has 2 tips." [1] "clade 12 (node 29) has 6 tips." [1] "clade 13 (node 30) has 2 tips." [1] "clade 14 (node 31) has 4 tips." [1] "clade 15 (node 32) has 3 tips." [1] "clade 16 (node 33) has 2 tips." and had no problem. Maybe post your tree here? JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 5 Jan, 2017, at 17:18, Kamila Naxerova <knaxer...@partners.org> wrote: > > Dear Joseph, > > thanks so much. This is exactly what I need! > > I am running into some problems that I don’t understand though. In my case, > rootID is 18, and max(phy$edge[,1]) is 33. When I try to execute your loop, > this happens: > > > extract.clade(phy, 18) > > Phylogenetic tree with 17 tips and 16 internal nodes. > > Tip labels: > X1, X8, X9, X10 ... > > Rooted; includes branch lengths. > > So far so good… but then I keep getting these errors: > > > extract.clade(phy, 19) > Error in phy$edge[, 2] : incorrect number of dimensions > > extract.clade(phy, 20) > Error in phy$edge[, 2] : incorrect number of dimensions > > Not sure why extract.clade produces these errors. 19-23 don’t work, 24-26 > work, 27 produces the error again, 28 works etc. > > Thanks again. > > Kamila > > >> On Jan 5, 2017, at 4:12 PM, Joseph W. Brown <josep...@umich.edu >> <mailto:josep...@umich.edu>> wrote: >> >> Not sure if I understand the problem completely, but this should allow you >> to examine all of the clades (and should work if polytomies are involved): >> >> # for tree phy >> rootID <- length(phy$tip.label) + 1; >> for (i in rootID:max(phy$edge[,1])) { >> clade <- extract.clade(phy, i); >> # do something >> } >> >> This includes the root node (i.e. whole tree), but that can be changed. This >> can be rewritten as an lapply if necessary. >> >> HTH. >> JWB >> >> Joseph W. Brown >> Post-doctoral Researcher, Smith Laboratory >> University of Michigan >> Department of Ecology & Evolutionary Biology >> Room 2071, Kraus Natural Sciences Building >> Ann Arbor MI 48109-1079 >> josep...@umich.edu <mailto:josep...@umich.edu> >> >> >> >>> On 5 Jan, 2017, at 15:50, Kamila Naxerova <knaxer...@partners.org >>> <mailto:knaxer...@partners.org>> wrote: >>> >>> Hi all, >>> >>> I would like to break a phylogenetic tree into all possible clades and then >>> examine each one of them for certain characteristics. >>> >>> I am writing some code to do this from scratch, but it’s getting pretty >>> cumbersome quickly. >>> >>> I was wondering whether there are some functions out there already that >>> could help me with this task? >>> >>> Thanks so much for any help. >>> >>> Cheers, >>> Kamila >>> >>> >>> The information in this e-mail is intended only for the person to whom it is >>> addressed. If you believe this e-mail was sent to you in error and the >>> e-mail >>> contains patient information, please contact the Partners Compliance >>> HelpLine at >>> http://www.partners.org/complianceline >>> <http://www.partners.org/complianceline> . If the e-mail was sent to you in >>> error >>> but does not contain patient information, please contact the sender and >>> properly >>> dispose of the e-mail. >>> ___ >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org >>> <mailto:R-sig-phylo@r-project.org> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> >>> Searchable archive at >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Extract all possible clades from a tree
Not sure if I understand the problem completely, but this should allow you to examine all of the clades (and should work if polytomies are involved): # for tree phy rootID <- length(phy$tip.label) + 1; for (i in rootID:max(phy$edge[,1])) { clade <- extract.clade(phy, i); # do something } This includes the root node (i.e. whole tree), but that can be changed. This can be rewritten as an lapply if necessary. HTH. JWB ____ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 5 Jan, 2017, at 15:50, Kamila Naxerova <knaxer...@partners.org> wrote: > > Hi all, > > I would like to break a phylogenetic tree into all possible clades and then > examine each one of them for certain characteristics. > > I am writing some code to do this from scratch, but it’s getting pretty > cumbersome quickly. > > I was wondering whether there are some functions out there already that could > help me with this task? > > Thanks so much for any help. > > Cheers, > Kamila > > > The information in this e-mail is intended only for th...{{dropped:21}} ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] compressTipLabel as an option to read.trees()
I wonder if reading in a Nexus file with a translation table bypasses this problem? JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 14 Dec, 2016, at 10:16, Yan Wong <y...@yanwong.me> wrote: > > Hi, > > I’m reading in a large number of newick trees with the same tips, all from a > single file. If I do trees<-read.trees() followed by trees <- > .compressTipLabel(trees), it reduces the memory footprint well, but takes an > age to run. I can’t help thinking this could be sped up during the reading > process by passing an option to read.trees() to specify that the tip labels > are the same in each tree in the multiPhylo object. Has anyone implemented > such an option? > > Cheers > > Yan > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] 99 Ways to ruin an open source project
Here is a cheeky post <http://opensoul.org/2015/07/22/99ways/> by Brandon Keepers on how to do things wrong when developing open source software. Lots of good stuff in there! JWB ____ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Anyone knows how to concatenate aligned genes sequences so as to create whole genome alignments?
We have a non-R tool that can do the job: https://github.com/FePhyFoFum/phyx After compiling, command is (assuming fasta (extension ".fas") input, but any input format will work): ./pxcat -s *.fas -o my_concatenated_alignment.fas -p partition_info.txt (The partition_info.txt logs how sites/partitions are ordered). If you need these in out another format (say, phylip), do: ./pxcat -s *.fas -o my_concatenated_alignment.fas | ./pxs2phy -o my_concatenated_alignment.phy HTH. JWB ________ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 12 Sep, 2016, at 06:41, Bhuller, Ravneet > <ravneet.bhulle...@imperial.ac.uk> wrote: > > Dear Members, > > Any suggestions on how to concatenate the aligned gene sequences in fasta > format so as to get whole genome alignments? > I need whole genome alignments as an input to a phylogenetic tool. > > Many thanks, > > Rav > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Ultrametric tree not recognized
Comparing floating point numbers is (at least for me) pretty scary when you get down into it. Here are some more accessible treatments (although Goldberg’s paper seems to be the definitive treatment on the subject): What Every Programmer Should Know About Floating-Point Arithmetic: http://floating-point-gui.de Comparing Floating Point Numbers, 2012 Edition: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ HTH. JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 16 Aug, 2016, at 12:15, Klaus Schliep <klaus.schl...@gmail.com> wrote: > > We should consider to improve is.ultrametric to use as tolerance a certain > number significant figures (e.g. 8-10 digits). This way tol would only depend > on the tip-to-root distance and not a machine dependent minimal value. I find > the use of significant figures suits really well as a stopping criterion for > example in my ML functions, especially as it works for small and large trees > equally well. > > write.tree has an option to define the number of digits, it will be generally > less than the internal representation and I suspect one could possible export > with higher precision from other programs, but will result in larger file > sizes which one has to put in consideration for MCMC / bootstrap tree files. > > As a side note on the developer site of the r-project.org > <http://r-project.org/> homepage (not CRAN) is a link to this nice ancient > paper (http://www.validlab.com/goldberg/paper.pdf > <http://www.validlab.com/goldberg/paper.pdf>) > > Regards, > Klaus > > > PS: And Martin of course I am happy with Liam's solution as it uses phangorn > ;) > > On Tue, Aug 16, 2016 at 10:22 AM, Joseph W. Brown <josep...@umich.edu > <mailto:josep...@umich.edu>> wrote: > That’s a good tip, but the issue here is that many packages use > is.ultrametric internally with a hard-coded (likely, default) value of tol. > In this case, Martin simply cannot perform the analyses he wants to run > because this option is inaccessible. > > JWB > > Joseph W. Brown > Post-doctoral Researcher, Smith Laboratory > University of Michigan > Department of Ecology & Evolutionary Biology > Room 2071, Kraus Natural Sciences Building > Ann Arbor MI 48109-1079 > josep...@umich.edu <mailto:josep...@umich.edu> > > > >> On 16 Aug, 2016, at 10:15, Klaus Schliep <klaus.schl...@gmail.com >> <mailto:klaus.schl...@gmail.com>> wrote: >> >> Hello all, >> this may come be surprising to many, but consulting the manual >> ?is.ultrametric can be helpful. Why not simply try e.g. is.ultrametric(tree, >> tol=.01) ???? >> So in this sense RTFM >> Regards, >> Klaus >> >> >> On Aug 16, 2016 9:31 AM, "Martin Dohrmann" <m.dohrm...@lrz.uni-muenchen.de >> <mailto:m.dohrm...@lrz.uni-muenchen.de>> wrote: >> >> Am 16.08.2016 um 15:20 schrieb Joseph W. Brown: >> >> > I agree that it is almost certainly numerical precision, as rescaling the >> > tree fixes things: >> > >> > > is.ultrametric(phy); >> > [1] FALSE >> > > fie <- phy; >> > > fie$edge.length <- fie$edge.length * 0.1; >> > > is.ultrametric(fie); >> > [1] TRUE >> > >> > I’ve also tested the ultrametricity(?) with a non-R program and the >> > results are the same. So can we assume this is not a PhyloBayes issue, and >> > rather an unavoidable problem with large, old trees? But the fact that >> > MrBayes trees are ultrametric is confusing; does MrBayes use less precise >> > edge lengths? >> >> Actually the MrBayes branch lengths appear to be MORE precise. >> >> Martin >> >> >> >> >> On 16 Aug, 2016, at 08:53, Liam J. Revell <liam.rev...@umb.edu >> >> <mailto:liam.rev...@umb.edu>> wrote: >> >> >> >> Hi Martin. >> >> >> >> Since you are writing & reading trees to file, my guess is that it has to >> >> do with numerical precision - that is, the rounding of your edge lengths >> >> when they are written to file. >> >> >> >> Does your tree look ultrametric when plotted in R? If so, this is >> >> probably the case. >> >> >> >> My recommendation is that y
Re: [R-sig-phylo] Ultrametric tree not recognized
That’s a good tip, but the issue here is that many packages use is.ultrametric internally with a hard-coded (likely, default) value of tol. In this case, Martin simply cannot perform the analyses he wants to run because this option is inaccessible. JWB Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 16 Aug, 2016, at 10:15, Klaus Schliep <klaus.schl...@gmail.com> wrote: > > Hello all, > this may come be surprising to many, but consulting the manual > ?is.ultrametric can be helpful. Why not simply try e.g. is.ultrametric(tree, > tol=.01) > So in this sense RTFM > Regards, > Klaus > > > On Aug 16, 2016 9:31 AM, "Martin Dohrmann" <m.dohrm...@lrz.uni-muenchen.de > <mailto:m.dohrm...@lrz.uni-muenchen.de>> wrote: > > Am 16.08.2016 um 15:20 schrieb Joseph W. Brown: > > > I agree that it is almost certainly numerical precision, as rescaling the > > tree fixes things: > > > > > is.ultrametric(phy); > > [1] FALSE > > > fie <- phy; > > > fie$edge.length <- fie$edge.length * 0.1; > > > is.ultrametric(fie); > > [1] TRUE > > > > I’ve also tested the ultrametricity(?) with a non-R program and the results > > are the same. So can we assume this is not a PhyloBayes issue, and rather > > an unavoidable problem with large, old trees? But the fact that MrBayes > > trees are ultrametric is confusing; does MrBayes use less precise edge > > lengths? > > Actually the MrBayes branch lengths appear to be MORE precise. > > Martin > > >> > >> On 16 Aug, 2016, at 08:53, Liam J. Revell <liam.rev...@umb.edu > >> <mailto:liam.rev...@umb.edu>> wrote: > >> > >> Hi Martin. > >> > >> Since you are writing & reading trees to file, my guess is that it has to > >> do with numerical precision - that is, the rounding of your edge lengths > >> when they are written to file. > >> > >> Does your tree look ultrametric when plotted in R? If so, this is probably > >> the case. > >> > >> My recommendation is that you use phangorn to compute the non-negative > >> least-squares edge lengths with the condition that the tree is > >> ultrametric. This will give you the edge lengths that result in the > >> distances between taxa with minimum sum of squared differences from the > >> distances implied by your input tree, under the criterion that the > >> resulting tree is ultrametric. > >> > >> To do this you need to merely run: > >> > >> library(phytools) > >> library(phangorn) > >> is.ultrametric(tree) ## fails > >> plotTree(tree,ftype="off") ## does my tree look ultrametric? > >> nnls<-nnls.tree(cophenetic(tree),tree,rooted=TRUE) > >> is.ultrametric(tree) ## should pass > >> > >> Let us know if this works. All the best, Liam > >> > >> Liam J. Revell, Associate Professor of Biology > >> University of Massachusetts Boston > >> web: http://faculty.umb.edu/liam.revell/ > >> <http://faculty.umb.edu/liam.revell/> > >> email: liam.rev...@umb.edu <mailto:liam.rev...@umb.edu> > >> blog: http://blog.phytools.org <http://blog.phytools.org/> > >> > >> On 8/16/2016 6:41 AM, Martin Dohrmann wrote: > >>> Hi, > >>> I want to do some diversification pattern analyses with various R > >>> packages. I'm using a time-calibrated tree produced by PhyloBayes. Branch > >>> lengths are in millions of years and all taxa are extant, so the tree > >>> should be ultrametric. However, when I call "is.ultrametric", it returns > >>> "FALSE". > >>> > >>> Has anybody encountered something like this? Any ideas about what's going > >>> on/how to solve this? > >>> > >>> Some further information: > >>> I also tried other PhyloBayes time trees, with the same result. In > >>> contrast, MrBayes time trees I tried for comparison are recognized as > >>> ultrametric. Regarding my diversification analyses, TESS would not run, > >>> telling me "The likelihood function is only defined for ultrametric > >>> trees!". On the other hand, BAMMtools doesn't seem to have a problem with > >>> my tree. I haven't tried other packages yet,
Re: [R-sig-phylo] Ultrametric tree not recognized
I agree that it is almost certainly numerical precision, as rescaling the tree fixes things: > is.ultrametric(phy); [1] FALSE > fie <- phy; > fie$edge.length <- fie$edge.length * 0.1; > is.ultrametric(fie); [1] TRUE I’ve also tested the ultrametricity(?) with a non-R program and the results are the same. So can we assume this is not a PhyloBayes issue, and rather an unavoidable problem with large, old trees? But the fact that MrBayes trees are ultrametric is confusing; does MrBayes use less precise edge lengths? JWB ____________ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 16 Aug, 2016, at 08:53, Liam J. Revell <liam.rev...@umb.edu> wrote: > > Hi Martin. > > Since you are writing & reading trees to file, my guess is that it has to do > with numerical precision - that is, the rounding of your edge lengths when > they are written to file. > > Does your tree look ultrametric when plotted in R? If so, this is probably > the case. > > My recommendation is that you use phangorn to compute the non-negative > least-squares edge lengths with the condition that the tree is ultrametric. > This will give you the edge lengths that result in the distances between taxa > with minimum sum of squared differences from the distances implied by your > input tree, under the criterion that the resulting tree is ultrametric. > > To do this you need to merely run: > > library(phytools) > library(phangorn) > is.ultrametric(tree) ## fails > plotTree(tree,ftype="off") ## does my tree look ultrametric? > nnls<-nnls.tree(cophenetic(tree),tree,rooted=TRUE) > is.ultrametric(tree) ## should pass > > Let us know if this works. All the best, Liam > > Liam J. Revell, Associate Professor of Biology > University of Massachusetts Boston > web: http://faculty.umb.edu/liam.revell/ > email: liam.rev...@umb.edu > blog: http://blog.phytools.org > > On 8/16/2016 6:41 AM, Martin Dohrmann wrote: >> Hi, >> I want to do some diversification pattern analyses with various R packages. >> I'm using a time-calibrated tree produced by PhyloBayes. Branch lengths are >> in millions of years and all taxa are extant, so the tree should be >> ultrametric. However, when I call "is.ultrametric", it returns "FALSE". >> >> Has anybody encountered something like this? Any ideas about what's going >> on/how to solve this? >> >> Some further information: >> I also tried other PhyloBayes time trees, with the same result. In contrast, >> MrBayes time trees I tried for comparison are recognized as ultrametric. >> Regarding my diversification analyses, TESS would not run, telling me "The >> likelihood function is only defined for ultrametric trees!". On the other >> hand, BAMMtools doesn't seem to have a problem with my tree. I haven't tried >> other packages yet, but I suspect RPANDA, TreePar etc. might also have >> issues if they don't recognize my tree as ultrametric. >> >> I'd appreciate any help! >> >> Best wishes, >> Martin >> >> Dr. Martin Dohrmann >> Ludwig-Maximilians-University Munich >> Dept. of Earth & Environmental Sciences >> Palaeontology & Geobiology >> Molecular Geo- & Palaeobiology Lab >> Richard-Wagner-Str. 10 >> 80333 Munich, Germany >> Phone: +49-(0)89-2180-6593 >> >> >> [[alternative HTML version deleted]] >> >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ >> > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] chronos nlminb error
Franz, Yes, Mac has changed enough recently that installing the dependencies can be a horrific hassle (specifically: the default use of clang instead of gcc). A couple of options: 1. Install dependencies with homebrew http://brew.sh/. 2. Override the default use of clang, and instead use gcc. 3. If all else fails, and it is not too much trouble, use a virtual box, and install/use treePL in some flavour of linux. This should work flawlessly. HTH. JWB. Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu On 31 Dec, 2014, at 11:34, Franz Krah f.k...@mailbox.org wrote: Thanks a lot Brian! Took me the whole day but finally treePL works fine although there were some difficulties with Yosemite and the version of c++. The tree - reorder(tree) trick didn't do the trick. Cheers, Franz Brian O'Meara omeara.br...@gmail.com hat am 30. Dezember 2014 um 19:56 geschrieben: The programs r8s, pathd8, and treepl can all take a phylogram and return a chronogram, but they're not within R (though they could be wrapped). You could also infer a chronogram from the start using Beast or MrBayes, though the large size of the tree may make it difficult. One simple thing to try before all this, though, is pass your tree through a reorder() function call (using default arguments) before passing it to chronos: (i.e., tree - read.tree(full_Cipres_Data/RAxML_bestTree.tre) *tree - reorder(tree) tree - chronos(tree, lambda = 1, model = correlated, quiet = FALSE, calibration = makeChronosCalib(tree), control = chronos.control()) Sometimes the imported tree structure, while valid, can be an issue for ape and I find doing this reordering can fix weird errors. Best, Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Postdoc collaborators wanted: http://nimbios.org/postdocs/ Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Dec 30, 2014 at 4:18 AM, Franz Krah f.k...@mailbox.org wrote: Dear all, I used chronos to make a large phylogeny (~3600 species) ultrametric. The tree is rooted and binary. With the following code I get the error: tree - read.tree(full_Cipres_Data/RAxML_bestTree.tre) tree - chronos(tree, lambda = 1, model = correlated, quiet = FALSE, calibration = makeChronosCalib(tree), control = chronos.control()) Error: In nlminb(start.para, f, g, control = opt.ctrl, lower = LOW, upper = UP) : NA/NaN function evaluation This error was posted before some times but I didn't find a satisfying answer. I tried out some things but didn't helped: altering the root in different ways, altering the model, altering lambda. The functions worked fine with smaller phylogenies before! Hope someone found a solution! Are there other ways of making my tree ultrametric? Cheers, Franz ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Analysis with Multiple cores on Mac Workstation
obvious utility. As Daniel says, the average branch lengths across a set of trees is more difficult to see a use case for, but you could imagine doing something like taking the ratogram output from r8s on a set of trees and summarizing the rate average and rate sd on a given, best, tree as two sets of branch lengths on that tree. I've put the function source at https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrl en.R?revision=110root=omearalab . You can source the file for the function (consensusBrlen() ) and other functions it needs. It also uses phylobase. Note that this is alpha-quality code -- it's been checked a bit, but verify it's doing what you want. Here's an example of how to use it library(ape) library(phylobase) phy.a-rcoal(15) phy.b-phy.a phy.b$edge.length-phy.b$edge.length+runif(length(phy.b$edge.length), 0, 0.1) phy.c-rcoal(15) phy.list-list(phy.a, phy.b, phy.c) phy.consensus-consensusBrlen(phy.a, list(phy.a, phy.b, phy.c), type=mean_brlen) -- José Hidasi Neto Graduated in Biological Sciences - Universidade Federal de Goiás (UFG) Master's candidate in Ecology and Evolution - Community Ecology and Functioning Lab - UFG Lattes: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4293841A0 [[alternative HTML version deleted]] -- Daniel Barker http://bio.st-andrews.ac.uk/staff/db60.htm The University of St Andrews is a charity registered in Scotland : No SC013532 ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- José Hidasi Neto Graduated in Biological Sciences - Universidade Federal de Goiás (UFG) Master's candidate in Ecology and Evolution - Community Ecology and Functioning Lab - UFG Lattes: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4293841A0 Website: http://rfunctions.blogspot.com.br/ [[alternative HTML version deleted]] -- Klaus Schliep Phylogenomics Lab at the University of Vigo, Spain ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ _ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/