Hi Joseph I guess I figured out where some room for improvement was. You grow the vector pars <- c(pars, nd) I preallocated pars, but you could even just grow pars using pars[k] <- nd without pre-allocation. This operation is now (with R >= 3.4) far less costly as mentioned in the NEWS: "Assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred. The implementation is subject to change and not intended to be used in packages at this time."
One of the many little improvements recently making R quite a bit faster and easier. So the worst case behavior was pretty bad tree <- stree(100000, "right") system.time(get_mrca_fast(tree, c(1,2))) user system elapsed 13.424 0.012 13.440 mrca [1] 199999 with the changes it is much better: system.time(mrca <- get_mrca_fast(tree, c(1,2))) user system elapsed 0.012 0.000 0.012 > mrca [1] 199999 And here is the code: get_mrca_fast <- function (phy, tips) { rootnd <- length(phy$tip.label) + 1; pars <- integer(phy$Nnode); # worst case assignment, usually far too long mrcaind <- 1; # index in pars of the mrca node. will return highest one traversed by other lineages tnd <- NULL; # node ids of tips if ("character" %in% class(tips)) { tnd <- match(tips, phy$tip.label); } else { tnd <- tips; } # build a lookup table to get parents faster pvec <- integer(max(phy$edge)); pvec[phy$edge[,2]] <- phy$edge[,1]; # get entire lineage for first tip nd <- tnd[1]; for(k in 1:phy$Nnode){ nd <- pvec[nd] pars[k] <- nd if (nd == rootnd) break } pars <- pars[1:k] #delete the rest # traverse lineages for remaining tips, stop if hit common ancestor for (i in 2:length(tnd)) { done <- FALSE; cnd <- tnd[i]; while (!done) { cpar <- pvec[cnd]; # get immediate parent if (cpar %in% pars) { if (cpar == rootnd) { # early exit! if the mrca of any set of taxa is the root then we are done return(rootnd); } idx <- which(pars == cpar); if (idx > mrcaind) { mrcaind <- idx; } done <- TRUE; } else { # keep going! cnd <- cpar; } } } return(pars[mrcaind]); } Have a nice weekend, Klaus On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown <josep...@umich.edu> wrote: > Liam, > > I put the (updated) code up as a gist <https://gist.github.com/josephwb/ > ad61fd29ed4fb06e712e23d67422c813>. Feel free to use/improve/whatever. > Emmanuel sees room for improvement but frankly taking 0.3 seconds to find > the MRCA of 5000 taxa on a million-tip tree is good enough for my purposes. > > JWB > ________________________________________ > Joseph W. Brown > Post-doctoral Researcher, Smith Laboratory > University of Michigan > Department of Ecology & Evolutionary Biology > Room 2071, Kraus Natural Sciences Building > Ann Arbor MI 48109-1079 > josep...@umich.edu > > > > > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu> wrote: > > > > On the other hand, phytools does have a function - the somewhat > imprecisely named fastMRCA - which can find the MRCA of just a pair of > species much faster than getMRCA (however still slower than or only about > as fast as Joseph & Klaus's solutions). > > > > Liam J. Revell, Associate Professor of Biology > > University of Massachusetts Boston > > web: http://faculty.umb.edu/liam.revell/ > > email: liam.rev...@umb.edu > > blog: http://blog.phytools.org > > > > On 6/9/2017 5:22 AM, Liam J. Revell wrote: > >> > >> Juan. findMRCA was written before getMRCA existed, but the latter was > >> faster so now it just calls getMRCA internally. All the best, Liam > >> > >> Liam J. Revell, Associate Professor of Biology > >> University of Massachusetts Boston > >> web: http://faculty.umb.edu/liam.revell/ > >> email: liam.rev...@umb.edu > >> blog: http://blog.phytools.org > >> > >> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote: > >>> Package phytools includes a function, findMRCA, that is supposed to > work > >>> very efficiently with large trees. you may wish to compare it with your > >>> function. > >>> > >>> Cheers > >>> > >>> Juan > >>> > >>> -- > >>> > >>> Dr. Juan A. Balbuena > >>> Cavanilles Institute of Biodiversity and Evolutionary Biology > >>> University of Valencia > >>> http://www.uv.es/~balbuena <http://www.uv.es/%7Ebalbuena> > >>> P.O. Box 22085 > >>> http://www.uv.es/cophylpaco > >>> <http://www.uv.es/cavanilles/zoomarin/index.htm> > >>> 46071 Valencia, Spain > >>> e-mail: j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es> tel. +34 963 > >>> 543 658 fax +34 963 543 733 > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> *NOTE!*For shipments by EXPRESS COURIER use the following street > address: > >>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain. > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> > >>> > >>> <https://www.avast.com/sig-email?utm_medium=email&utm_ > source=link&utm_campaign=sig-email&utm_content=emailclient> > >>> > >>> Libre de virus. www.avast.com > >>> <https://www.avast.com/sig-email?utm_medium=email&utm_ > source=link&utm_campaign=sig-email&utm_content=emailclient> > >>> > >>> > >>> > >>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > >>> > >>> > >>> _______________________________________________ > >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >>> Searchable archive at > >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ > >>> > >> > >> _______________________________________________ > >> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >> Searchable archive at > >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > > [[alternative HTML version deleted]] > > _______________________________________________ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r- > sig-ph...@r-project.org/ > -- Klaus Schliep Postdoctoral Fellow Revell Lab, University of Massachusetts Boston http://www.phangorn.org/ [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/