Re: [R-sig-phylo] Function to Extend Tips?

2017-07-26 Thread Joseph W. Brown
I think this'll work, depending on how you store your data. 

# assumes a tree 'phy' and a dataframe, 'df', with first column of tip names, 
and second column of values
for (i in 1:dim(df)[1]) {
# find index of edge length
idx <- which(phy$edge[,2] == which(phy$tip.label == df[1,i]));
# apply it. this assume you want to extend it, not just set it
phy$edge.length[idx] <- phy$edge.length[idx] + as.numeric(df[i,2]);
}

Probably a nicer way to do it without a loop...

JWB
____
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 26 Jul, 2017, at 11:57, William Gearty <wgea...@stanford.edu> wrote:
> 
> That's definitely helpful, Joseph, but I'll need to extend the tips to 
> varying amounts.
> 
> Basically, I performed a tip-dating analysis using constraints based on the 
> FADs of a bunch of fossils.
> However, now some of the analyses I want to perform require that the tips 
> extend to the species' extinctions, so I need to extend them to the LADs (or 
> farther, I suppose).
> How would I, given a vector of LAD ages for the tips, extend the tips to 
> those ages?
> 
> On Wed, Jul 26, 2017 at 8:50 AM, Joseph W. Brown <josep...@umich.edu 
> <mailto:josep...@umich.edu>> wrote:
> If you want to just extend all tips by a constant amount you can do this:
> 
> # extend terminal edges by arbitrary amount (here: 13)
> idx <- which(phy$edge[,2] < (phy$Nnode + 1));
> phy$edge.length[idx] <- phy$edge.length[idx] + 13;
> 
> HTH.
> Joseph.
> 
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu <mailto:josep...@umich.edu>
> 
> 
> 
>> On 26 Jul, 2017, at 11:39, David Bapst <dwba...@gmail.com 
>> <mailto:dwba...@gmail.com>> wrote:
>> 
>> Not sure what you mean by extending tips, Will. Could you describe a
>> small example?
>> -Dave
>> 
>> On Fri, Jul 21, 2017 at 5:15 PM, William Gearty <wgea...@stanford.edu 
>> <mailto:wgea...@stanford.edu>> wrote:
>>> Hi all,
>>> 
>>> Before I go ahead and wrote a whole script, I was wondering if there is a
>>> function or script out there for extending tips (or setting the ages of
>>> tips) given a list of taxa and ages?
>>> I haven't found anything, but perhaps I'm searching the wrong phrase(s).
>>> 
>>> Thanks,
>>> Will
>>> 
>>> --
>>> William Gearty
>>> PhD Candidate, Paleobiology
>>> Department of Geological Sciences
>>> Stanford School of Earth, Energy & Environmental Sciences
>>> williamgearty.com <http://williamgearty.com/>
>>> 
>>>[[alternative HTML version deleted]]
>>> 
>>> ___
>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>>> <mailto:R-sig-phylo@r-project.org>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>>> Searchable archive at 
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
>>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
>> 
>> 
>> 
>> -- 
>> David W. Bapst, PhD
>> https://github.com/dwbapst/paleotree <https://github.com/dwbapst/paleotree>
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>> <mailto:R-sig-phylo@r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> 
> 
> 
> 
> -- 
> William Gearty
> PhD Candidate, Paleobiology
> Department of Geological Sciences
> Stanford School of Earth, Energy & Environmental Sciences
> williamgearty.com <http://williamgearty.com/>


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Function to Extend Tips?

2017-07-26 Thread Joseph W. Brown
If you want to just extend all tips by a constant amount you can do this:

# extend terminal edges by arbitrary amount (here: 13)
idx <- which(phy$edge[,2] < (phy$Nnode + 1));
phy$edge.length[idx] <- phy$edge.length[idx] + 13;

HTH.
Joseph.
____
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 26 Jul, 2017, at 11:39, David Bapst <dwba...@gmail.com> wrote:
> 
> Not sure what you mean by extending tips, Will. Could you describe a
> small example?
> -Dave
> 
> On Fri, Jul 21, 2017 at 5:15 PM, William Gearty <wgea...@stanford.edu> wrote:
>> Hi all,
>> 
>> Before I go ahead and wrote a whole script, I was wondering if there is a
>> function or script out there for extending tips (or setting the ages of
>> tips) given a list of taxa and ages?
>> I haven't found anything, but perhaps I'm searching the wrong phrase(s).
>> 
>> Thanks,
>> Will
>> 
>> --
>> William Gearty
>> PhD Candidate, Paleobiology
>> Department of Geological Sciences
>> Stanford School of Earth, Energy & Environmental Sciences
>> williamgearty.com
>> 
>>[[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
> 
> 
> 
> -- 
> David W. Bapst, PhD
> https://github.com/dwbapst/paleotree
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] A possible alternate MRCA function to APE's getMRCA

2017-06-10 Thread Joseph W. Brown
Emmanuel and Klaus,

After Klaus reminded me that `which` was so inefficient I realized using it in 
finding the match in the first lineage is unnecessary. Instead, we can 
capitalize on the fact that shallower nodes have smaller nodeids. So I threw 
out the vector index `mrcaind` altogether:

getMRCA_new <- function(phy, tip) {
   if (!inherits(phy, "phylo"))
   stop('object "phy" is not of class "phylo"')
   if (length(tip) < 2) return(NULL)
   Ntip <- length(phy$tip.label)
   rootnd <- Ntip + 1L

   pars <- integer(phy$Nnode) # worst case assignment, usually far too long
   tnd <- if (is.character(tip)) match(tip, phy$tip.label) else tip

   ## build a lookup table to get parents faster
   pvec <- integer(Ntip + phy$Nnode)
   pvec[phy$edge[, 2]] <- phy$edge[, 1]

   ## get entire lineage for first tip
   nd <- tnd[1]
   for (k in 1:phy$Nnode) {
   nd <- pvec[nd]
   pars[k] <- nd
   if (nd == rootnd) break
   }
   pars <- pars[1:k] # delete the rest
   mrcand <- pars[1]; # keep track of actual node rather than index in pars
   
   ## traverse lineages for remaining tips, stop if hit common ancestor
   for (i in 2:length(tnd)) {
   done <- FALSE
   cnd <- tnd[i]
   while (!done) {
   cpar <- pvec[cnd] # get immediate parent
   if (cpar %in% pars) {
   if (cpar == rootnd) return(rootnd) # early exit
   mrcand <- min(mrcand, cpar); # shallowest node has lowest nodeid
   done <- TRUE
   }
   cnd <- cpar # keep going!
   }
   }
   mrcand
}

I also played around with compiling the function:

library(compiler);
getMRCA_em_cmp <- cmpfun(getMRCA_em);
getMRCA_new_cmp <- cmpfun(getMRCA_new);

Here are some timings (*_em is your code, *_new is the code above). For my 
original example (100K tips, 500 tips in MRCA query) there is not a whole lot 
of difference:

microbenchmark(getMRCA_em(phy, tips), getMRCA_new(phy, tips), 
getMRCA_em_cmp(phy, tips), getMRCA_new_cmp(phy, tips));
Unit: milliseconds
   expr  min   lq mean   median   uq   
max neval
  getMRCA_em(phy, tips) 23.99879 29.81964 33.67437 32.39149 35.61888 
136.34521   100
 getMRCA_new(phy, tips) 23.76941 28.02500 31.84285 31.37001 34.89640  
44.66380   100
  getMRCA_em_cmp(phy, tips) 20.81924 23.60118 28.23957 28.25202 30.18535  
59.72102   100
 getMRCA_new_cmp(phy, tips) 20.16588 23.19571 27.19610 26.65783 29.82149  
42.61702   100

However, for the worst-case scenario that Klaus devised the results are very 
impressive:

microbenchmark(getMRCA_em(tree, c(1,2)), getMRCA_new(tree, c(1,2)), 
getMRCA_em_cmp(tree, c(1,2)), getMRCA_new_cmp(tree, c(1,2)));
Unit: milliseconds
   expr   minlq  meanmedian
uq   max neval
  getMRCA_em(tree, c(1, 2)) 124.79040 140.44473 146.98292 145.39224 
150.96044 286.78335   100
 getMRCA_new(tree, c(1, 2)) 123.39924 139.41609 145.21311 143.92062 
148.34205 297.73846   100
  getMRCA_em_cmp(tree, c(1, 2))  23.72105  26.84944  29.94950  29.87935  
32.26490  41.99264   100
 getMRCA_new_cmp(tree, c(1, 2))  23.15223  25.74686  29.21526  29.25370  
30.80915  69.68522   100

So it might be worth compiling.

HTH.
JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 10 Jun, 2017, at 05:26, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote:
> 
> Joseph, Klaus,
> 
> This is great. I slightly edited the code and renamed the argument 'tips' to 
> 'tip' (as in the original definition). I run on a bunch of random trees and 
> the results are exactly identical with the current version of getMRCA. I'm 
> changing the code in ape now.
> 
> Cheers,
> 
> Emmanuel
> 
> getMRCA <- function(phy, tip)
> {
>if (!inherits(phy, "phylo"))
>stop('object "phy" is not of class "phylo"')
>if (length(tip) < 2) return(NULL)
>Ntip <- length(phy$tip.label)
>rootnd <- Ntip + 1L
> 
>pars <- integer(phy$Nnode) # worst case assignment, usually far too long
>mrcaind <- 1L # index in pars of the mrca node. will return highest one 
> traversed by other lineages
>tnd <- if (is.character(tip)) match(tip, phy$tip.label) else tip
> 
>## build a lookup table to get parents faster
>pvec <- integer(Ntip + phy$Nnode)
>pvec[phy$edge[, 2]] <- phy$edge[, 1]
> 
>## get entire lineage for first tip
>nd <- tnd[1]
>for (k in 1:phy$Nnode) {
>nd <- pvec[nd]
>pars[k] <- nd
>if (nd == rootnd) break
>

Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Joseph W. Brown
Sweet! I considered preallocation, but figured it was not worth it. Guess I was 
wrong!

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 9 Jun, 2017, at 18:28, Klaus Schliep <klaus.schl...@gmail.com> wrote:
> 
> Hi Joseph
> 
> I guess I figured out where some room for improvement was.
> You grow the vector pars <- c(pars, nd)
> I preallocated pars, but you could even just grow pars using pars[k] <- nd 
> without pre-allocation. This operation is now (with R >= 3.4) far less costly 
> as mentioned in the NEWS:
> "Assigning to an element of a vector beyond the current length now 
> over-allocates by a small fraction. The new vector is marked internally as 
> growable, and the true length of the new vector is stored in the truelength 
> field. This makes building up a vector result by assigning to the next 
> element beyond the current length more efficient, though pre-allocating is 
> still preferred. The implementation is subject to change and not intended to 
> be used in packages at this time." 
> 
> One of the many little improvements recently making R quite a bit faster and 
> easier.
> 
> So the worst case behavior was pretty bad 
> 
> tree <- stree(10, "right")
> system.time(get_mrca_fast(tree, c(1,2)))
>user  system elapsed 
>  13.424   0.012  13.440 
>  mrca
> [1] 19
> 
> with the changes it is much better:
> 
> system.time(mrca <- get_mrca_fast(tree, c(1,2)))
>user  system elapsed 
>   0.012   0.000   0.012 
> > mrca
> [1] 19
> 
> And here is the code:
> 
> get_mrca_fast <- function (phy, tips) {
> rootnd <- length(phy$tip.label) + 1;
> pars <- integer(phy$Nnode);  # worst case assignment, usually far too 
> long 
> mrcaind <- 1;  # index in pars of the mrca node. will return highest 
> one traversed by other lineages
> tnd <- NULL;   # node ids of tips
> if ("character" %in% class(tips)) {
> tnd <- match(tips, phy$tip.label);
> } else {
> tnd <- tips;
> }
> 
> # build a lookup table to get parents faster
> pvec <- integer(max(phy$edge));
> pvec[phy$edge[,2]] <- phy$edge[,1];
> 
> # get entire lineage for first tip
> nd <- tnd[1];
> for(k in 1:phy$Nnode){
> nd <- pvec[nd]
> pars[k] <- nd
> if (nd == rootnd) break
> }
> pars <- pars[1:k] #delete the rest
> 
> # traverse lineages for remaining tips, stop if hit common ancestor
> for (i in 2:length(tnd)) {
> done <- FALSE;
> cnd <- tnd[i];
> while (!done) {
> cpar <- pvec[cnd]; # get immediate parent
> if (cpar %in% pars) {
> if (cpar == rootnd) {
> # early exit! if the mrca of any set of taxa is the root 
> then we are done
> return(rootnd);
> }
> idx <- which(pars == cpar);
> if (idx > mrcaind) {
> mrcaind <- idx;
> }
> done <- TRUE;
> } else {
> # keep going!
> cnd <- cpar;
> }
> }
> }
> return(pars[mrcaind]);
> }
> 
> 
> Have a nice weekend, 
> Klaus
> 
> On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown <josep...@umich.edu 
> <mailto:josep...@umich.edu>> wrote:
> Liam,
> 
> I put the (updated) code up as a gist 
> <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813 
> <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813>>. Feel 
> free to use/improve/whatever. Emmanuel sees room for improvement but frankly 
> taking 0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is 
> good enough for my purposes.
> 
> JWB
> 
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu <mailto:josep...@umich.edu>
> 
> 
> 
> > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu 
> > <mailto:liam.rev...@umb.edu>> wrote:
> >
> > On the other hand, phytools does have a function - the somewhat imprecisely 
> > named fastMRCA - which can find

Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Joseph W. Brown
Liam,

I put the (updated) code up as a gist 
<https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813>. Feel free 
to use/improve/whatever. Emmanuel sees room for improvement but frankly taking 
0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is good enough 
for my purposes.

JWB
____
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu> wrote:
> 
> On the other hand, phytools does have a function - the somewhat imprecisely 
> named fastMRCA - which can find the MRCA of just a pair of species much 
> faster than getMRCA (however still slower than or only about as fast as 
> Joseph & Klaus's solutions).
> 
> Liam J. Revell, Associate Professor of Biology
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://blog.phytools.org
> 
> On 6/9/2017 5:22 AM, Liam J. Revell wrote:
>> 
>> Juan. findMRCA was written before getMRCA existed, but the latter was
>> faster so now it just calls getMRCA internally. All the best, Liam
>> 
>> Liam J. Revell, Associate Professor of Biology
>> University of Massachusetts Boston
>> web: http://faculty.umb.edu/liam.revell/
>> email: liam.rev...@umb.edu
>> blog: http://blog.phytools.org
>> 
>> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:
>>> Package phytools includes a function, findMRCA, that is supposed to work
>>> very efficiently with large trees. you may wish to compare it with your
>>> function.
>>> 
>>> Cheers
>>> 
>>> Juan
>>> 
>>> --
>>> 
>>> Dr. Juan A. Balbuena
>>> Cavanilles Institute of Biodiversity and Evolutionary Biology
>>> University of Valencia
>>> http://www.uv.es/~balbuena <http://www.uv.es/%7Ebalbuena>
>>> P.O. Box 22085
>>> http://www.uv.es/cophylpaco
>>> <http://www.uv.es/cavanilles/zoomarin/index.htm>
>>> 46071 Valencia, Spain
>>> e-mail: j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es>tel. +34 963
>>> 543 658fax +34 963 543 733
>>> 
>>> *NOTE!*For shipments by EXPRESS COURIER use the following street address:
>>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
>>> 
>>> 
>>> 
>>> <https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=emailclient>
>>> 
>>>  Libre de virus. www.avast.com
>>> <https://www.avast.com/sig-email?utm_medium=email_source=link_campaign=sig-email_content=emailclient>
>>> 
>>> 
>>> 
>>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>>> 
>>> 
>>> ___
>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> Searchable archive at
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>> 
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] A possible alternate MRCA function to APE's getMRCA

2017-06-07 Thread Joseph W. Brown
w00t!

I am aware that `which` is inefficient, but wasn't sure how to get around it in 
this instance. Thanks!

It would be great to have this in the new version!

Joseph.

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 7 Jun, 2017, at 19:06, Klaus Schliep <klaus.schl...@gmail.com> wrote:
> 
> Hi Joseph, 
> 
> just a few changes on your code and it is actually really fast. Having 
> multiple calls to which() is often slow.  
> Replacing which() with a simple lookup table will speed your code up quite a 
> bit as you can see below. 
> This is actually how the function Ancestors() in phangorn works. 
> But let's start with the output from profiling your code and you can see that 
> alt_mrca() spends most of the time in which(). 
> 
> Rprof(tmp <- tempfile())
> nd.alt <- alt_mrca(phy, tips)
> Rprof()
> summaryRprof(tmp)
> unlink(tmp)
> 
> $by.self
> self.time self.pct total.time total.pct
> "which"  1.56  100   1.56   100
> 
> $by.total
>total.time total.pct self.time self.pct
> "which"  1.56   100  1.56  100
> "alt_mrca"   1.56   100  0.000
> 
> $sample.interval
> [1] 0.02
> 
> $sampling.time
> [1] 1.56
> 
> Now here are some timings in comparison this version with one where I 
> replaced calls to which() with a lookup table:
> 
> # ALT
> system.time(nd.alt <- alt_mrca(phy, tips)); nd.alt;
>user  system elapsed 
>   1.552   0.004   1.555 
> [1] 13
> > 
> # ALT FAST
> system.time(nd.alt <- alt_mrca_fast(phy, tips)); nd.alt;
>user  system elapsed 
>   0.004   0.000   0.004 
> [1] 13 
> 
> Not too bad for some pure R code. 
> And now you really need to start thinking of another excuse for a coffee 
> break. 
> 
> Cheers, 
> Klaus
> 
> PS: Emmanuel, just another function which could get into next version. 
> PPS: The plot should better have a logarithmic scale for time.
> PPPS: Here is the improved code, just 2 additional lines.
> 
> 
> alt_mrca_fast <- function (phy, tips) {
> rootnd <- length(phy$tip.label) + 1;
> pars <- numeric(); # parents in lineage of first queried tip
> mrcaind <- 1;  # index in pars of the mrca node
> tnd <- NULL;   # node ids of tips
> if ("character" %in% class(tips)) {
> tnd <- match(tips, phy$tip.label);
> } else {
> tnd <- tips;
> }
> 
> # build a lookup table to get parents faster
> pvec <- integer(max(phy$edge))
> pvec[phy$edge[,2]] = phy$edge[,1]
> 
> # get entire lineage for first tip
> nd <- tnd[1];
> repeat {
> #nd <- phy$edge[which(phy$edge[,2] == nd),1]; # get immediate parent
> nd <- pvec[nd]
> pars <- c(pars, nd);
> if (nd == rootnd) {
> break;
> }
> }
> 
> 
> # traverse lineages for remaining tips, stop if hit common ancestor
> for (i in 2:length(tnd)) {
> done <- FALSE;
> cnd <- tnd[i];
> while (!done) {
> #cpar <- phy$edge[which(phy$edge[,2] == cnd),1]; # get immediate 
> parent
> cpar <- pvec[cnd]
> if (cpar %in% pars) {
> if (cpar == rootnd) {
> # early exit! if the mrca of any set of taxa is the root 
> then we are done
> return(rootnd);
> }
> idx <- which(pars == cpar);
> if (idx > mrcaind) {
> mrcaind <- idx;
> }
> done <- TRUE;
> } else {
> # keep going!
> cnd <- cpar;
> }
> }
> }
> return(pars[mrcaind]);
> }
> 
> 
> 
> 
> On Wed, Jun 7, 2017 at 4:24 PM, Joseph W. Brown <josep...@umich.edu 
> <mailto:josep...@umich.edu>> wrote:
> I've been working with some very large trees (tens to hundreds of thousands 
> of tips) where the speed of APE's getMRCA function has been prohibitively 
> slow. I therefore R-ified an MRCA function I developed in Java for use with 
> the Open Tree of Life tree (~3 million tips). I don't pretend to be an 
> algorithm guy, but this seems to work.
> 
> Given a tree `phy` and a set of query taxa `tips`, here is the algorithm 
> (note that the terminology has the root at the top of th

[R-sig-phylo] A possible alternate MRCA function to APE's getMRCA

2017-06-07 Thread Joseph W. Brown
I've been working with some very large trees (tens to hundreds of thousands of 
tips) where the speed of APE's getMRCA function has been prohibitively slow. I 
therefore R-ified an MRCA function I developed in Java for use with the Open 
Tree of Life tree (~3 million tips). I don't pretend to be an algorithm guy, 
but this seems to work.

Given a tree `phy` and a set of query taxa `tips`, here is the algorithm (note 
that the terminology has the root at the top of the tree):
1) For the first tip, drill up through the tree, recording its entire lineage 
(parental nodes) to the root in a vector. This seems wasteful (i.e., if the 
MRCA is not the root) but in practice is actually fast. Call this vector `pars`.
2) For each subsequent tip, traverse its lineage up in the tree until 
encountering an ancestor in `pars`. 
3) Return the highest indexed node in `pars` that was traversed by the other 
lineages.

A convenient "early exit" of this approach is that if any of the MRCAs are the 
root then we can stop immediately. Another benefit with this is that the query 
nodes do not necessarily have to be terminal taxa: MRCAs can be found among 
terminal and internal nodes.

So, how does it fare? Pretty good! Here are some results on a 100K tip tree 
(the tree I am actually working with has 350K tips).

# set up example
set.seed(13);
phy <- rtree(10);
node <- 13; # make sure MRCA is not the root
tt <- unlist(Descendants(phy, 13));
set.seed(13);
tips <- sample(tt, 50);

# APE
system.time(nd.ape <- getMRCA(phy, tips)); nd.ape;
   user  system elapsed 
 14.459   0.066  14.482 
[1] 13

# ALT
system.time(nd.alt <- alt_mrca(phy, tips)); nd.alt;
   user  system elapsed 
  2.557   0.534   3.063 
[1] 13

Ok, that was for a shallow MRCA node (i.e. close to the root). How does it fare 
for recent MRCAs? Here is a timing for sister tips:

# APE
system.time(nd.ape <- getMRCA(phy, c(13,14))); nd.ape;
   user  system elapsed 
 14.715   0.063  14.727 
[1] 100027

# ALT
system.time(nd.alt <- alt_mrca(phy, c(13,14))); nd.alt;
   user  system elapsed 
  0.047   0.015   0.061
[1] 100027

Way faster! Getting the entire lineage of the first tip does not seem to hurt 
us.

We can also look how the timing scales as the size of the query set changes 
(I've made sure that the MRCA is identical in every case):

(APE is blue, alt_mrca is red). So the APE version seems insensitive to the 
number of tips query. I suppose there must be a point where the timings cross 
and the alt_mrca function takes more time, but I have not found it. Besides, if 
the query set becomes large enough then the MRCA is likely the root itself, and 
in that case we may exit early:

# APE with just 2 tips
system.time(nd.ape <- getMRCA(phy, c(1,10))); nd.ape;
   user  system elapsed 
 14.627   0.055  14.642 
[1] 11

# ALT with a random 500 tips
system.time(nd.alt <- alt_mrca(phy, sample(1:10, 500))); nd.alt;
   user  system elapsed 
  0.292   0.058   0.346 
[1] 11

Not bad. The code for the alternate MRCA is below. I've considered lapply-ing 
this (i.e. across query nodes), but that would negate the potential early exit 
(as all tip-to-root lineages would have to be traversed). Note that in all of 
these timings above the APE getMRCA function uses C code but the alternative is 
pure R. I imagine the alt_mrca could be made even speedier if coded in C.

HTH.
Joseph.

alt_mrca <- function (phy, tips) {
rootnd <- length(phy$tip.label) + 1;
pars <- numeric(); # parents in lineage of first queried tip
mrcaind <- 1;  # index in pars of the mrca node
tnd <- NULL;   # node ids of tips
if ("character" %in% class(tips)) {
tnd <- match(tips, phy$tip.label);
} else {
tnd <- tips;
}

# get entire lineage for first tip
nd <- tnd[1];
repeat {
nd <- phy$edge[which(phy$edge[,2] == nd),1]; # get immediate parent
pars <- c(pars, nd);
if (nd == rootnd) {
break;
}
}

# traverse lineages for remaining tips, stop if hit common ancestor
for (i in 2:length(tnd)) {
done <- FALSE;
cnd <- tnd[i];
while (!done) {
cpar <- phy$edge[which(phy$edge[,2] == cnd),1]; # get immediate 
parent
if (cpar %in% pars) {
if (cpar == rootnd) {
# early exit! if the mrca of any set of taxa is the root 
then we are done
return(rootnd);
}
idx <- which(pars == cpar);
if (idx > mrcaind) {
mrcaind <- idx;
}
done <- TRUE;
} else {
# keep going!
cnd <- cpar;
        }
}
}
return(pars[mrcaind]);
}


Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
Univer

Re: [R-sig-phylo] dating cladograms in R

2017-05-17 Thread Joseph W. Brown
Fabio you might have luck with Brian O'Meara's datelife: 
https://github.com/phylotastic/datelife 
<https://github.com/phylotastic/datelife> Essentially it grabs dates from 
published trees and applies it to your own.

Brian can provide more details. (^_<)〜☆

HTH.
JWB
____
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 17 May, 2017, at 08:12, Fábio Machado <macfa...@gmail.com> wrote:
> 
> Dear community,
> 
> I was wondering if there is any methodology implemented in R to date 
> cladograms, either using only the tree topology or using additional molecular 
> data. If not, what would be the best solution to obtain a dated phylogeny, 
> given that I have only a tree topology to work with.
> 
> Best regards,
> 
> Fabio Andrade Machado
> Becario Postdoctoral - CONICET
> Museo Argentino de Ciencias Naturales “Bernardino Rivadavia”
> División de Mastozoología, 
> Buenos Aires, Av. Ángel Gallardo 470 (C1405DJR)
> Argentina
> f.mach...@usp.br <mailto:f.mach...@usp.br> ; macfa...@gmail.com 
> <mailto:macfa...@gmail.com> 
> +55 11 982631029
> skype: fabio_a_machado
> 
> Lattes: http://lattes.cnpq.br/3673327633303737 
> <http://lattes.cnpq.br/3673327633303737>
> Google Scholar: http://scholar.google.com/citations?hl=en=2l6-VrQJ 
> <http://scholar.google.com/citations?hl=en=2l6-VrQJ>
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extracting a clade for LTT

2017-05-08 Thread Joseph W. Brown
Hi David.

Sounds like you want drop.tip from ape to get rid of the outgroup. You need 
only pass in the tree and the outgroup(s) name(s).

extract.clade will so something similar, but you need to calculate the mrca for 
each tree first to pass in the node id.

HHT.
JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 8 May, 2017, at 07:31, David Buckley <dbuck...@mncn.csic.es> wrote:
> 
> Probably a very easy question, but I could’t find a straightforward answer 
> for it…
> I have a posterior distribution of trees all rooted with a 
> not-so-closely-related outgroup. I’d like to perform some diversification 
> analyses (LTT, etc.) just for the ingroup, obviating the outgroup. Is there 
> an easy way to extract the ingroup (or remove the outgroup) from all the 
> trees before performing the analyses? I have tried to do it ‘manually’, 
> deleting the outgroup taxa in Mesquite, but it looks to me that the LTT plots 
> are still considering some branch lengths from the ingroup to the outgroup 
> clade (kind of considering a stem-origin, not a crown-origin for the ingroup 
> clade…).
> 
> Sorry if the question is too naïve…
> 
> best
> 
> david
> 
> 
> 
> David Buckley
> Dpt. Biodiversity and Evolutionary Biology
> Museo Nacional de Ciencias Naturales, MNCN-CSIC
> c/José Gutiérrez Abascal 2
> 28006-Madrid
> Spain
> Phone: +34 91 411 13 28 ext. 1126
> dbuck...@mncn.csic.es
> https://www.researchgate.net/profile/David_Buckley4 
> <https://www.researchgate.net/profile/David_Buckley4>
> http://scholar.google.com/citations?user=qEFTmfkJ=en 
> <http://scholar.google.com/citations?user=qEFTmfkJ=en>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extract all possible clades from a tree

2017-01-05 Thread Joseph W. Brown
Hmm. Maybe something wonky with your tree? I simulated a 17-tip tree and tried 
things out:

phy <- rtree(17);
rootID <- length(phy$tip.label) + 1;
counter <- 1;
for (i in rootID:max(phy$edge[,1])) {
  clade <- extract.clade(phy, i);
  # do something. just printing clade properties here
  print(paste0("clade ", counter, " (node ", i, ") has ", 
length(clade$tip.label), " tips."));
  counter <- counter + 1;
}

[1] "clade 1 (node 18) has 17 tips."
[1] "clade 2 (node 19) has 11 tips."
[1] "clade 3 (node 20) has 10 tips."
[1] "clade 4 (node 21) has 4 tips."
[1] "clade 5 (node 22) has 3 tips."
[1] "clade 6 (node 23) has 2 tips."
[1] "clade 7 (node 24) has 6 tips."
[1] "clade 8 (node 25) has 5 tips."
[1] "clade 9 (node 26) has 3 tips."
[1] "clade 10 (node 27) has 2 tips."
[1] "clade 11 (node 28) has 2 tips."
[1] "clade 12 (node 29) has 6 tips."
[1] "clade 13 (node 30) has 2 tips."
[1] "clade 14 (node 31) has 4 tips."
[1] "clade 15 (node 32) has 3 tips."
[1] "clade 16 (node 33) has 2 tips."

and had no problem. Maybe post your tree here?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 5 Jan, 2017, at 17:18, Kamila Naxerova <knaxer...@partners.org> wrote:
> 
> Dear Joseph,
> 
> thanks so much. This is exactly what I need!
> 
> I am running into some problems that I don’t understand though. In my case, 
> rootID is 18, and max(phy$edge[,1]) is 33. When I try to execute your loop, 
> this happens:
> 
> > extract.clade(phy, 18)
> 
> Phylogenetic tree with 17 tips and 16 internal nodes.
> 
> Tip labels:
>   X1, X8, X9, X10 ...
> 
> Rooted; includes branch lengths.
> 
> So far so good… but then I keep getting these errors:
> 
> > extract.clade(phy, 19)
> Error in phy$edge[, 2] : incorrect number of dimensions
> > extract.clade(phy, 20)
> Error in phy$edge[, 2] : incorrect number of dimensions
> 
> Not sure why extract.clade produces these errors. 19-23 don’t work, 24-26 
> work, 27 produces the error again, 28 works etc. 
> 
> Thanks again. 
> 
> Kamila
> 
> 
>> On Jan 5, 2017, at 4:12 PM, Joseph W. Brown <josep...@umich.edu 
>> <mailto:josep...@umich.edu>> wrote:
>> 
>> Not sure if I understand the problem completely, but this should allow you 
>> to examine all of the clades (and should work if polytomies are involved):
>> 
>> # for tree phy
>> rootID <- length(phy$tip.label) + 1;
>> for (i in rootID:max(phy$edge[,1])) {
>>   clade <- extract.clade(phy, i);
>>   # do something
>> }
>> 
>> This includes the root node (i.e. whole tree), but that can be changed. This 
>> can be rewritten as an lapply if necessary.
>> 
>> HTH.
>> JWB
>> 
>> Joseph W. Brown
>> Post-doctoral Researcher, Smith Laboratory
>> University of Michigan
>> Department of Ecology & Evolutionary Biology
>> Room 2071, Kraus Natural Sciences Building
>> Ann Arbor MI 48109-1079
>> josep...@umich.edu <mailto:josep...@umich.edu>
>> 
>> 
>> 
>>> On 5 Jan, 2017, at 15:50, Kamila Naxerova <knaxer...@partners.org 
>>> <mailto:knaxer...@partners.org>> wrote:
>>> 
>>> Hi all, 
>>> 
>>> I would like to break a phylogenetic tree into all possible clades and then 
>>> examine each one of them for certain characteristics.
>>> 
>>> I am writing some code to do this from scratch, but it’s getting pretty 
>>> cumbersome quickly. 
>>> 
>>> I was wondering whether there are some functions out there already that 
>>> could help me with this task?
>>> 
>>> Thanks so much for any help.
>>> 
>>> Cheers,
>>> Kamila
>>> 
>>> 
>>> The information in this e-mail is intended only for the person to whom it is
>>> addressed. If you believe this e-mail was sent to you in error and the 
>>> e-mail
>>> contains patient information, please contact the Partners Compliance 
>>> HelpLine at
>>> http://www.partners.org/complianceline 
>>> <http://www.partners.org/complianceline> . If the e-mail was sent to you in 
>>> error
>>> but does not contain patient information, please contact the sender and 
>>> properly
>>> dispose of the e-mail.
>>> ___
>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
>>> <mailto:R-sig-phylo@r-project.org>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>>> Searchable archive at 
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
>>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> 


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extract all possible clades from a tree

2017-01-05 Thread Joseph W. Brown
Not sure if I understand the problem completely, but this should allow you to 
examine all of the clades (and should work if polytomies are involved):

# for tree phy
rootID <- length(phy$tip.label) + 1;
for (i in rootID:max(phy$edge[,1])) {
  clade <- extract.clade(phy, i);
  # do something
}

This includes the root node (i.e. whole tree), but that can be changed. This 
can be rewritten as an lapply if necessary.

HTH.
JWB
____
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 5 Jan, 2017, at 15:50, Kamila Naxerova <knaxer...@partners.org> wrote:
> 
> Hi all, 
> 
> I would like to break a phylogenetic tree into all possible clades and then 
> examine each one of them for certain characteristics.
> 
> I am writing some code to do this from scratch, but it’s getting pretty 
> cumbersome quickly. 
> 
> I was wondering whether there are some functions out there already that could 
> help me with this task?
> 
> Thanks so much for any help.
> 
> Cheers,
> Kamila
> 
> 
> The information in this e-mail is intended only for th...{{dropped:21}}

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Joseph W. Brown
I wonder if reading in a Nexus file with a translation table bypasses this 
problem?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 14 Dec, 2016, at 10:16, Yan Wong <y...@yanwong.me> wrote:
> 
> Hi,
> 
> I’m reading in a large number of newick trees with the same tips, all from a 
> single file. If I do trees<-read.trees() followed by trees <- 
> .compressTipLabel(trees), it reduces the memory footprint well, but takes an 
> age to run. I can’t help thinking this could be sped up during the reading 
> process by passing an option to read.trees() to specify that the tip labels 
> are the same in each tree in the multiPhylo object. Has anyone implemented 
> such an option?
> 
> Cheers
> 
> Yan
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] 99 Ways to ruin an open source project

2016-11-03 Thread Joseph W. Brown
Here is a cheeky post <http://opensoul.org/2015/07/22/99ways/> by Brandon 
Keepers on how to do things wrong when developing open source software. Lots of 
good stuff in there!

JWB
____
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Anyone knows how to concatenate aligned genes sequences so as to create whole genome alignments?

2016-09-12 Thread Joseph W. Brown
We have a non-R tool that can do the job: https://github.com/FePhyFoFum/phyx

After compiling, command is (assuming fasta  (extension ".fas") input, but any 
input format will work):

./pxcat -s *.fas -o my_concatenated_alignment.fas -p partition_info.txt

(The partition_info.txt logs how sites/partitions are ordered).

If you need these in out another format (say, phylip), do:

./pxcat -s *.fas -o my_concatenated_alignment.fas | ./pxs2phy -o 
my_concatenated_alignment.phy

HTH.
JWB
________
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 12 Sep, 2016, at 06:41, Bhuller, Ravneet 
> <ravneet.bhulle...@imperial.ac.uk> wrote:
> 
> Dear Members,
> 
> Any suggestions on how to concatenate the aligned gene sequences in fasta 
> format so as to get whole genome alignments?
> I need whole genome alignments as an input to a phylogenetic tool.
> 
> Many thanks,
> 
> Rav
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Ultrametric tree not recognized

2016-08-16 Thread Joseph W. Brown
Comparing floating point numbers is (at least for me) pretty scary when you get 
down into it.

Here are some more accessible treatments (although Goldberg’s paper seems to be 
the definitive treatment on the subject):

What Every Programmer Should Know About Floating-Point Arithmetic: 
http://floating-point-gui.de

Comparing Floating Point Numbers, 2012 Edition: 
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

HTH.
JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 16 Aug, 2016, at 12:15, Klaus Schliep <klaus.schl...@gmail.com> wrote:
> 
> We should consider to improve is.ultrametric to use as tolerance a certain 
> number significant figures (e.g. 8-10 digits). This way tol would only depend 
> on the tip-to-root distance and not a machine dependent minimal value. I find 
> the use of significant figures suits really well as a stopping criterion for 
> example in my ML functions, especially as it works for small and large trees 
> equally well. 
> 
> write.tree has an option to define the number of digits, it will be generally 
> less than the internal representation and I suspect one could possible export 
> with higher precision from other programs, but will result in larger file 
> sizes which one has to put in consideration for MCMC / bootstrap tree files.  
>   
> As a side note on the developer site of the r-project.org 
> <http://r-project.org/> homepage (not CRAN) is a link to this nice ancient 
> paper (http://www.validlab.com/goldberg/paper.pdf 
> <http://www.validlab.com/goldberg/paper.pdf>)
> 
> Regards, 
> Klaus
> 
> 
> PS: And Martin of course I am happy with Liam's solution as it uses phangorn 
> ;)
> 
> On Tue, Aug 16, 2016 at 10:22 AM, Joseph W. Brown <josep...@umich.edu 
> <mailto:josep...@umich.edu>> wrote:
> That’s a good tip, but the issue here is that many packages use 
> is.ultrametric internally with a hard-coded (likely, default) value of tol. 
> In this case, Martin simply cannot perform the analyses he wants to run 
> because this option is inaccessible.
> 
> JWB
> 
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu <mailto:josep...@umich.edu>
> 
> 
> 
>> On 16 Aug, 2016, at 10:15, Klaus Schliep <klaus.schl...@gmail.com 
>> <mailto:klaus.schl...@gmail.com>> wrote:
>> 
>> Hello all,
>> this may come be surprising to many, but consulting the manual 
>> ?is.ultrametric can be helpful. Why not simply try e.g. is.ultrametric(tree, 
>> tol=.01) ????
>> So in this sense RTFM 
>> Regards,
>> Klaus
>> 
>> 
>> On Aug 16, 2016 9:31 AM, "Martin Dohrmann" <m.dohrm...@lrz.uni-muenchen.de 
>> <mailto:m.dohrm...@lrz.uni-muenchen.de>> wrote:
>> 
>> Am 16.08.2016 um 15:20 schrieb Joseph W. Brown:
>> 
>> > I agree that it is almost certainly numerical precision, as rescaling the 
>> > tree fixes things:
>> >
>> > > is.ultrametric(phy);
>> > [1] FALSE
>> > > fie <- phy;
>> > > fie$edge.length <- fie$edge.length * 0.1;
>> > > is.ultrametric(fie);
>> > [1] TRUE
>> >
>> > I’ve also tested the ultrametricity(?) with a non-R program and the 
>> > results are the same. So can we assume this is not a PhyloBayes issue, and 
>> > rather an unavoidable problem with large, old trees? But the fact that 
>> > MrBayes trees are ultrametric is confusing; does MrBayes use less precise 
>> > edge lengths?
>> 
>> Actually the MrBayes branch lengths appear to be MORE precise.
>> 
>> Martin
>> 
>> >>
>> >> On 16 Aug, 2016, at 08:53, Liam J. Revell <liam.rev...@umb.edu 
>> >> <mailto:liam.rev...@umb.edu>> wrote:
>> >>
>> >> Hi Martin.
>> >>
>> >> Since you are writing & reading trees to file, my guess is that it has to 
>> >> do with numerical precision - that is, the rounding of your edge lengths 
>> >> when they are written to file.
>> >>
>> >> Does your tree look ultrametric when plotted in R? If so, this is 
>> >> probably the case.
>> >>
>> >> My recommendation is that y

Re: [R-sig-phylo] Ultrametric tree not recognized

2016-08-16 Thread Joseph W. Brown
That’s a good tip, but the issue here is that many packages use is.ultrametric 
internally with a hard-coded (likely, default) value of tol. In this case, 
Martin simply cannot perform the analyses he wants to run because this option 
is inaccessible.

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 16 Aug, 2016, at 10:15, Klaus Schliep <klaus.schl...@gmail.com> wrote:
> 
> Hello all,
> this may come be surprising to many, but consulting the manual 
> ?is.ultrametric can be helpful. Why not simply try e.g. is.ultrametric(tree, 
> tol=.01) 
> So in this sense RTFM 
> Regards,
> Klaus
> 
> 
> On Aug 16, 2016 9:31 AM, "Martin Dohrmann" <m.dohrm...@lrz.uni-muenchen.de 
> <mailto:m.dohrm...@lrz.uni-muenchen.de>> wrote:
> 
> Am 16.08.2016 um 15:20 schrieb Joseph W. Brown:
> 
> > I agree that it is almost certainly numerical precision, as rescaling the 
> > tree fixes things:
> >
> > > is.ultrametric(phy);
> > [1] FALSE
> > > fie <- phy;
> > > fie$edge.length <- fie$edge.length * 0.1;
> > > is.ultrametric(fie);
> > [1] TRUE
> >
> > I’ve also tested the ultrametricity(?) with a non-R program and the results 
> > are the same. So can we assume this is not a PhyloBayes issue, and rather 
> > an unavoidable problem with large, old trees? But the fact that MrBayes 
> > trees are ultrametric is confusing; does MrBayes use less precise edge 
> > lengths?
> 
> Actually the MrBayes branch lengths appear to be MORE precise.
> 
> Martin
> 
> >>
> >> On 16 Aug, 2016, at 08:53, Liam J. Revell <liam.rev...@umb.edu 
> >> <mailto:liam.rev...@umb.edu>> wrote:
> >>
> >> Hi Martin.
> >>
> >> Since you are writing & reading trees to file, my guess is that it has to 
> >> do with numerical precision - that is, the rounding of your edge lengths 
> >> when they are written to file.
> >>
> >> Does your tree look ultrametric when plotted in R? If so, this is probably 
> >> the case.
> >>
> >> My recommendation is that you use phangorn to compute the non-negative 
> >> least-squares edge lengths with the condition that the tree is 
> >> ultrametric. This will give you the edge lengths that result in the 
> >> distances between taxa with minimum sum of squared differences from the 
> >> distances implied by your input tree, under the criterion that the 
> >> resulting tree is ultrametric.
> >>
> >> To do this you need to merely run:
> >>
> >> library(phytools)
> >> library(phangorn)
> >> is.ultrametric(tree) ## fails
> >> plotTree(tree,ftype="off") ## does my tree look ultrametric?
> >> nnls<-nnls.tree(cophenetic(tree),tree,rooted=TRUE)
> >> is.ultrametric(tree) ## should pass
> >>
> >> Let us know if this works. All the best, Liam
> >>
> >> Liam J. Revell, Associate Professor of Biology
> >> University of Massachusetts Boston
> >> web: http://faculty.umb.edu/liam.revell/ 
> >> <http://faculty.umb.edu/liam.revell/>
> >> email: liam.rev...@umb.edu <mailto:liam.rev...@umb.edu>
> >> blog: http://blog.phytools.org <http://blog.phytools.org/>
> >>
> >> On 8/16/2016 6:41 AM, Martin Dohrmann wrote:
> >>> Hi,
> >>> I want to do some diversification pattern analyses with various R 
> >>> packages. I'm using a time-calibrated tree produced by PhyloBayes. Branch 
> >>> lengths are in millions of years and all taxa are extant, so the tree 
> >>> should be ultrametric. However, when I call "is.ultrametric", it returns 
> >>> "FALSE".
> >>>
> >>> Has anybody encountered something like this? Any ideas about what's going 
> >>> on/how to solve this?
> >>>
> >>> Some further information:
> >>> I also tried other PhyloBayes time trees, with the same result. In 
> >>> contrast, MrBayes time trees I tried for comparison are recognized as 
> >>> ultrametric. Regarding my diversification analyses, TESS would not run, 
> >>> telling me "The likelihood function is only defined for ultrametric 
> >>> trees!". On the other hand, BAMMtools doesn't seem to have a problem with 
> >>> my tree. I haven't tried other packages yet, 

Re: [R-sig-phylo] Ultrametric tree not recognized

2016-08-16 Thread Joseph W. Brown
I agree that it is almost certainly numerical precision, as rescaling the tree 
fixes things:

> is.ultrametric(phy);
[1] FALSE
> fie <- phy;
> fie$edge.length <- fie$edge.length * 0.1;
> is.ultrametric(fie);
[1] TRUE

I’ve also tested the ultrametricity(?) with a non-R program and the results are 
the same. So can we assume this is not a PhyloBayes issue, and rather an 
unavoidable problem with large, old trees? But the fact that MrBayes trees are 
ultrametric is confusing; does MrBayes use less precise edge lengths?

JWB
____________
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 16 Aug, 2016, at 08:53, Liam J. Revell <liam.rev...@umb.edu> wrote:
> 
> Hi Martin.
> 
> Since you are writing & reading trees to file, my guess is that it has to do 
> with numerical precision - that is, the rounding of your edge lengths when 
> they are written to file.
> 
> Does your tree look ultrametric when plotted in R? If so, this is probably 
> the case.
> 
> My recommendation is that you use phangorn to compute the non-negative 
> least-squares edge lengths with the condition that the tree is ultrametric. 
> This will give you the edge lengths that result in the distances between taxa 
> with minimum sum of squared differences from the distances implied by your 
> input tree, under the criterion that the resulting tree is ultrametric.
> 
> To do this you need to merely run:
> 
> library(phytools)
> library(phangorn)
> is.ultrametric(tree) ## fails
> plotTree(tree,ftype="off") ## does my tree look ultrametric?
> nnls<-nnls.tree(cophenetic(tree),tree,rooted=TRUE)
> is.ultrametric(tree) ## should pass
> 
> Let us know if this works. All the best, Liam
> 
> Liam J. Revell, Associate Professor of Biology
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://blog.phytools.org
> 
> On 8/16/2016 6:41 AM, Martin Dohrmann wrote:
>> Hi,
>> I want to do some diversification pattern analyses with various R packages. 
>> I'm using a time-calibrated tree produced by PhyloBayes. Branch lengths are 
>> in millions of years and all taxa are extant, so the tree should be 
>> ultrametric. However, when I call "is.ultrametric", it returns "FALSE".
>> 
>> Has anybody encountered something like this? Any ideas about what's going 
>> on/how to solve this?
>> 
>> Some further information:
>> I also tried other PhyloBayes time trees, with the same result. In contrast, 
>> MrBayes time trees I tried for comparison are recognized as ultrametric. 
>> Regarding my diversification analyses, TESS would not run, telling me "The 
>> likelihood function is only defined for ultrametric trees!". On the other 
>> hand, BAMMtools doesn't seem to have a problem with my tree. I haven't tried 
>> other packages yet, but I suspect RPANDA, TreePar etc. might also have 
>> issues if they don't recognize my tree as ultrametric.
>> 
>> I'd appreciate any help!
>> 
>> Best wishes,
>> Martin
>> 
>> Dr. Martin Dohrmann
>> Ludwig-Maximilians-University Munich
>> Dept. of Earth & Environmental Sciences
>> Palaeontology & Geobiology
>> Molecular Geo- & Palaeobiology Lab
>> Richard-Wagner-Str. 10
>> 80333 Munich, Germany
>> Phone: +49-(0)89-2180-6593
>> 
>> 
>>  [[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
>> 
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] chronos nlminb error

2014-12-31 Thread Joseph W. Brown
Franz,

Yes, Mac has changed enough recently that installing the dependencies can be a 
horrific hassle (specifically: the default use of clang instead of gcc). 

A couple of options:

1. Install dependencies with homebrew http://brew.sh/.
2. Override the default use of clang, and instead use gcc.
3. If all else fails, and it is not too much trouble, use a virtual box, and 
install/use treePL in some flavour of linux. This should work flawlessly.

HTH.
JWB.

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology  Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



 On 31 Dec, 2014, at 11:34, Franz Krah f.k...@mailbox.org wrote:
 
 Thanks a lot Brian!
 Took me the whole day but finally treePL works fine although there were some
 difficulties with Yosemite and the version of c++.
 
 The tree - reorder(tree) trick didn't do the trick.
 
 Cheers,
 Franz
 
 
 
 Brian O'Meara omeara.br...@gmail.com hat am 30. Dezember 2014 um 19:56
 geschrieben:
 
 
 The programs r8s, pathd8, and treepl can all take a phylogram and return a
 chronogram, but they're not within R (though they could be wrapped). You
 could also infer a chronogram from the start using Beast or MrBayes, though
 the large size of the tree may make it difficult.
 
 One simple thing to try before all this, though, is pass your tree through
 a reorder() function call (using default arguments) before passing it to
 chronos: (i.e.,
 
 tree - read.tree(full_Cipres_Data/RAxML_bestTree.tre)
 
 *tree - reorder(tree)
 tree - chronos(tree, lambda = 1, model =
 correlated, quiet = FALSE,
calibration = makeChronosCalib(tree),
control = chronos.control())
 
 Sometimes the imported  tree structure, while valid, can be an issue for
 ape and I find doing this reordering can fix weird errors.
 
 Best,
 Brian
 
 ___
 Brian O'Meara
 Assistant Professor
 Dept. of Ecology  Evolutionary Biology
 U. of Tennessee, Knoxville
 http://www.brianomeara.info
 
 Postdoc collaborators wanted: http://nimbios.org/postdocs/
 Calendar: http://www.brianomeara.info/calendars/omeara
 
 On Tue, Dec 30, 2014 at 4:18 AM, Franz Krah f.k...@mailbox.org wrote:
 
 Dear all,
 
 I used chronos to make a large phylogeny (~3600 species) ultrametric.
 The tree is rooted and binary.
 
 With the following code I get the error:
 
 tree - read.tree(full_Cipres_Data/RAxML_bestTree.tre)
 tree - chronos(tree, lambda = 1, model = correlated, quiet = FALSE,
calibration = makeChronosCalib(tree),
control = chronos.control())
 
 
 Error:
 
 In nlminb(start.para, f, g, control = opt.ctrl, lower = LOW, upper = UP) :
  NA/NaN function evaluation
 
 This error was posted before some times but I didn't find a satisfying
 answer.
 
 I tried out some things but didn't helped: altering the root in different
 ways,
 altering the model, altering lambda. The functions worked fine with smaller
 phylogenies before!
 
 Hope someone found a solution!
 Are there other ways of making my tree ultrametric?
 
 Cheers, Franz
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/
 
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Analysis with Multiple cores on Mac Workstation

2012-12-10 Thread Joseph W. Brown
 obvious utility.
 As Daniel says, the average branch lengths across a set of trees is
 more difficult to see a use case for, but you could imagine doing
 something
 like taking the ratogram output from r8s on a set of trees and
 summarizing
 the rate average and rate sd on a given, best, tree as two sets of
 branch
 lengths on that tree.
 
 I've put the function source at
 
 https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrl
 en.R?revision=110root=omearalab
 .
 You can source the file for the function (consensusBrlen() ) and
 other
 functions it needs. It also uses phylobase. Note that this is
 alpha-quality
 code -- it's been checked a bit, but verify it's doing what you want.
 
 Here's an example of how to use it
 
 library(ape)
 
 library(phylobase)
 
 phy.a-rcoal(15)
 
 phy.b-phy.a
 
 phy.b$edge.length-phy.b$edge.length+runif(length(phy.b$edge.length),
 0,
 0.1)
 
 phy.c-rcoal(15)
 
 phy.list-list(phy.a, phy.b, phy.c)
 
 phy.consensus-consensusBrlen(phy.a, list(phy.a, phy.b, phy.c),
 type=mean_brlen)
 
 --
 José Hidasi Neto
 Graduated in Biological Sciences - Universidade Federal de Goiás (UFG)
 Master's candidate in Ecology and Evolution - Community Ecology and
 Functioning Lab - UFG
 Lattes:
 http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4293841A0
 
  [[alternative HTML version deleted]]
 
 
 
 
 --
 Daniel Barker
 http://bio.st-andrews.ac.uk/staff/db60.htm
 The University of St Andrews is a charity registered in Scotland : No
 SC013532
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/
 
 
 
 
 
 --
 José Hidasi Neto
 Graduated in Biological Sciences - Universidade Federal de Goiás (UFG)
 Master's candidate in Ecology and Evolution - Community Ecology and
 Functioning Lab - UFG
 Lattes:
 http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4293841A0
 Website: http://rfunctions.blogspot.com.br/
 
  [[alternative HTML version deleted]]
 
 
 
 
 -- 
 Klaus Schliep
 Phylogenomics Lab at the University of Vigo, Spain
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology  Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/