Hi Dave.
It seems like one way to get these values faster would be to count the
number of descendants as the tree is read in to R. This is possible
because when the ")" character is reached by the Newick parser, all
descendant nodes (and tips) have already been created in memory.
This was simple to add to my tree reading function read.newick() [code
here:
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/read.newick/v0.3/read.newick.R].
Now v0.3 of the function returns a modified "phylo" object with the
vector $Ndesc containing the number of descendants from each internal
and terminal node above the root (in the order of $edge[,2]). This is
exactly the same vector as would be obtained by the following commands:
> desc<-sapply(tree$edge[,2],function(x) node.leaves(tree,x))
> Ndesc<-sapply(desc,function(x) length(x))
Note that read.newick() only reads one tree at a time in "phylip" format
(and without node labels) - but reading multiple trees or node labels,
or reading from "nexus" format, would quite easy to add if you find that
this is indeed faster.
In addition, I believe that newick2phylog() creates this list in reading
a Newick tree into memory as a "phylog" object. This is stored as
phy$Aparam$nlea for "phylog" object phy. Unfortunately, it seems like
newick2phylog() is very slow for large trees.
- Liam
--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com
On 3/6/2011 12:47 PM, David Bapst wrote:
Hello all,
I'm currently trying to measure a parameter over a large number of
large trees (>1700 tips), and part of this calculation requires
knowing the tip taxa descended from each node (I must compare the
difference in tip values among taxa descended from a node). Because I
must do this many times, I decided to compare the efficiency of
several methods for doing this in various R libraries with
system.time() (as Liam did recently with some BM simulation functions
in some recent blog posts). As I feel that others may benefit from
this comparison of methods, I am posting the results to this list.
Geiger's node.leaves() is much faster than the alternatives, although
at ~13s a run, it is still not a particularly speedy process. I didn't
need the actual tip.labels, so I took node.leaves() and made it as
lean as possible, to produce node.tips(), below. That cut the run time
down to ~9 sec.
node.tips<-function (phy, node){
n<- length(phy$tip.label)
if (node<= n){node}else{
l<- numeric()
d<- phy$edge[which(phy$edge[,1]==node),2]
for(j in d){if(j<= n){l<- c(l, j)}else{l<-c(l,
node.tips(phy,j))}}
l}}
If anyone knows of another alternative that might be faster, I would
greatly appreciate any suggestions.
-Dave Bapst, UChicago Geosci
Using the modified node.leaves() function from geiger, node.tips()
system.time(desc<-sapply(edge_end,function(x) node.tips(res_tree,x)))
user system elapsed
9.39 0.00 9.44
Using node.leaves() in geiger
system.time(desc<-sapply(edge_end,function(x) node.leaves(res_tree,x)))
user system elapsed
13.29 0.02 13.34
Using Descendants() in phangorn
system.time(desc<-sapply(edge_end,function(x) Descendants(res_tree,x)))
user system elapsed
75.60 0.10 76.83
Using listTips() in adephylo
system.time(desc_list<-c(as.list(1:Ntip(res_tree)),listTips(res_tree)))
user system elapsed
75.27 2.93 87.28
Using descendants() in phylobase
system.time(desc<-sapply(edge_end,function(x)
descendants(res_tree4,x,type="tips")))
user system elapsed
149.56 0.67 155.15
Using nodeDecendants() in maticce (note that translating the tree
into ouchtree format was prohibitively very lengthy)
system.time(res_tree_ou<-ape2ouch(res_tree))
user system elapsed
84.01 0.13 85.34
system.time(desc<-sapply(edge_end,function(x) nodeDescendents(res_tree_ou,x)))
user system elapsed
65.91 0.23 68.47
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo