Hi David and others,
prop.part() with a single tree does what you want:
set.seed(123)
res_tree <- rtree(1700)
system.time(desc2 <- prop.part(res_tree))
user system elapsed
0.050 0.000 0.053
edge_end <- unique(res_tree$edge[,1])
system.time(desc1 <- Descendants(res_tree,edge_end))
user system elapsed
0.400 0.000 0.406
This gives the same answer than Descendants:
for (i in seq_along(desc1))
+ stopifnot(all(sort(desc1[[i]]) == sort(desc2[[i]])))
Klaus makes an excellent reminder about R programming: use vectorisation
whenever possible (you'll almost always win to do it).
Your examples are very instructive. One lesson is that it is quite easy
to write R code that tends to be slow. Another one is that it is also
easy to write different functions doing the same thing. I'm quite
impressed to see that several packages have re-implemented something
that has been in ape for some time.
I wish developpers (and users as well) could use these tools from ape
which have been developed specifically for this kind of objectives
(fast, efficient, and reliable). Is it an issue of 'visibility' of these
functions? This could be a project idea for the next GSoC.
Cheers,
Emmanuel
PS: if you want to speed-up your computations and you have a multi-core
processor on your machine (which are common on laptops nowadays), you
could try the multicore package (phangorn already supports it):
> library(multicore)
> job <- parallel(prop.part(res_tree))
> system.time(out <- collect(job))
user system elapsed
0.080 0.010 0.002
> out <- out[[1]]
> identical(out, desc2)
[1] TRUE
This might be even more interesting with lists of trees.
David Bapst wrote on 07/03/2011 04:32:
Klaus-
Oh, that worked rather splendid! Thanks for letting me know.
system.time(desc<-Descendants(res_tree,edge_end))
user system elapsed
1.56 0.00 1.56
-Dave
On Sun, Mar 6, 2011 at 3:22 PM, Klaus Schliep <klaus.schl...@gmail.com> wrote:
Hi David,
you can supply Descendents (from phangorn) with a vector instead of
using sapply. I should have mentioned this in the help file.
If your vector (edge_end) is long enough (I don't have a exact number
here) this can be much faster than using sapply as some results are
reused.
Here are some timings:
set.seed(123)
res_tree = rtree(1700)
edge_end = unique(res_tree$edge[,1])
system.time(desc1<-sapply(edge_end,function(x) Descendants(res_tree,x)))
user system elapsed
54.51 0.00 54.87
system.time(desc2<-Descendants(res_tree,edge_end))
user system elapsed
0.75 0.00 0.75
system.time(desc3<-sapply(edge_end,function(x) node.tips(res_tree,x)))
user system elapsed
4.07 0.00 4.07
Cheers,
Klaus
On 3/6/11, David Bapst <dwba...@uchicago.edu> wrote:
Hello all,
I'm currently trying to measure a parameter over a large number of
large trees (>1700 tips), and part of this calculation requires
knowing the tip taxa descended from each node (I must compare the
difference in tip values among taxa descended from a node). Because I
must do this many times, I decided to compare the efficiency of
several methods for doing this in various R libraries with
system.time() (as Liam did recently with some BM simulation functions
in some recent blog posts). As I feel that others may benefit from
this comparison of methods, I am posting the results to this list.
Geiger's node.leaves() is much faster than the alternatives, although
at ~13s a run, it is still not a particularly speedy process. I didn't
need the actual tip.labels, so I took node.leaves() and made it as
lean as possible, to produce node.tips(), below. That cut the run time
down to ~9 sec.
node.tips<-function (phy, node){
n <- length(phy$tip.label)
if (node <= n){node}else{
l<- numeric()
d<- phy$edge[which(phy$edge[,1]==node),2]
for(j in d){if(j <= n){l <- c(l, j)}else{l<-c(l,
node.tips(phy,j))}}
l}}
If anyone knows of another alternative that might be faster, I would
greatly appreciate any suggestions.
-Dave Bapst, UChicago Geosci
Using the modified node.leaves() function from geiger, node.tips()
system.time(desc<-sapply(edge_end,function(x) node.tips(res_tree,x)))
user system elapsed
9.39 0.00 9.44
Using node.leaves() in geiger
system.time(desc<-sapply(edge_end,function(x) node.leaves(res_tree,x)))
user system elapsed
13.29 0.02 13.34
Using Descendants() in phangorn
system.time(desc<-sapply(edge_end,function(x) Descendants(res_tree,x)))
user system elapsed
75.60 0.10 76.83
Using listTips() in adephylo
system.time(desc_list<-c(as.list(1:Ntip(res_tree)),listTips(res_tree)))
user system elapsed
75.27 2.93 87.28
Using descendants() in phylobase
system.time(desc<-sapply(edge_end,function(x)
descendants(res_tree4,x,type="tips")))
user system elapsed
149.56 0.67 155.15
Using nodeDecendants() in maticce (note that translating the tree
into ouchtree format was prohibitively very lengthy)
system.time(res_tree_ou<-ape2ouch(res_tree))
user system elapsed
84.01 0.13 85.34
system.time(desc<-sapply(edge_end,function(x)
nodeDescendents(res_tree_ou,x)))
user system elapsed
65.91 0.23 68.47
--
David Bapst
Dept of Geophysical Sciences
University of Chicago
5734 S. Ellis
Chicago, IL 60637
http://home.uchicago.edu/~dwbapst/
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
--
Klaus Schliep
Université Paris 6 (Pierre et Marie Curie)
9, Quai Saint-Bernard, 75005 Paris
--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo