Hi David and others,

prop.part() with a single tree does what you want:

set.seed(123)
res_tree <- rtree(1700)
system.time(desc2 <- prop.part(res_tree))
   user  system elapsed
  0.050   0.000   0.053
edge_end <- unique(res_tree$edge[,1])
system.time(desc1 <- Descendants(res_tree,edge_end))
   user  system elapsed
  0.400   0.000   0.406

This gives the same answer than Descendants:

for (i in seq_along(desc1))
+ stopifnot(all(sort(desc1[[i]]) == sort(desc2[[i]])))


Klaus makes an excellent reminder about R programming: use vectorisation
whenever possible (you'll almost always win to do it).

Your examples are very instructive. One lesson is that it is quite easy
to write R code that tends to be slow. Another one is that it is also
easy to write different functions doing the same thing. I'm quite
impressed to see that several packages have re-implemented something
that has been in ape for some time.

I wish developpers (and users as well) could use these tools from ape
which have been developed specifically for this kind of objectives (fast, efficient, and reliable). Is it an issue of 'visibility' of these functions? This could be a project idea for the next GSoC.

Cheers,

Emmanuel

PS: if you want to speed-up your computations and you have a multi-core processor on your machine (which are common on laptops nowadays), you could try the multicore package (phangorn already supports it):

> library(multicore)
> job <- parallel(prop.part(res_tree))
> system.time(out <- collect(job))
   user  system elapsed
  0.080   0.010   0.002
> out <- out[[1]]
> identical(out, desc2)
[1] TRUE

This might be even more interesting with lists of trees.


David Bapst wrote on 07/03/2011 04:32:
Klaus-
Oh, that worked rather splendid! Thanks for letting me know.

system.time(desc<-Descendants(res_tree,edge_end))
   user  system elapsed
   1.56    0.00    1.56

-Dave

On Sun, Mar 6, 2011 at 3:22 PM, Klaus Schliep <klaus.schl...@gmail.com> wrote:
Hi David,

you can supply Descendents (from phangorn) with a vector instead of
using sapply. I should have mentioned this in the help file.
If your vector (edge_end) is long enough (I don't have a exact number
here) this can be much faster than using sapply as some results are
reused.
Here are some timings:

set.seed(123)
res_tree = rtree(1700)
edge_end = unique(res_tree$edge[,1])
system.time(desc1<-sapply(edge_end,function(x) Descendants(res_tree,x)))
  user  system elapsed
 54.51    0.00   54.87
system.time(desc2<-Descendants(res_tree,edge_end))
  user  system elapsed
  0.75    0.00    0.75
system.time(desc3<-sapply(edge_end,function(x) node.tips(res_tree,x)))
  user  system elapsed
  4.07    0.00    4.07


Cheers,
Klaus



On 3/6/11, David Bapst <dwba...@uchicago.edu> wrote:
Hello all,
I'm currently trying to measure a parameter over a large number of
large trees (>1700 tips), and part of this calculation requires
knowing the tip taxa descended from each node (I must compare the
difference in tip values among taxa descended from a node). Because I
must do this many times,  I decided to compare the efficiency of
several methods for doing this in various R libraries with
system.time() (as Liam did recently with some BM simulation functions
in some recent blog posts). As I feel that others may benefit from
this comparison of methods, I am posting the results to this list.

Geiger's node.leaves() is much faster than the alternatives, although
at ~13s a run, it is still not a particularly speedy process. I didn't
need the actual tip.labels, so I took node.leaves() and made it as
lean as possible, to produce node.tips(), below. That cut the run time
down to ~9 sec.

node.tips<-function (phy, node){
      n <- length(phy$tip.label)
      if (node <= n){node}else{
              l<- numeric()
              d<- phy$edge[which(phy$edge[,1]==node),2]
              for(j in d){if(j <= n){l <- c(l, j)}else{l<-c(l, 
node.tips(phy,j))}}
       l}}

If anyone knows of another alternative that might be faster, I would
greatly appreciate any suggestions.
-Dave Bapst, UChicago Geosci

Using the modified node.leaves() function from geiger, node.tips()

system.time(desc<-sapply(edge_end,function(x) node.tips(res_tree,x)))
   user  system elapsed
   9.39    0.00    9.44

Using node.leaves() in geiger

system.time(desc<-sapply(edge_end,function(x) node.leaves(res_tree,x)))
   user  system elapsed
  13.29    0.02   13.34

Using Descendants() in phangorn

system.time(desc<-sapply(edge_end,function(x) Descendants(res_tree,x)))
   user  system elapsed
  75.60    0.10   76.83

Using listTips() in adephylo

system.time(desc_list<-c(as.list(1:Ntip(res_tree)),listTips(res_tree)))
   user  system elapsed
  75.27    2.93   87.28

Using descendants() in phylobase

system.time(desc<-sapply(edge_end,function(x)
descendants(res_tree4,x,type="tips")))
   user  system elapsed
 149.56    0.67  155.15

Using nodeDecendants() in maticce  (note that translating the tree
into ouchtree format was prohibitively very lengthy)

system.time(res_tree_ou<-ape2ouch(res_tree))
   user  system elapsed
  84.01    0.13   85.34
system.time(desc<-sapply(edge_end,function(x)
nodeDescendents(res_tree_ou,x)))
   user  system elapsed
  65.91    0.23   68.47

--
David Bapst
Dept of Geophysical Sciences
University of Chicago
5734 S. Ellis
Chicago, IL 60637
http://home.uchicago.edu/~dwbapst/

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


--
Klaus Schliep
Université Paris 6 (Pierre et Marie Curie)
9, Quai Saint-Bernard, 75005 Paris





--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Reply via email to