Emmanuel, all-
That's pretty fast, even on my cheap laptop!

> system.time(desc1<-c(as.list(1:Ntip(res_tree)),prop.part(res_tree)))
   user  system elapsed
   0.65    0.00    0.66

As far as why I didn't try prop.part, to be honest, I had no idea that
prop.part() did that. The help file says:

Description
These functions analyse bipartitions found in a series of trees.
prop.part counts the number of bipartitions found in a series of trees
given as ....

and

Value
prop.part returns an object of class "prop.part" which is a list with
an attribute "number". The elements of this list are the observed
clades, and the attribute their respective numbers. If the default
check.labels = FALSE is used, an attribute "labels" is added, and the
vectors of the returned object contains the indices of these labels
instead of the labels themselves.

That doesn't suggest to me that it gives the tip descendants of every
internal node, although that is part of what it does. Until you
suggested I run it, I had no idea it could give the information I
wanted. Perhaps a little bit of clarifying of the help file,
particularly the specific bits of the output, would go a long way.

Cheers!
-Dave



On Sun, Mar 6, 2011 at 10:11 PM, Emmanuel Paradis
<emmanuel.para...@ird.fr> wrote:
> Hi David and others,
>
> prop.part() with a single tree does what you want:
>
>> set.seed(123)
>> res_tree <- rtree(1700)
>> system.time(desc2 <- prop.part(res_tree))
>
>   user  system elapsed
>  0.050   0.000   0.053
>>
>> edge_end <- unique(res_tree$edge[,1])
>> system.time(desc1 <- Descendants(res_tree,edge_end))
>
>   user  system elapsed
>  0.400   0.000   0.406
>
> This gives the same answer than Descendants:
>
>> for (i in seq_along(desc1))
>
> + stopifnot(all(sort(desc1[[i]]) == sort(desc2[[i]])))
>>
>
> Klaus makes an excellent reminder about R programming: use vectorisation
> whenever possible (you'll almost always win to do it).
>
> Your examples are very instructive. One lesson is that it is quite easy
> to write R code that tends to be slow. Another one is that it is also
> easy to write different functions doing the same thing. I'm quite
> impressed to see that several packages have re-implemented something
> that has been in ape for some time.
>
> I wish developpers (and users as well) could use these tools from ape
> which have been developed specifically for this kind of objectives (fast,
> efficient, and reliable). Is it an issue of 'visibility' of these functions?
> This could be a project idea for the next GSoC.
>
> Cheers,
>
> Emmanuel
>
> PS: if you want to speed-up your computations and you have a multi-core
> processor on your machine (which are common on laptops nowadays), you could
> try the multicore package (phangorn already supports it):
>
>> library(multicore)
>> job <- parallel(prop.part(res_tree))
>> system.time(out <- collect(job))
>   user  system elapsed
>  0.080   0.010   0.002
>> out <- out[[1]]
>> identical(out, desc2)
> [1] TRUE
>
> This might be even more interesting with lists of trees.
>
>
> David Bapst wrote on 07/03/2011 04:32:
>>
>> Klaus-
>> Oh, that worked rather splendid! Thanks for letting me know.
>>
>>> system.time(desc<-Descendants(res_tree,edge_end))
>>
>>   user  system elapsed
>>   1.56    0.00    1.56
>>
>> -Dave
>>
>> On Sun, Mar 6, 2011 at 3:22 PM, Klaus Schliep <klaus.schl...@gmail.com>
>> wrote:
>>>
>>> Hi David,
>>>
>>> you can supply Descendents (from phangorn) with a vector instead of
>>> using sapply. I should have mentioned this in the help file.
>>> If your vector (edge_end) is long enough (I don't have a exact number
>>> here) this can be much faster than using sapply as some results are
>>> reused.
>>> Here are some timings:
>>>
>>>> set.seed(123)
>>>> res_tree = rtree(1700)
>>>> edge_end = unique(res_tree$edge[,1])
>>>> system.time(desc1<-sapply(edge_end,function(x) Descendants(res_tree,x)))
>>>
>>>  user  system elapsed
>>>  54.51    0.00   54.87
>>>>
>>>> system.time(desc2<-Descendants(res_tree,edge_end))
>>>
>>>  user  system elapsed
>>>  0.75    0.00    0.75
>>>>
>>>> system.time(desc3<-sapply(edge_end,function(x) node.tips(res_tree,x)))
>>>
>>>  user  system elapsed
>>>  4.07    0.00    4.07
>>>
>>>
>>> Cheers,
>>> Klaus
>>>
>>>
>>>
>>> On 3/6/11, David Bapst <dwba...@uchicago.edu> wrote:
>>>>
>>>> Hello all,
>>>> I'm currently trying to measure a parameter over a large number of
>>>> large trees (>1700 tips), and part of this calculation requires
>>>> knowing the tip taxa descended from each node (I must compare the
>>>> difference in tip values among taxa descended from a node). Because I
>>>> must do this many times,  I decided to compare the efficiency of
>>>> several methods for doing this in various R libraries with
>>>> system.time() (as Liam did recently with some BM simulation functions
>>>> in some recent blog posts). As I feel that others may benefit from
>>>> this comparison of methods, I am posting the results to this list.
>>>>
>>>> Geiger's node.leaves() is much faster than the alternatives, although
>>>> at ~13s a run, it is still not a particularly speedy process. I didn't
>>>> need the actual tip.labels, so I took node.leaves() and made it as
>>>> lean as possible, to produce node.tips(), below. That cut the run time
>>>> down to ~9 sec.
>>>>
>>>> node.tips<-function (phy, node){
>>>>      n <- length(phy$tip.label)
>>>>      if (node <= n){node}else{
>>>>              l<- numeric()
>>>>              d<- phy$edge[which(phy$edge[,1]==node),2]
>>>>              for(j in d){if(j <= n){l <- c(l, j)}else{l<-c(l,
>>>> node.tips(phy,j))}}
>>>>       l}}
>>>>
>>>> If anyone knows of another alternative that might be faster, I would
>>>> greatly appreciate any suggestions.
>>>> -Dave Bapst, UChicago Geosci
>>>>
>>>> Using the modified node.leaves() function from geiger, node.tips()
>>>>
>>>>> system.time(desc<-sapply(edge_end,function(x) node.tips(res_tree,x)))
>>>>
>>>>   user  system elapsed
>>>>   9.39    0.00    9.44
>>>>
>>>> Using node.leaves() in geiger
>>>>
>>>>> system.time(desc<-sapply(edge_end,function(x) node.leaves(res_tree,x)))
>>>>
>>>>   user  system elapsed
>>>>  13.29    0.02   13.34
>>>>
>>>> Using Descendants() in phangorn
>>>>
>>>>> system.time(desc<-sapply(edge_end,function(x) Descendants(res_tree,x)))
>>>>
>>>>   user  system elapsed
>>>>  75.60    0.10   76.83
>>>>
>>>> Using listTips() in adephylo
>>>>
>>>>> system.time(desc_list<-c(as.list(1:Ntip(res_tree)),listTips(res_tree)))
>>>>
>>>>   user  system elapsed
>>>>  75.27    2.93   87.28
>>>>
>>>> Using descendants() in phylobase
>>>>
>>>>> system.time(desc<-sapply(edge_end,function(x)
>>>>> descendants(res_tree4,x,type="tips")))
>>>>
>>>>   user  system elapsed
>>>>  149.56    0.67  155.15
>>>>
>>>> Using nodeDecendants() in maticce  (note that translating the tree
>>>> into ouchtree format was prohibitively very lengthy)
>>>>
>>>>> system.time(res_tree_ou<-ape2ouch(res_tree))
>>>>
>>>>   user  system elapsed
>>>>  84.01    0.13   85.34
>>>>>
>>>>> system.time(desc<-sapply(edge_end,function(x)
>>>>> nodeDescendents(res_tree_ou,x)))
>>>>
>>>>   user  system elapsed
>>>>  65.91    0.23   68.47
>>>>
>>>> --
>>>> David Bapst
>>>> Dept of Geophysical Sciences
>>>> University of Chicago
>>>> 5734 S. Ellis
>>>> Chicago, IL 60637
>>>> http://home.uchicago.edu/~dwbapst/
>>>>
>>>> _______________________________________________
>>>> R-sig-phylo mailing list
>>>> R-sig-phylo@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>>
>>>
>>> --
>>> Klaus Schliep
>>> Université Paris 6 (Pierre et Marie Curie)
>>> 9, Quai Saint-Bernard, 75005 Paris
>>>
>>
>>
>>
>
> --
> Emmanuel Paradis
> IRD, Jakarta, Indonesia
> http://ape.mpl.ird.fr/
>
>
>



-- 
David Bapst
Dept of Geophysical Sciences
University of Chicago
5734 S. Ellis
Chicago, IL 60637
http://home.uchicago.edu/~dwbapst/

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Reply via email to