This is extremely helpful, thanks Emmanuel. I think it might be useful to note 
this in the documentation for multi2di (or give a pointer to the description), 
as it wasn’t obvious to me how to find out this information from within R. Even 
better would be an option that allowed equiprobable topologies to be generated 
if need be, although that’s obviously more work.

In my use case I could easily have polytomies with thousands or tens of 
thousands of edges, so the algorithmic approach is much more suitable than the 
enumerative one, and thanks for the tips on making it work. But on the topic of 
enumeration, an extremely capable student of ours has been working out ways of 
using integer partitions to enumerate topologies, even for very large trees, 
and sampling from a list of tree ranks, then unranking the result, would 
probably be a lot more scalable than actually creating the trees using allTrees 
(our treatment is in Python, however, and we haven’t published it yet: details 
in https://tskit.readthedocs.io/en/latest/combinatorics.html). I’d be 
interested in knowing what prior art there is in this area, and if what our 
student has done is of any use to APE? The concepts scale quite well to 
mid-sized trees, and there are also separate applications to trees with 
millions of tips (which we occasionally have to deal with). 

Cheers

Yan

> On 27 Jun 2020, at 10:33, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote:
> 
> Hi Yan,
> 
> multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
> algorithm used by this latter function is (described in APER2, p. 313):
> 
> 1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
> 2. If a > 1, apply (recursively) step 1 after substituting n by a.
> 3. Repeat step 2 with b in place of a.
> 4. Assign randomly the n tip labels to the tips.
> 
> This is indeed results in unequal frequencies of the possible topologies. If 
> step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated 
> with equal probability.
> 
> Practically, I see two possibilities for you. If pratical, you generate a 
> list with all possible topologies, for instance with phangorn's function 
> allTrees:
> 
> R> library(phangorn)
> R> TR <- allTrees(4, rooted = TRUE)
> R> TR
> 15 phylogenetic trees
> 
> Then you just draw randomly some integers between 1 and 15 and use them as 
> indices:
> 
> R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
> R> TR[s]
> 100000 phylogenetic trees
> 
> If you want, you can play with the option 'prob' of sample.int().
> 
> The other solution is to make copies of rtree() and multi2di() in your 
> workspace: only rtree() needs to be modified and only where sample.int() is 
> called in the recursive function foo. You also need to copy multi2di() even 
> unchanged because of the package's namespace.
> 
> HTH
> 
> Best,

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to