Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-29 Thread Emmanuel Paradis
- Le 27 Juin 20, à 16:53, Yan Wong y...@pixie.org.uk a écrit :
> This is extremely helpful, thanks Emmanuel. I think it might be useful to note
> this in the documentation for multi2di (or give a pointer to the description),
> as it wasn’t obvious to me how to find out this information from within R. 
> Even
> better would be an option that allowed equiprobable topologies to be generated
> if need be, although that’s obviously more work.

I'll add an option to both rtree() and multi2di().

> In my use case I could easily have polytomies with thousands or tens of
> thousands of edges, so the algorithmic approach is much more suitable than the
> enumerative one, and thanks for the tips on making it work. But on the topic 
> of
> enumeration, an extremely capable student of ours has been working out ways of
> using integer partitions to enumerate topologies, even for very large trees,
> and sampling from a list of tree ranks, then unranking the result, would
> probably be a lot more scalable than actually creating the trees using 
> allTrees

allTrees() does not accept n > 10 tips to prevent you filling your computer's 
memory.

> (our treatment is in Python, however, and we haven’t published it yet: details
> in https://tskit.readthedocs.io/en/latest/combinatorics.html). I’d be
> interested in knowing what prior art there is in this area, and if what our
> student has done is of any use to APE? The concepts scale quite well to
> mid-sized trees, and there are also separate applications to trees with
> millions of tips (which we occasionally have to deal with).

ape has howmanytrees() which computes the number of possible topologies for 
most cases: (un)labelled, (un)rooted, binary or not.

Ranking topologies seems related to previous works on matchings by Diaconis & 
Holmes (see the reference cited in ?as.matching). A related work is the 
geodesic distance implemented in the package distory (which I maintain now). 
The latter is more general since it considers branch lengths.

Maybe there is also a connection between tree ranking and topological distances 
(see ?nni in phangorn).

Best,

Emmanuel

> Cheers
> 
> Yan
> 
>> On 27 Jun 2020, at 10:33, Emmanuel Paradis  wrote:
>> 
>> Hi Yan,
>> 
>> multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
>> algorithm
>> used by this latter function is (described in APER2, p. 313):
>> 
>> 1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
>> 2. If a > 1, apply (recursively) step 1 after substituting n by a.
>> 3. Repeat step 2 with b in place of a.
>> 4. Assign randomly the n tip labels to the tips.
>> 
>> This is indeed results in unequal frequencies of the possible topologies. If
>> step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated
>> with equal probability.
>> 
>> Practically, I see two possibilities for you. If pratical, you generate a 
>> list
>> with all possible topologies, for instance with phangorn's function allTrees:
>> 
>> R> library(phangorn)
>> R> TR <- allTrees(4, rooted = TRUE)
>> R> TR
>> 15 phylogenetic trees
>> 
>> Then you just draw randomly some integers between 1 and 15 and use them as
>> indices:
>> 
>> R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
>> R> TR[s]
>> 10 phylogenetic trees
>> 
>> If you want, you can play with the option 'prob' of sample.int().
>> 
>> The other solution is to make copies of rtree() and multi2di() in your
>> workspace: only rtree() needs to be modified and only where sample.int() is
>> called in the recursive function foo. You also need to copy multi2di() even
>> unchanged because of the package's namespace.
>> 
>> HTH
>> 
> > Best,

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-29 Thread Yan Wong
Thanks very much Martin. I’m glad that I brought this up (I think the 
documentation should probably make this clearer). Your extra functions, 
together with the corrected Dendropy code and Emmanuel’s hints, should all be 
useful for me to check I’m doing it right for larger trees.

Cheers

Yan

> On 29 Jun 2020, at 13:34, Martin R. Smith  wrote:
> 
> Dear Yan,
> 
> I had assumed that `multi2di` (and `rtree`) sampled tree topologies with 
> equal probability – thanks for bringing to my attention that this is not the 
> case!  I should have read the 'ape' documentation more carefully...
> 
> I have added the functions `RandomTree()` and `MakeTreeBinary()` to my 
> 'TreeTools' package, which sample binary trees from the uniform distribution 
> as you desire.
> 
> Until the next CRAN release of the package, you can download the development 
> version with
> 
> ```
> devtools::install_github('ms609/TreeTools')
> ```
> 
> Then you can go:
> 
> ```
> library('TreeTools')
> structure(lapply(1:10, function(x) 
> MakeTreeBinary(StarTree(c('a','b','c','d'))), class = 'multiPhylo')
> ```
> 
> I hope this works for you – do let me know how you get on.
> 
> - Martin
> 
> --
>  
> Dr. Martin R. Smith
> Assistant Professor in Palaeontology
> Durham University
> Department of Earth Sciences
> Mountjoy Site, South Road, Durham, DH1 3LE United Kingdom
>  
> T: +44 (0)191 334 2320
> M: +44 (0)774 353 7510
> E: martin.sm...@durham.ac.uk 
>  
> community.dur.ac.uk/martin.smith 
> twitter.com/PalaeoSmith 
[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-27 Thread Yan Wong


This is extremely helpful, thanks Emmanuel. I think it might be useful to note 
this in the documentation for multi2di (or give a pointer to the description), 
as it wasn’t obvious to me how to find out this information from within R. Even 
better would be an option that allowed equiprobable topologies to be generated 
if need be, although that’s obviously more work.

In my use case I could easily have polytomies with thousands or tens of 
thousands of edges, so the algorithmic approach is much more suitable than the 
enumerative one, and thanks for the tips on making it work. But on the topic of 
enumeration, an extremely capable student of ours has been working out ways of 
using integer partitions to enumerate topologies, even for very large trees, 
and sampling from a list of tree ranks, then unranking the result, would 
probably be a lot more scalable than actually creating the trees using allTrees 
(our treatment is in Python, however, and we haven’t published it yet: details 
in https://tskit.readthedocs.io/en/latest/combinatorics.html). I’d be 
interested in knowing what prior art there is in this area, and if what our 
student has done is of any use to APE? The concepts scale quite well to 
mid-sized trees, and there are also separate applications to trees with 
millions of tips (which we occasionally have to deal with). 

Cheers

Yan

> On 27 Jun 2020, at 10:33, Emmanuel Paradis  wrote:
> 
> Hi Yan,
> 
> multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
> algorithm used by this latter function is (described in APER2, p. 313):
> 
> 1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
> 2. If a > 1, apply (recursively) step 1 after substituting n by a.
> 3. Repeat step 2 with b in place of a.
> 4. Assign randomly the n tip labels to the tips.
> 
> This is indeed results in unequal frequencies of the possible topologies. If 
> step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated 
> with equal probability.
> 
> Practically, I see two possibilities for you. If pratical, you generate a 
> list with all possible topologies, for instance with phangorn's function 
> allTrees:
> 
> R> library(phangorn)
> R> TR <- allTrees(4, rooted = TRUE)
> R> TR
> 15 phylogenetic trees
> 
> Then you just draw randomly some integers between 1 and 15 and use them as 
> indices:
> 
> R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
> R> TR[s]
> 10 phylogenetic trees
> 
> If you want, you can play with the option 'prob' of sample.int().
> 
> The other solution is to make copies of rtree() and multi2di() in your 
> workspace: only rtree() needs to be modified and only where sample.int() is 
> called in the recursive function foo. You also need to copy multi2di() even 
> unchanged because of the package's namespace.
> 
> HTH
> 
> Best,

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-27 Thread Emmanuel Paradis
Hi Yan,

multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
algorithm used by this latter function is (described in APER2, p. 313):

1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
2. If a > 1, apply (recursively) step 1 after substituting n by a.
3. Repeat step 2 with b in place of a.
4. Assign randomly the n tip labels to the tips.

This is indeed results in unequal frequencies of the possible topologies. If 
step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated 
with equal probability.

Practically, I see two possibilities for you. If pratical, you generate a list 
with all possible topologies, for instance with phangorn's function allTrees:

R> library(phangorn)
R> TR <- allTrees(4, rooted = TRUE)
R> TR
15 phylogenetic trees

Then you just draw randomly some integers between 1 and 15 and use them as 
indices:

R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
R> TR[s]
10 phylogenetic trees

If you want, you can play with the option 'prob' of sample.int().

The other solution is to make copies of rtree() and multi2di() in your 
workspace: only rtree() needs to be modified and only where sample.int() is 
called in the recursive function foo. You also need to copy multi2di() even 
unchanged because of the package's namespace.

HTH

Best,

Emmanuel

- Le 27 Juin 20, à 4:46, Yan Wong y...@pixie.org.uk a écrit :

> I’m trying to figure out how to randomly resolve polytomies such that there is
> an equal probability of any topology being generated. I thought that the ape
> function “multi2di” did this, but when I have tried it repeatedly with a
> 4-tomy, multi2di seems to produce the 3 balanced trees [((a,b),(c,d)) ;
> ((a,c),(b,d)); ((a,d),(b,c))] twice as often as the 12 possible unbalanced
> dichotomous 4-tip rooted topologies. The R code I’ve used to produce the 
> sample
> topologies is something like this:
> 
> do.call(c, lapply(1:10, function(x) 
> multi2di(starTree(c('a','b','c','d')
> 
> Firstly, is this expected, or am I doing something wrong (if expected, it 
> would
> be useful to note this in the docs)? Secondly, is there an function somewhere
> that *will* break polytomies to produce equiprobable topologies? If not,
> thirdly, is there an algorithm that will do this? I think the standard
> “repeatedly pick 2 random edges from the polytomy node and pair them off”
> results in the non-equiprobable distribution that I find using multi2di. I
> think I’ve found a similar problem with the Dendropy algorithm, which does
> claim to result in equiprobable topologies, and have posted to their mailing
> list in case I’m misunderstanding something.
> 
> Cheers
> 
> Yan Wong
> Big Data Institute, Oxford University
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-26 Thread Holder, Mark Travis
Hi Yan,
I'm useless in R, so I won't comment on that aspect of your email...


I just posted a fix on a branch of dendropy.  The algorithm that DendroPy uses 
is basically stepwise addition of leaves. The code we had pruned the polytomy 
down to having 2 children, and then adding leaves back randomly in the subtree 
that subtends the node that had been the polytomy.

We (dendropy folks) had just failed to include the "root" of the original 
polytomy as an attachment point for a new node. So, that method was 
conceptually correct, but implemented in a buggy way.

The star-decomposition-like method (pinching off pairs of children) doesn't 
work for equiprobable topologies. I think that (if you added branch lengths 
during simulation) that method will give you an equiprobable distribution over 
labelled histories (where the depth of the nodes you are pinching off matters) 
rather than topologies. That pinching off pairs is essentially what we do when 
simulating coalescent histories.

I hope that helps, but I'm definitely being sloppy in my description. So don't 
hesitate to let me know if this explanation doesn't make sens.

all the best,
Mark

Mark Holder

mthol...@gmail.com
mthol...@ku.edu
http://phylo.bio.ku.edu/mark-holder

==
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone:  785.864.5789
fax (shared): 785.864.5860
==

From: R-sig-phylo  on behalf of Yan Wong 

Sent: Friday, June 26, 2020 4:46 PM
To: r-sig-phylo@r-project.org 
Subject: [R-sig-phylo] Breaking polytomies such that all topologies are 
equiprobable

I�m trying to figure out how to randomly resolve polytomies such that there is 
an equal probability of any topology being generated. I thought that the ape 
function �multi2di� did this, but when I have tried it repeatedly with a 
4-tomy, multi2di seems to produce the 3 balanced trees [((a,b),(c,d)) ; 
((a,c),(b,d)); ((a,d),(b,c))] twice as often as the 12 possible unbalanced 
dichotomous 4-tip rooted topologies. The R code I�ve used to produce the sample 
topologies is something like this:

do.call(c, lapply(1:10, function(x) multi2di(starTree(c('a','b','c','d')

Firstly, is this expected, or am I doing something wrong (if expected, it would 
be useful to note this in the docs)? Secondly, is there an function somewhere 
that *will* break polytomies to produce equiprobable topologies? If not, 
thirdly, is there an algorithm that will do this? I think the standard 
�repeatedly pick 2 random edges from the polytomy node and pair them off� 
results in the non-equiprobable distribution that I find using multi2di. I 
think I�ve found a similar problem with the Dendropy algorithm, which does 
claim to result in equiprobable topologies, and have posted to their mailing 
list in case I�m misunderstanding something.

Cheers

Yan Wong
Big Data Institute, Oxford University
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=02%7C01%7Cmtholder%40ku.edu%7Cc9cac10fb66d4f6a054f08d81a1a7574%7C3c176536afe643f5b96636feabbe3c1a%7C0%7C0%7C637288048254353234&sdata=kXOqMU5IQY4Pa2BVsHZnvm01mpVieHIFFa9yg720bhc%3D&reserved=0
Searchable archive at 
https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=02%7C01%7Cmtholder%40ku.edu%7Cc9cac10fb66d4f6a054f08d81a1a7574%7C3c176536afe643f5b96636feabbe3c1a%7C0%7C0%7C637288048254353234&sdata=ir2rXEIKyLeaPF4MbY36uvqn6Z2o1pT8sApoZQXZnPc%3D&reserved=0

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/