Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character

2013-07-30 Thread Tom Wenseleers
Dear Liam,
Many thanks for your message and the clarification - that was indeed not clear 
to me from the ace help page!
Would ancRECON in package corHMM also be OK by any chance for my purposes? I 
see that that one also allows one to specify the prior for the root and I am 
dealing with binary characters with symmetric transition rates...

Cheers & thanks again for the advice!
Tom

-----Original Message-----
From: Liam J. Revell [mailto:liam.rev...@umb.edu] 
Sent: 30 July 2013 05:31
To: Tom Wenseleers
Cc: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary 
character

Hi Tom.

This was the subject of discussion recently on this list. ace does not do 
marginal ancestral state reconstruction (which is probably what you
want) - it computes the conditional scaled likelihoods of the subtree. 
These are the same as the marginal reconstructions only at the root node. If 
your transition matrix is symmetric, then you can get the marginal 
reconstructions by rerooting at all the internal nodes. This is in the phytools 
function rerootingMethod 
(http://www.phytools.org/static.help/rerootingMethod.html). If you want to use 
a more complicated model, you will have to use another package - such as 
diversitree.

An alternative is to use stochastic mapping and then compute the posterior 
frequencies from the sample of stochastic maps. This makes it easy to put an 
explicit prior on the root and to integrate over uncertainty in the transition 
matrix. This is implemented in phytools also 
(http://www.phytools.org/static.help/make.simmap.html).
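
The two phytools routes mentioned above can be sketched roughly as follows. This is only an illustration on simulated stand-in data: the tree, the trait vector `x`, the number of simulations, and the root prior of 0.2/0.8 are all assumptions, not values from the analysis discussed in this thread.

```r
library(ape)
library(phytools)

set.seed(1)
# Simulated stand-ins for the real tree and binary trait (assumptions).
tree <- pbtree(n = 30)
x <- setNames(sample(c("0", "1"), 30, replace = TRUE, prob = c(0.2, 0.8)),
              tree$tip.label)

# Route 1: marginal reconstructions by re-rooting, equal-rates (symmetric) model.
fit <- rerootingMethod(tree, x, model = "ER")
head(fit$marginal.anc)

# Route 2: stochastic mapping with an explicit root prior (pi),
# then posterior frequencies of each state from the sample of maps.
maps <- make.simmap(tree, x, model = "ER", nsim = 100,
                    pi = c("0" = 0.2, "1" = 0.8))
node_pp <- summary(maps)$ace
head(node_pp)
```

With real data, `tree` and `x` would be replaced by the actual phylogeny and a named vector of tip states; the number of stochastic maps is a trade-off between run time and Monte Carlo error.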

All the best, Liam

Liam J. Revell, Assistant Professor of Biology University of Massachusetts 
Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 7/29/2013 5:45 PM, Tom Wenseleers wrote:
 Dear all,

 @Arne: yes, I think it has something to do with the priors for the root.
 I'm not sure what prior ace uses - I think equal, which in my case 
 would not be so appropriate given that nearly all species have the 
 trait. Would anyone know by any chance whether in ape it is possible to 
 have ace use a prior for the root which would reflect the frequency at 
 the tips, and if so, how one could specify this?

 Cheers,

 Tom

 *From:*Arne Mooers [mailto:amoo...@sfu.ca]
 *Sent:* 29 July 2013 20:10
 *To:* Tom Wenseleers
 *Subject:* Question on ace ML reconstruction of discrete binary 
 character

 Hoi Tom,

 What is the default prior on the root in ace? Different approaches use 
 different priors (=observed frequency at tips, equal, equal to tested 
 ratio of q's, etc.). That has had a big effect on reconstructions I 
 have done in the past.

 Cheers,

 Arne Mooers

 Begin forwarded message:

 *From:* Tom Wenseleers tom.wensele...@bio.kuleuven.be
 *Date:* 29 July, 2013 9:00:28 AM PDT
 *To:* r-sig-phylo@r-project.org
 *Subject:* [R-sig-phylo] Question on ace ML reconstruction of discrete 
 binary character

 Dear all,

 I just did some ancestral state reconstructions of binary characters 
 (screenshot attached) using ace (using an equal rate discrete character 
 reconstruction). Everything seems to make sense to me, except the two 
 basal nodes, where I end up with quite low likelihoods for my red 
 character being 1 (cf. the pie charts), even though I get higher 
 likelihoods at practically all of the more shallow nodes in the tree.
 Any ideas why one can get a result like this, and what I could 
 potentially do about it, since it doesn't seem quite right to me?

 Cheers,

 Tom

 Prof. Tom Wenseleers
 Lab. of Socioecology and Social Evolution
 Dept. of Biology
 Zoological Institute
 K.U.Leuven
 Naamsestraat 59, box 2466
 B-3000 Leuven
 Belgium
 +32 (0)16 32 39 64 / +32 (0)472 40 45 96
 tom.wensele...@bio.kuleuven.be
 http://bio.kuleuven.be/ento/wenseleers/twenseleers.htm


 Dr. Arne Mooers
 Biology, Simon Fraser University
 University Drive., Burnaby BC V5A 1S6 Canada
 amoo...@sfu.ca
 +1 778 782 3979
 skype: arnemooers
 www.sfu.ca/~amooers
 www.sfu.ca/fabstar
 www.scientists-4-species.org
 hesp.irmacs.sfu.ca
 7billionandyou.org

Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character

2013-07-30 Thread Tom Wenseleers
Dear all,
Many thanks for all your advice so far. I have now moved to using rayDISC in 
package corHMM to compute marginal maximum likelihood ancestral state 
reconstructions, using the method of Maddison et al. (2007) and FitzJohn et al. 
(2009) to fix the prior probabilities at the root (setting it to the observed 
frequency at the tips doesn't change much).
The code I have is:

library(ape)
library(corHMM)
# Read the tree and the trait table (column 1: species names, column 2: binary trait)
tree=read.tree("http://www.kuleuven.be/bio/ento/temp/tree.tre")
data=read.csv(file="http://www.kuleuven.be/bio/ento/temp/data.csv")
rownames(data)=data[,1]
# Equal-rates model, marginal reconstruction, Maddison/FitzJohn root prior
ASR=rayDISC(tree,data,ntraits=1,charnum=1,model="ER",node.states="marginal",root.p="maddfitz")
# Plot the tree with pie charts at the nodes and observed states at the tips
plot(tree, cex=0.6, show.tip.label=TRUE, ljoin=2,lend=2,label.offset=0.02)
nodelabels(pie=ASR$states,piecol=c("white","red"), cex=0.45)
tiplabels(pch = 22, bg = ifelse(data[tree$tip.label, ][,2],"red","white"), 
col="black",adj = c(0.51, 0.5), cex = 0.6)

I still get unusually low marginal ML values for the trait being 1 at the basal 
nodes though (ca. 0.7, which is very low considering that 89% of my species 
have the trait).
Would anyone be able to offer advice on why one could get the reconstructed 
root ML value to be so much lower than the actual observed frequency of the 
trait at the tips, and what could be a solution to obtaining a more realistic 
ML reconstruction? (I also tried diversitree and phangorn, but they all give 
similar results)
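
How a root prior enters the calculation can be sketched in base R: the posterior at the root is the prior multiplied by the conditional likelihood of each state, rescaled to sum to one. The likelihood values below are made up for illustration and do not come from the analysis above. The third prior weights each state by its own likelihood, which is one reading of the FitzJohn et al. (2009) approach; how rayDISC applies `root.p` internally is not shown here.

```r
# Hypothetical conditional likelihoods at the root for states 0 and 1
# (illustrative numbers only, not taken from the analysis above).
root_lik <- c("0" = 0.03, "1" = 0.07)

# Posterior at the root: prior times likelihood, rescaled to sum to 1.
root_posterior <- function(lik, prior) {
  stopifnot(length(lik) == length(prior), abs(sum(prior) - 1) < 1e-8)
  post <- lik * prior
  post / sum(post)
}

# Three candidate root priors (the 0.11/0.89 split mirrors the 89% tip frequency).
priors <- list(
  equal        = c("0" = 0.5,  "1" = 0.5),
  tip_freq     = c("0" = 0.11, "1" = 0.89),
  lik_weighted = root_lik / sum(root_lik)
)

# One column per prior, one row per state.
posteriors <- sapply(priors, function(p) root_posterior(root_lik, p))
print(round(posteriors, 3))
```

In this toy case the prior shifts the root posterior noticeably because the likelihoods are only weakly informative; with real data the likelihood term can dominate, in which case changing the prior makes little difference.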

Cheers,
Tom



-----Original Message-----
From: Jack Viljoen [mailto:javilj...@gmail.com] 
Sent: 30 July 2013 10:23
To: Tom Wenseleers
Subject: Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary 
character

Hello, Tom.

I was just wondering if the higher uncertainty at the basal nodes isn't 
expected, particularly given the long branches descended from them?

Since this is an ML estimate and not a Bayesian one, surely the concept of 
priors does not apply? My understanding is that ace() actually only estimates 
the root node and that other methods are required to properly estimate the 
states at the other nodes. I'm basing this on these posts from Liam Revell 
earlier this year:
http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html
http://blog.phytools.org/2013/03/a-little-more-on-ancestral-state.html

I hope those links shed some light on the matter, or that someone who knows 
about this stuff has responded to you off-list as well.

Good luck,
Jack



Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character

2013-07-30 Thread Liam J. Revell

Hi Tom.

There is no reason to expect that the marginal ancestral state 
reconstructions at the root node (empirical Bayesian posterior 
probabilities) should match your tip frequencies or prior probabilities. 
Imagine the following scenario: you have one diverse clade comprising 
50% of extant taxa that all diverged recently from a common ancestor 
and share state B; whereas state A is found in all the other tips of the 
tree, some of which are in clades originating near the root. We would 
not expect posterior probabilities at the root node to match the 
empirical frequencies of our state at the tips (50:50). In fact, we 
might expect that our reconstructed state at the root of the tree would 
be strongly A.


In your specific case, state 0 is found in three clades that originate 
nearer to the root, whereas more nested clades are exclusively in state 
1. This is why - in spite of its relative rarity across the tips of 
the tree - there is still some reasonable (PP~0.3) posterior probability 
under the model that the root is in state 0. This is not an error that 
needs to be corrected - it is just what your data, model, and tree tell 
us about the ancestral node of the phylogeny.
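
The hypothetical A/B scenario above can be checked on a toy example in base R. This is a minimal sketch, assuming a four-tip ultrametric tree, a symmetric two-state model, a fixed (not estimated) rate `q`, and a flat root prior; the tree, branch lengths, and rate are invented for illustration.

```r
# Transition probabilities for a symmetric two-state model with rate q
# along a branch of length t (rows: parent state, columns: child state).
p_matrix <- function(q, t) {
  same <- 0.5 + 0.5 * exp(-2 * q * t)
  matrix(c(same, 1 - same, 1 - same, same), nrow = 2,
         dimnames = list(c("A", "B"), c("A", "B")))
}

# Conditional likelihood vector of a tip with an observed state.
tip <- function(state) {
  c(A = as.numeric(state == "A"), B = as.numeric(state == "B"))
}

# One pruning step: combine the likelihoods of two children into their parent.
join <- function(left, t_left, right, t_right, q) {
  drop(p_matrix(q, t_left) %*% left) * drop(p_matrix(q, t_right) %*% right)
}

q <- 0.5  # assumed rate, fixed rather than estimated

# Toy tree: (A1:1.0,(A2:0.5,(B1:0.1,B2:0.1):0.4):0.5);
node_y <- join(tip("B"), 0.1, tip("B"), 0.1, q)  # recent clade in state B
node_x <- join(tip("A"), 0.5, node_y, 0.4, q)
root   <- join(tip("A"), 1.0, node_x, 0.5, q)

# Flat root prior: the posterior is the normalized conditional likelihood.
root_post <- root / sum(root)
print(round(root_post, 3))
```

Half of the tips are in each state, yet the root posterior leans towards A, because the two B tips share a recent ancestor while the A lineages branch off closer to the root.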


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org


Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character

2013-07-30 Thread Marguerite Butler
Oops. Sorry the citation is Schluter, Price, Mooers, Ludwig 1997. Likelihood of 
ancestor states in adaptive radiation. Evolution 51:1699-1711.  This issue has 
been known for a long time. 


On Jul 30, 2013, at 6:51 AM, Marguerite Butler mbutler...@gmail.com wrote:

 Hi Tom,
 
 One thing to keep in mind is the information content of the data relative to 
 what you are trying to infer. Basically, you have data only at the tips, but 
 are trying to infer the state of the root deep in the tree, so 
 there is actually very little information being brought to bear on this 
 problem. In this case, whatever answer you get will very strongly reflect the 
 assumptions of the model that you apply and the structure of the tree. Put 
 another way, if you were to construct error bars around this character state 
 estimate, you would see that they are huge (see Mooers et al. 1997 in 
 Evolution).
 
 It sounds like you are expecting a linear parsimony reconstruction. Why not 
 just use that? Your character does not change very much on the tree. This is 
 basically what your ML answer is telling you anyway: more than 50% chance of 
 red at the base.
 
 Marguerite
 