Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

2018-06-25 Thread Théo Léger
Dear Emmanuel and Liam,


thank you for your answers. I tried ace with the modifications suggested by 
Emmanuel and make.simmap from phytools suggested by Liam, both worked well for 
me. I think it would be worth to add these modifications to the ace function, 
or at least a note in the manual stating that missing info should be coded "NA" 
(which wasn't obvious to me). Surprisingly I got an error message when running 
an ace with "NA" instead of "?" in my character file:

Error in matexpo(Q * EL[i]) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In ace(char3_NA, tree_cramb, model = "ER", type = "discrete") :
model fit suspicious: gradients apparently non-finite

However, I did not pay more attention to it as first two solutions work fine.

Cheers,
Th�o






De : Emmanuel Paradis 
Envoy� : vendredi, 22 juin 2018 15:45:05
� : Th�o L�ger; r-sig-phylo@r-project.org
Objet : Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

Two more comments:

1/ The first insertion in the code of ace() below is not needed if the
user replaces "?" (or any coding for unknown/uncertain state) with NA --
which seems reasonable to do.

2/ This approach of taking uncertainty into account in likelihood
ancestral state reconstruction is also used for DNA sequences and
implemented in phangorn where uncertainty is (implicitly) taken from the
IUPAC ambiguity code. Because there are here four states, there is more
than one possibility of ambiguity, but the same logic applies: if the
tip "may be" in state x, then set column x to one in the matrix of
likelihoods:

1 0 0 0 # if the base observed is A
0 0 1 0 # if the base observed is G
1 0 1 0 # if the base observed is R
...
1 1 1 1 # if the base observed is N

This can be applied to any character with more than two states.

Best,

Emmanuel

Le 22/06/2018 � 11:41, Emmanuel Paradis a �crit :
> Hi Th�o,
>
> It is possible to modify the code of ace() to take uncertain character
> states into account. If you edit the code with fix(ace), after these
> four lines:
>
> if (method != "ML")
> stop("only ML estimation is possible for discrete characters.")
> if (any(phy$edge.length <= 0))
> stop("some branches have length zero or negative")
>
> you insert:
>
>  if (any(x == "?")) x[x == "?"] <- NA
>
> Then around 30 lines below, after this line:
>
> liks[cbind(TIPS, x)] <- 1
>
> you insert:
>
> if (anyNA(x)) liks[which(is.na(x)), ] <- 1
>
> Save and close. I tested this with a simple example:
>
> tr <- compute.brlen(stree(8, "b"), 1)
> x <- rep(0:1, each = 4)
> plot(tr)
> tiplabels(x)
>
> So there are two clades in state 0 and state 1, respectively. With no
> uncertainty, the node likelihoods are well unbalanced except at the root:
>
> R> ace(x, tr, "d")$lik.anc
> 0   1
> [1,] 0.5 0.5
> [2,] 0.953204905 0.046795095
> [3,] 0.995642789 0.004357211
> [4,] 0.995642789 0.004357211
> [5,] 0.046795095 0.953204905
> [6,] 0.004357211 0.995642789
> [7,] 0.004357211 0.995642789
>
> Let's say the first value is unknown:
>
> x[1] <- "?"
>
> Now the node likelihoods are slightly "in favour" of state 1 because of
> the possibility of this state within the "state 0" clade:
>
> R> ace(x, tr, "d")$lik.anc
> 0   1
> [1,] 0.479307333 0.520692667
> [2,] 0.904647485 0.095352515
> [3,] 0.943101903 0.056898097
> [4,] 0.990270355 0.009729645
> [5,] 0.053105869 0.946894131
> [6,] 0.005881435 0.994118565
> [7,] 0.005881435 0.994118565
>
> For comparison, here's what is returned by the current version in ape:
>
> R> ace(x, tr, "d")$lik.anc
> ?   0  1
> [1,] 0.093607676 0.404434328 0.50195800
> [2,] 0.096246928 0.787370632 0.11638244
> [3,] 0.201389158 0.750401358 0.04820948
> [4,] 0.011648214 0.974955230 0.01339656
> [5,] 0.017176009 0.050038984 0.93278501
> [6,] 0.003422534 0.006275985 0.99030148
> [7,] 0.003422534 0.006275985 0.99030148
>
> You may try this modification with your data and give feed-back: if it
> useful, I'll fix that in ape.
>
> Cheers,
>
> Emmanuel
>
> Le 21/06/2018 � 23:32, Th�o L�ger a �crit :
>> Hello,
>>
>>
>> I am working on the phylogeny of a subfamily of moths and I use ace
>> from the ape-package to reconstruct the ancestral state of a bunch of
>> morphological characters.
>>
>> I encountered a problem with the few unknown states I have on my
>> matrix (coded "?", either because 

Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

2018-06-22 Thread Jacob Berv
This is so cool. Thanks Liam!
J

> On Jun 21, 2018, at 8:22 PM, Liam J. Revell  wrote:
> 
> Dear Theo.
> 
> This can be done with the function rerootingMethod as described here 
> http://blog.phytools.org/2013/04/estimating-ancestral-states-when.html, 
> although it is *very* important to note that rerootingMethod is only valid if 
> the fitted transition model is symmetric (e.g., "ER", "SYM", or any custom 
> model in which i->j == j->i for all i & j). It can also be done with 
> make.simmap for both symmetric & non-symmetric transition models. If you 
> sample enough stochastic maps & then run summary on the set of maps the 
> posterior probabilities will converge on the marginal ancestral states under 
> the defaults for make.simmap, so long as a flat prior is used on internal 
> nodes. This is also on my blog here: 
> http://blog.phytools.org/2013/03/estimating-ancestral-states-when-tips.html.
> 
> All the best, Liam
> 
> Liam J. Revell, Associate Professor of Biology
> University of Massachusetts Boston
> & Profesor Asociado, Programa de Biología
> Universidad del Rosario
> web: http://faculty.umb.edu/liam.revell/
> 
> On 6/21/2018 5:32 PM, Théo Léger wrote:
>> Hello,
>> I am working on the phylogeny of a subfamily of moths and I use ace from the 
>> ape-package to reconstruct the ancestral state of a bunch of morphological 
>> characters.
>> I encountered a problem with the few unknown states I have on my matrix 
>> (coded "?", either because material for examination was missing or the state 
>> could not fit in any of the categories): they are treated as other 
>> characters. Is there a way to ignore them? Is there a way to estimate the 
>> state for the species with missing information? I know it is possible to 
>> estimate the state at a tip with missing information in functions like 
>> AncTresh (using bayesian reconstruction) from the phytools package, but I am 
>> not sure if ML reconstruction can do it.
>> Thanks in advance to those who can enlighten me!
>> Cheers,
>> Th�o L�ger
>>  [[alternative HTML version deleted]]
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
>> 
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

2018-06-22 Thread Emmanuel Paradis

Two more comments:

1/ The first insertion in the code of ace() below is not needed if the 
user replaces "?" (or any coding for unknown/uncertain state) with NA -- 
which seems reasonable to do.


2/ This approach of taking uncertainty into account in likelihood 
ancestral state reconstruction is also used for DNA sequences and 
implemented in phangorn where uncertainty is (implicitly) taken from the 
IUPAC ambiguity code. Because there are here four states, there is more 
than one possibility of ambiguity, but the same logic applies: if the 
tip "may be" in state x, then set column x to one in the matrix of 
likelihoods:


1 0 0 0 # if the base observed is A
0 0 1 0 # if the base observed is G
1 0 1 0 # if the base observed is R
...
1 1 1 1 # if the base observed is N

This can be applied to any character with more than two states.

Best,

Emmanuel

Le 22/06/2018 à 11:41, Emmanuel Paradis a écrit :

Hi Théo,

It is possible to modify the code of ace() to take uncertain character 
states into account. If you edit the code with fix(ace), after these 
four lines:


    if (method != "ML")
    stop("only ML estimation is possible for discrete characters.")
    if (any(phy$edge.length <= 0))
    stop("some branches have length zero or negative")

you insert:

     if (any(x == "?")) x[x == "?"] <- NA

Then around 30 lines below, after this line:

    liks[cbind(TIPS, x)] <- 1

you insert:

    if (anyNA(x)) liks[which(is.na(x)), ] <- 1

Save and close. I tested this with a simple example:

tr <- compute.brlen(stree(8, "b"), 1)
x <- rep(0:1, each = 4)
plot(tr)
tiplabels(x)

So there are two clades in state 0 and state 1, respectively. With no 
uncertainty, the node likelihoods are well unbalanced except at the root:


R> ace(x, tr, "d")$lik.anc
    0   1
[1,] 0.5 0.5
[2,] 0.953204905 0.046795095
[3,] 0.995642789 0.004357211
[4,] 0.995642789 0.004357211
[5,] 0.046795095 0.953204905
[6,] 0.004357211 0.995642789
[7,] 0.004357211 0.995642789

Let's say the first value is unknown:

x[1] <- "?"

Now the node likelihoods are slightly "in favour" of state 1 because of 
the possibility of this state within the "state 0" clade:


R> ace(x, tr, "d")$lik.anc
    0   1
[1,] 0.479307333 0.520692667
[2,] 0.904647485 0.095352515
[3,] 0.943101903 0.056898097
[4,] 0.990270355 0.009729645
[5,] 0.053105869 0.946894131
[6,] 0.005881435 0.994118565
[7,] 0.005881435 0.994118565

For comparison, here's what is returned by the current version in ape:

R> ace(x, tr, "d")$lik.anc
    ?   0  1
[1,] 0.093607676 0.404434328 0.50195800
[2,] 0.096246928 0.787370632 0.11638244
[3,] 0.201389158 0.750401358 0.04820948
[4,] 0.011648214 0.974955230 0.01339656
[5,] 0.017176009 0.050038984 0.93278501
[6,] 0.003422534 0.006275985 0.99030148
[7,] 0.003422534 0.006275985 0.99030148

You may try this modification with your data and give feed-back: if it 
useful, I'll fix that in ape.


Cheers,

Emmanuel

Le 21/06/2018 à 23:32, Théo Léger a écrit :

Hello,


I am working on the phylogeny of a subfamily of moths and I use ace 
from the ape-package to reconstruct the ancestral state of a bunch of 
morphological characters.


I encountered a problem with the few unknown states I have on my 
matrix (coded "?", either because material for examination was missing 
or the state could not fit in any of the categories): they are treated 
as other characters. Is there a way to ignore them? Is there a way to 
estimate the state for the species with missing information? I know it 
is possible to estimate the state at a tip with missing information in 
functions like AncTresh (using bayesian reconstruction) from the 
phytools package, but I am not sure if ML reconstruction can do it.


Thanks in advance to those who can enlighten me!


Cheers,

Théo Léger


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/








___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

2018-06-22 Thread Emmanuel Paradis

Hi Théo,

It is possible to modify the code of ace() to take uncertain character 
states into account. If you edit the code with fix(ace), after these 
four lines:


   if (method != "ML")
   stop("only ML estimation is possible for discrete characters.")
   if (any(phy$edge.length <= 0))
   stop("some branches have length zero or negative")

you insert:

if (any(x == "?")) x[x == "?"] <- NA

Then around 30 lines below, after this line:

   liks[cbind(TIPS, x)] <- 1

you insert:

   if (anyNA(x)) liks[which(is.na(x)), ] <- 1

Save and close. I tested this with a simple example:

tr <- compute.brlen(stree(8, "b"), 1)
x <- rep(0:1, each = 4)
plot(tr)
tiplabels(x)

So there are two clades in state 0 and state 1, respectively. With no 
uncertainty, the node likelihoods are well unbalanced except at the root:


R> ace(x, tr, "d")$lik.anc
   0   1
[1,] 0.5 0.5
[2,] 0.953204905 0.046795095
[3,] 0.995642789 0.004357211
[4,] 0.995642789 0.004357211
[5,] 0.046795095 0.953204905
[6,] 0.004357211 0.995642789
[7,] 0.004357211 0.995642789

Let's say the first value is unknown:

x[1] <- "?"

Now the node likelihoods are slightly "in favour" of state 1 because of 
the possibility of this state within the "state 0" clade:


R> ace(x, tr, "d")$lik.anc
   0   1
[1,] 0.479307333 0.520692667
[2,] 0.904647485 0.095352515
[3,] 0.943101903 0.056898097
[4,] 0.990270355 0.009729645
[5,] 0.053105869 0.946894131
[6,] 0.005881435 0.994118565
[7,] 0.005881435 0.994118565

For comparison, here's what is returned by the current version in ape:

R> ace(x, tr, "d")$lik.anc
   ?   0  1
[1,] 0.093607676 0.404434328 0.50195800
[2,] 0.096246928 0.787370632 0.11638244
[3,] 0.201389158 0.750401358 0.04820948
[4,] 0.011648214 0.974955230 0.01339656
[5,] 0.017176009 0.050038984 0.93278501
[6,] 0.003422534 0.006275985 0.99030148
[7,] 0.003422534 0.006275985 0.99030148

You may try this modification with your data and give feed-back: if it 
useful, I'll fix that in ape.


Cheers,

Emmanuel

Le 21/06/2018 à 23:32, Théo Léger a écrit :

Hello,


I am working on the phylogeny of a subfamily of moths and I use ace from the 
ape-package to reconstruct the ancestral state of a bunch of morphological 
characters.

I encountered a problem with the few unknown states I have on my matrix (coded 
"?", either because material for examination was missing or the state could not 
fit in any of the categories): they are treated as other characters. Is there a way to 
ignore them? Is there a way to estimate the state for the species with missing 
information? I know it is possible to estimate the state at a tip with missing 
information in functions like AncTresh (using bayesian reconstruction) from the phytools 
package, but I am not sure if ML reconstruction can do it.

Thanks in advance to those who can enlighten me!


Cheers,

Théo Léger


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/