Dear Emmanuel and Liam,

thank you for your answers. I tried ace with the modifications suggested by 
Emmanuel and make.simmap from phytools suggested by Liam, both worked well for 
me. I think it would be worth to add these modifications to the ace function, 
or at least a note in the manual stating that missing info should be coded "NA" 
(which wasn't obvious to me). Surprisingly I got an error message when running 
an ace with "NA" instead of "?" in my character file:

Error in matexpo(Q * EL[i]) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In ace(char3_NA, tree_cramb, model = "ER", type = "discrete") :
model fit suspicious: gradients apparently non-finite

However, I did not pay more attention to it as first two solutions work fine.

Cheers,
Th�o





________________________________
De : Emmanuel Paradis <emmanuel.para...@ird.fr>
Envoy� : vendredi, 22 juin 2018 15:45:05
� : Th�o L�ger; r-sig-phylo@r-project.org
Objet : Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

Two more comments:

1/ The first insertion in the code of ace() below is not needed if the
user replaces "?" (or any coding for unknown/uncertain state) with NA --
which seems reasonable to do.

2/ This approach of taking uncertainty into account in likelihood
ancestral state reconstruction is also used for DNA sequences and
implemented in phangorn where uncertainty is (implicitly) taken from the
IUPAC ambiguity code. Because there are here four states, there is more
than one possibility of ambiguity, but the same logic applies: if the
tip "may be" in state x, then set column x to one in the matrix of
likelihoods:

1 0 0 0 # if the base observed is A
0 0 1 0 # if the base observed is G
1 0 1 0 # if the base observed is R
...
1 1 1 1 # if the base observed is N

This can be applied to any character with more than two states.

Best,

Emmanuel

Le 22/06/2018 � 11:41, Emmanuel Paradis a �crit :
> Hi Th�o,
>
> It is possible to modify the code of ace() to take uncertain character
> states into account. If you edit the code with fix(ace), after these
> four lines:
>
>     if (method != "ML")
>         stop("only ML estimation is possible for discrete characters.")
>     if (any(phy$edge.length <= 0))
>         stop("some branches have length zero or negative")
>
> you insert:
>
>      if (any(x == "?")) x[x == "?"] <- NA
>
> Then around 30 lines below, after this line:
>
>     liks[cbind(TIPS, x)] <- 1
>
> you insert:
>
>     if (anyNA(x)) liks[which(is.na(x)), ] <- 1
>
> Save and close. I tested this with a simple example:
>
> tr <- compute.brlen(stree(8, "b"), 1)
> x <- rep(0:1, each = 4)
> plot(tr)
> tiplabels(x)
>
> So there are two clades in state 0 and state 1, respectively. With no
> uncertainty, the node likelihoods are well unbalanced except at the root:
>
> R> ace(x, tr, "d")$lik.anc
>                 0           1
> [1,] 0.500000000 0.500000000
> [2,] 0.953204905 0.046795095
> [3,] 0.995642789 0.004357211
> [4,] 0.995642789 0.004357211
> [5,] 0.046795095 0.953204905
> [6,] 0.004357211 0.995642789
> [7,] 0.004357211 0.995642789
>
> Let's say the first value is unknown:
>
> x[1] <- "?"
>
> Now the node likelihoods are slightly "in favour" of state 1 because of
> the possibility of this state within the "state 0" clade:
>
> R> ace(x, tr, "d")$lik.anc
>                 0           1
> [1,] 0.479307333 0.520692667
> [2,] 0.904647485 0.095352515
> [3,] 0.943101903 0.056898097
> [4,] 0.990270355 0.009729645
> [5,] 0.053105869 0.946894131
> [6,] 0.005881435 0.994118565
> [7,] 0.005881435 0.994118565
>
> For comparison, here's what is returned by the current version in ape:
>
> R> ace(x, tr, "d")$lik.anc
>                 ?           0          1
> [1,] 0.093607676 0.404434328 0.50195800
> [2,] 0.096246928 0.787370632 0.11638244
> [3,] 0.201389158 0.750401358 0.04820948
> [4,] 0.011648214 0.974955230 0.01339656
> [5,] 0.017176009 0.050038984 0.93278501
> [6,] 0.003422534 0.006275985 0.99030148
> [7,] 0.003422534 0.006275985 0.99030148
>
> You may try this modification with your data and give feed-back: if it
> useful, I'll fix that in ape.
>
> Cheers,
>
> Emmanuel
>
> Le 21/06/2018 � 23:32, Th�o L�ger a �crit :
>> Hello,
>>
>>
>> I am working on the phylogeny of a subfamily of moths and I use ace
>> from the ape-package to reconstruct the ancestral state of a bunch of
>> morphological characters.
>>
>> I encountered a problem with the few unknown states I have on my
>> matrix (coded "?", either because material for examination was missing
>> or the state could not fit in any of the categories): they are treated
>> as other characters. Is there a way to ignore them? Is there a way to
>> estimate the state for the species with missing information? I know it
>> is possible to estimate the state at a tip with missing information in
>> functions like AncTresh (using bayesian reconstruction) from the
>> phytools package, but I am not sure if ML reconstruction can do it.
>>
>> Thanks in advance to those who can enlighten me!
>>
>>
>> Cheers,
>>
>> Th�o L�ger
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>
>
>
>
>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to