Re: [R-sig-phylo] ape-package: unknown states in ace analyses?
Dear Emmanuel and Liam, thank you for your answers. I tried ace with the modifications suggested by Emmanuel and make.simmap from phytools suggested by Liam, both worked well for me. I think it would be worth to add these modifications to the ace function, or at least a note in the manual stating that missing info should be coded "NA" (which wasn't obvious to me). Surprisingly I got an error message when running an ace with "NA" instead of "?" in my character file: Error in matexpo(Q * EL[i]) : NA/NaN/Inf in foreign function call (arg 1) In addition: Warning message: In ace(char3_NA, tree_cramb, model = "ER", type = "discrete") : model fit suspicious: gradients apparently non-finite However, I did not pay more attention to it as first two solutions work fine. Cheers, Th�o De : Emmanuel Paradis Envoy� : vendredi, 22 juin 2018 15:45:05 � : Th�o L�ger; r-sig-phylo@r-project.org Objet : Re: [R-sig-phylo] ape-package: unknown states in ace analyses? Two more comments: 1/ The first insertion in the code of ace() below is not needed if the user replaces "?" (or any coding for unknown/uncertain state) with NA -- which seems reasonable to do. 2/ This approach of taking uncertainty into account in likelihood ancestral state reconstruction is also used for DNA sequences and implemented in phangorn where uncertainty is (implicitly) taken from the IUPAC ambiguity code. Because there are here four states, there is more than one possibility of ambiguity, but the same logic applies: if the tip "may be" in state x, then set column x to one in the matrix of likelihoods: 1 0 0 0 # if the base observed is A 0 0 1 0 # if the base observed is G 1 0 1 0 # if the base observed is R ... 1 1 1 1 # if the base observed is N This can be applied to any character with more than two states. Best, Emmanuel Le 22/06/2018 � 11:41, Emmanuel Paradis a �crit : > Hi Th�o, > > It is possible to modify the code of ace() to take uncertain character > states into account. If you edit the code with fix(ace), after these > four lines: > > if (method != "ML") > stop("only ML estimation is possible for discrete characters.") > if (any(phy$edge.length <= 0)) > stop("some branches have length zero or negative") > > you insert: > > if (any(x == "?")) x[x == "?"] <- NA > > Then around 30 lines below, after this line: > > liks[cbind(TIPS, x)] <- 1 > > you insert: > > if (anyNA(x)) liks[which(is.na(x)), ] <- 1 > > Save and close. I tested this with a simple example: > > tr <- compute.brlen(stree(8, "b"), 1) > x <- rep(0:1, each = 4) > plot(tr) > tiplabels(x) > > So there are two clades in state 0 and state 1, respectively. With no > uncertainty, the node likelihoods are well unbalanced except at the root: > > R> ace(x, tr, "d")$lik.anc > 0 1 > [1,] 0.5 0.5 > [2,] 0.953204905 0.046795095 > [3,] 0.995642789 0.004357211 > [4,] 0.995642789 0.004357211 > [5,] 0.046795095 0.953204905 > [6,] 0.004357211 0.995642789 > [7,] 0.004357211 0.995642789 > > Let's say the first value is unknown: > > x[1] <- "?" > > Now the node likelihoods are slightly "in favour" of state 1 because of > the possibility of this state within the "state 0" clade: > > R> ace(x, tr, "d")$lik.anc > 0 1 > [1,] 0.479307333 0.520692667 > [2,] 0.904647485 0.095352515 > [3,] 0.943101903 0.056898097 > [4,] 0.990270355 0.009729645 > [5,] 0.053105869 0.946894131 > [6,] 0.005881435 0.994118565 > [7,] 0.005881435 0.994118565 > > For comparison, here's what is returned by the current version in ape: > > R> ace(x, tr, "d")$lik.anc > ? 0 1 > [1,] 0.093607676 0.404434328 0.50195800 > [2,] 0.096246928 0.787370632 0.11638244 > [3,] 0.201389158 0.750401358 0.04820948 > [4,] 0.011648214 0.974955230 0.01339656 > [5,] 0.017176009 0.050038984 0.93278501 > [6,] 0.003422534 0.006275985 0.99030148 > [7,] 0.003422534 0.006275985 0.99030148 > > You may try this modification with your data and give feed-back: if it > useful, I'll fix that in ape. > > Cheers, > > Emmanuel > > Le 21/06/2018 � 23:32, Th�o L�ger a �crit : >> Hello, >> >> >> I am working on the phylogeny of a subfamily of moths and I use ace >> from the ape-package to reconstruct the ancestral state of a bunch of >> morphological characters. >> >> I encountered a problem with the few unknown states I have on my >> matrix (coded "?&qu
Re: [R-sig-phylo] ape-package: unknown states in ace analyses?
This is so cool. Thanks Liam! J > On Jun 21, 2018, at 8:22 PM, Liam J. Revell wrote: > > Dear Theo. > > This can be done with the function rerootingMethod as described here > http://blog.phytools.org/2013/04/estimating-ancestral-states-when.html, > although it is *very* important to note that rerootingMethod is only valid if > the fitted transition model is symmetric (e.g., "ER", "SYM", or any custom > model in which i->j == j->i for all i & j). It can also be done with > make.simmap for both symmetric & non-symmetric transition models. If you > sample enough stochastic maps & then run summary on the set of maps the > posterior probabilities will converge on the marginal ancestral states under > the defaults for make.simmap, so long as a flat prior is used on internal > nodes. This is also on my blog here: > http://blog.phytools.org/2013/03/estimating-ancestral-states-when-tips.html. > > All the best, Liam > > Liam J. Revell, Associate Professor of Biology > University of Massachusetts Boston > & Profesor Asociado, Programa de Biología > Universidad del Rosario > web: http://faculty.umb.edu/liam.revell/ > > On 6/21/2018 5:32 PM, Théo Léger wrote: >> Hello, >> I am working on the phylogeny of a subfamily of moths and I use ace from the >> ape-package to reconstruct the ancestral state of a bunch of morphological >> characters. >> I encountered a problem with the few unknown states I have on my matrix >> (coded "?", either because material for examination was missing or the state >> could not fit in any of the categories): they are treated as other >> characters. Is there a way to ignore them? Is there a way to estimate the >> state for the species with missing information? I know it is possible to >> estimate the state at a tip with missing information in functions like >> AncTresh (using bayesian reconstruction) from the phytools package, but I am >> not sure if ML reconstruction can do it. >> Thanks in advance to those who can enlighten me! >> Cheers, >> Th�o L�ger >> [[alternative HTML version deleted]] >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ >> > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ape-package: unknown states in ace analyses?
Two more comments: 1/ The first insertion in the code of ace() below is not needed if the user replaces "?" (or any coding for unknown/uncertain state) with NA -- which seems reasonable to do. 2/ This approach of taking uncertainty into account in likelihood ancestral state reconstruction is also used for DNA sequences and implemented in phangorn where uncertainty is (implicitly) taken from the IUPAC ambiguity code. Because there are here four states, there is more than one possibility of ambiguity, but the same logic applies: if the tip "may be" in state x, then set column x to one in the matrix of likelihoods: 1 0 0 0 # if the base observed is A 0 0 1 0 # if the base observed is G 1 0 1 0 # if the base observed is R ... 1 1 1 1 # if the base observed is N This can be applied to any character with more than two states. Best, Emmanuel Le 22/06/2018 à 11:41, Emmanuel Paradis a écrit : Hi Théo, It is possible to modify the code of ace() to take uncertain character states into account. If you edit the code with fix(ace), after these four lines: if (method != "ML") stop("only ML estimation is possible for discrete characters.") if (any(phy$edge.length <= 0)) stop("some branches have length zero or negative") you insert: if (any(x == "?")) x[x == "?"] <- NA Then around 30 lines below, after this line: liks[cbind(TIPS, x)] <- 1 you insert: if (anyNA(x)) liks[which(is.na(x)), ] <- 1 Save and close. I tested this with a simple example: tr <- compute.brlen(stree(8, "b"), 1) x <- rep(0:1, each = 4) plot(tr) tiplabels(x) So there are two clades in state 0 and state 1, respectively. With no uncertainty, the node likelihoods are well unbalanced except at the root: R> ace(x, tr, "d")$lik.anc 0 1 [1,] 0.5 0.5 [2,] 0.953204905 0.046795095 [3,] 0.995642789 0.004357211 [4,] 0.995642789 0.004357211 [5,] 0.046795095 0.953204905 [6,] 0.004357211 0.995642789 [7,] 0.004357211 0.995642789 Let's say the first value is unknown: x[1] <- "?" Now the node likelihoods are slightly "in favour" of state 1 because of the possibility of this state within the "state 0" clade: R> ace(x, tr, "d")$lik.anc 0 1 [1,] 0.479307333 0.520692667 [2,] 0.904647485 0.095352515 [3,] 0.943101903 0.056898097 [4,] 0.990270355 0.009729645 [5,] 0.053105869 0.946894131 [6,] 0.005881435 0.994118565 [7,] 0.005881435 0.994118565 For comparison, here's what is returned by the current version in ape: R> ace(x, tr, "d")$lik.anc ? 0 1 [1,] 0.093607676 0.404434328 0.50195800 [2,] 0.096246928 0.787370632 0.11638244 [3,] 0.201389158 0.750401358 0.04820948 [4,] 0.011648214 0.974955230 0.01339656 [5,] 0.017176009 0.050038984 0.93278501 [6,] 0.003422534 0.006275985 0.99030148 [7,] 0.003422534 0.006275985 0.99030148 You may try this modification with your data and give feed-back: if it useful, I'll fix that in ape. Cheers, Emmanuel Le 21/06/2018 à 23:32, Théo Léger a écrit : Hello, I am working on the phylogeny of a subfamily of moths and I use ace from the ape-package to reconstruct the ancestral state of a bunch of morphological characters. I encountered a problem with the few unknown states I have on my matrix (coded "?", either because material for examination was missing or the state could not fit in any of the categories): they are treated as other characters. Is there a way to ignore them? Is there a way to estimate the state for the species with missing information? I know it is possible to estimate the state at a tip with missing information in functions like AncTresh (using bayesian reconstruction) from the phytools package, but I am not sure if ML reconstruction can do it. Thanks in advance to those who can enlighten me! Cheers, Théo Léger ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ape-package: unknown states in ace analyses?
Hi Théo, It is possible to modify the code of ace() to take uncertain character states into account. If you edit the code with fix(ace), after these four lines: if (method != "ML") stop("only ML estimation is possible for discrete characters.") if (any(phy$edge.length <= 0)) stop("some branches have length zero or negative") you insert: if (any(x == "?")) x[x == "?"] <- NA Then around 30 lines below, after this line: liks[cbind(TIPS, x)] <- 1 you insert: if (anyNA(x)) liks[which(is.na(x)), ] <- 1 Save and close. I tested this with a simple example: tr <- compute.brlen(stree(8, "b"), 1) x <- rep(0:1, each = 4) plot(tr) tiplabels(x) So there are two clades in state 0 and state 1, respectively. With no uncertainty, the node likelihoods are well unbalanced except at the root: R> ace(x, tr, "d")$lik.anc 0 1 [1,] 0.5 0.5 [2,] 0.953204905 0.046795095 [3,] 0.995642789 0.004357211 [4,] 0.995642789 0.004357211 [5,] 0.046795095 0.953204905 [6,] 0.004357211 0.995642789 [7,] 0.004357211 0.995642789 Let's say the first value is unknown: x[1] <- "?" Now the node likelihoods are slightly "in favour" of state 1 because of the possibility of this state within the "state 0" clade: R> ace(x, tr, "d")$lik.anc 0 1 [1,] 0.479307333 0.520692667 [2,] 0.904647485 0.095352515 [3,] 0.943101903 0.056898097 [4,] 0.990270355 0.009729645 [5,] 0.053105869 0.946894131 [6,] 0.005881435 0.994118565 [7,] 0.005881435 0.994118565 For comparison, here's what is returned by the current version in ape: R> ace(x, tr, "d")$lik.anc ? 0 1 [1,] 0.093607676 0.404434328 0.50195800 [2,] 0.096246928 0.787370632 0.11638244 [3,] 0.201389158 0.750401358 0.04820948 [4,] 0.011648214 0.974955230 0.01339656 [5,] 0.017176009 0.050038984 0.93278501 [6,] 0.003422534 0.006275985 0.99030148 [7,] 0.003422534 0.006275985 0.99030148 You may try this modification with your data and give feed-back: if it useful, I'll fix that in ape. Cheers, Emmanuel Le 21/06/2018 à 23:32, Théo Léger a écrit : Hello, I am working on the phylogeny of a subfamily of moths and I use ace from the ape-package to reconstruct the ancestral state of a bunch of morphological characters. I encountered a problem with the few unknown states I have on my matrix (coded "?", either because material for examination was missing or the state could not fit in any of the categories): they are treated as other characters. Is there a way to ignore them? Is there a way to estimate the state for the species with missing information? I know it is possible to estimate the state at a tip with missing information in functions like AncTresh (using bayesian reconstruction) from the phytools package, but I am not sure if ML reconstruction can do it. Thanks in advance to those who can enlighten me! Cheers, Théo Léger ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] ape-package: unknown states in ace analyses?
Dear Theo. This can be done with the function rerootingMethod as described here http://blog.phytools.org/2013/04/estimating-ancestral-states-when.html, although it is *very* important to note that rerootingMethod is only valid if the fitted transition model is symmetric (e.g., "ER", "SYM", or any custom model in which i->j == j->i for all i & j). It can also be done with make.simmap for both symmetric & non-symmetric transition models. If you sample enough stochastic maps & then run summary on the set of maps the posterior probabilities will converge on the marginal ancestral states under the defaults for make.simmap, so long as a flat prior is used on internal nodes. This is also on my blog here: http://blog.phytools.org/2013/03/estimating-ancestral-states-when-tips.html. All the best, Liam Liam J. Revell, Associate Professor of Biology University of Massachusetts Boston & Profesor Asociado, Programa de Biología Universidad del Rosario web: http://faculty.umb.edu/liam.revell/ On 6/21/2018 5:32 PM, Théo Léger wrote: Hello, I am working on the phylogeny of a subfamily of moths and I use ace from the ape-package to reconstruct the ancestral state of a bunch of morphological characters. I encountered a problem with the few unknown states I have on my matrix (coded "?", either because material for examination was missing or the state could not fit in any of the categories): they are treated as other characters. Is there a way to ignore them? Is there a way to estimate the state for the species with missing information? I know it is possible to estimate the state at a tip with missing information in functions like AncTresh (using bayesian reconstruction) from the phytools package, but I am not sure if ML reconstruction can do it. Thanks in advance to those who can enlighten me! Cheers, Th�o L�ger [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/