Re: [R-sig-phylo] R-sig-phylo Digest, Vol 106, Issue 15
I found the Java software package TreeCmp to be very good for this; it has a pruning function implemented:

Bogdanowicz, Damian, Krzysztof Giaro, and Borys Wróbel. “TreeCmp: Comparison of Trees in Polynomial Time.” *Evolutionary Bioinformatics Online* 8 (2012): 475–87.

Chris Buddenhagen
cbuddenha...@gmail.com

On Fri, Nov 25, 2016 at 6:00 AM, <r-sig-phylo-requ...@r-project.org> wrote:

> Today's Topics:
>
>    1. Re: distances between trees (Emmanuel Paradis)
>
> Message: 1
> Date: Thu, 24 Nov 2016 18:36:37 +0100
> From: Emmanuel Paradis <emmanuel.para...@ird.fr>
> To: Karla Shikev <karlashi...@gmail.com>, r-sig-phylo@r-project.org
> Subject: Re: [R-sig-phylo] distances between trees
>
> Hi Karla,
>
> I cannot answer your question on comparing trees with different sets of labels, but surely dist.topo() should return an error in this situation. I modified the code to handle this; it is also much faster (~ 100 times faster when comparing 100 trees with 100 tips).
>
> And yes, you are right: from the definition of the RF distance in ?RF.dist and the one for the PH distance in ?dist.topo, they are the same.
>
> Best,
>
> Emmanuel
>
> On 20/11/2016 at 18:07, Karla Shikev wrote:
> > Here's another question.
> >
> > I'm looking for ways to compute distances between two trees when there are differences in the sets of tip labels.
> >
> > Based on some preliminary tests using simulated trees, phangorn's RF.dist gives me exactly the same results as ape's dist.topo (which makes me wonder if Penny and Hendy (1985)'s topological distance and the Robinson-Foulds distance are equivalent). But, if there is one tip missing from one of the trees, dist.topo gives me a distance anyway, but RF.dist gives me an error message:
> >
> > > tr1<-pbtree(n=30)
> > > tr2<-pbtree(n=29)
> > > dist.topo(tr1, tr2)
> > [1] 51
> > > RF.dist(tr1, tr2)
> > Error in RF.dist(tr1, tr2) : trees have different labels
> >
> > However, this distance does not seem reasonable to me. For instance, if I take the same tree and drop different tips, I get:
> >
> > > tr<-pbtree(n=10)
> > > tr1<-drop.tip(tr, "t1")
> > > tr2<-drop.tip(tr, "t2")
> > > dist.topo(tr1, tr2)
> > [1] 4
> > > RF.dist(tr1, tr2)
> > Error in RF.dist(tr1, tr2) : trees have different labels
> >
> > Sometimes dist.topo gives me a distance of 2, other times I get 4, depending on the simulated tree, yet both trees are completely consistent with one another. Is there a metric that would circumvent this issue?
> >
> > Thanks again,
> >
> > Karla

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
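One way to sidestep the different-label problem described in this thread is to prune both trees to their shared tips before computing the distance. The sketch below assumes only the ape package; the helper name `prune_to_shared` is made up for illustration, and `rtree()` stands in for `pbtree()` so that phytools is not needed. Whether discarding the unshared tips is acceptable depends on the question being asked.

```r
library(ape)

# Hypothetical helper: drop every tip that is not present in both trees.
prune_to_shared <- function(t1, t2) {
  shared <- intersect(t1$tip.label, t2$tip.label)
  list(drop.tip(t1, setdiff(t1$tip.label, shared)),
       drop.tip(t2, setdiff(t2$tip.label, shared)))
}

set.seed(1)
tr  <- rtree(10)            # random tree with tips t1..t10
tr1 <- drop.tip(tr, "t1")   # same tree, different tip removed
tr2 <- drop.tip(tr, "t2")

pruned <- prune_to_shared(tr1, tr2)

# The topological distance is defined on unrooted trees, so unroot explicitly.
d <- dist.topo(unroot(pruned[[1]]), unroot(pruned[[2]]))
as.numeric(d)
```

Because both pruned trees derive from the same parent tree, the distance after pruning should be zero, which matches the intuition that the two trees are fully consistent with one another.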
[R-sig-phylo] Does chronos (ape package) return confidence intervals too?
Just this quick question.

Chris Buddenhagen
cbuddenha...@gmail.com
[R-sig-phylo] Associating R-created graphs or external images with specific nodes on a phylogeny
Is there a way to quickly associate a small graph created in R with each node on a tree and have the graphic appear on the node when the tree is plotted? Or, alternatively, is there a way to extract the coordinates of each node for a given plot?

Sincerely,
Chris Buddenhagen
cbuddenha...@gmail.com
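For the node-coordinate part of the question, a minimal sketch with ape is given below. It relies on the plotting state that `plot.phylo()` leaves behind in `.PlotPhyloEnv`; the random tree and the orange markers are placeholders only.

```r
library(ape)

set.seed(1)
tr <- rtree(6)   # placeholder tree
plot(tr)

# After plot.phylo() runs, ape stores the plotting state in .PlotPhyloEnv.
pp <- get("last_plot.phylo", envir = .PlotPhyloEnv)

# xx/yy list the tips first (1..Ntip), then the internal nodes.
coords   <- data.frame(node = seq_along(pp$xx), x = pp$xx, y = pp$yy)
internal <- coords[coords$node > Ntip(tr), ]

# Mark each internal node; a small graphic could be drawn at the same
# coordinates instead.
points(internal$x, internal$y, pch = 21, bg = "orange", cex = 2)
```

For simple per-node graphics, `nodelabels()` also accepts `pie` and `thermo` arguments, which may be enough without handling coordinates by hand.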
[R-sig-phylo] Mirrored trees with connecting lines between taxa that are not in matching locations
I want to visually compare topologies between chloroplast- and nuclear-DNA-derived trees. I remember a post mentioning R code or a package that draws mirrored trees with lines connecting taxa whose positions do not match between the two trees. Can someone please remind me which package this is?

Hopefully,
Chris Buddenhagen
cbuddenha...@gmail.com
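One candidate is `cophyloplot()` in ape, which draws two trees face to face and links tips listed in an association matrix. The sketch below uses random trees as stand-ins for the chloroplast and nuclear trees, and links each taxon to itself; it is not necessarily the package the original post referred to.

```r
library(ape)

set.seed(2)
cp_tree  <- rtree(8)   # stand-in for the chloroplast tree
nuc_tree <- rtree(8)   # stand-in for the nuclear tree

# Two-column matrix: label in the left tree, label in the right tree.
# Here every taxon is linked to itself.
assoc <- cbind(cp_tree$tip.label, cp_tree$tip.label)

cophyloplot(cp_tree, nuc_tree, assoc = assoc, space = 40, gap = 3)
```

Crossing lines then show where the two topologies place a taxon differently. The phytools function `cophylo()` is another option; it can rotate nodes to reduce crossings before plotting.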
[R-sig-phylo] 3 dimensional ordination of tree distances
Is there a way to plot gene tree similarities in 3D ordination space? I want to identify groups and outliers among hundreds of gene trees. I am pretty happy with TreeCmp (a Java program, with the distances graphed in ggplot), but I'd be keen to see this in a more multi-dimensional format.

Cheers,
Chris Buddenhagen
cbuddenha...@gmail.com

Bogdanowicz, D., K. Giaro, and B. Wróbel. 2012. TreeCmp: Comparison of trees in polynomial time. *Evolutionary Bioinformatics Online* 8: 475–487.
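A possible starting point is classical multidimensional scaling of a tree-to-tree distance matrix. The sketch below assumes ape for the distances and uses base R's `cmdscale()` for the ordination; the random trees are placeholders for real gene trees, and a distance matrix exported from TreeCmp could be substituted for `d`.

```r
library(ape)

set.seed(3)
trees <- rmtree(30, 12, rooted = FALSE)   # stand-in for real gene trees
d <- dist.topo(trees)                     # pairwise topological distances

# Classical multidimensional scaling onto three axes.
ord <- cmdscale(d, k = 3)
colnames(ord) <- paste0("axis", seq_len(ncol(ord)))

# Base R has no 3D scatterplot, so view the three axes pairwise.
pairs(ord)
```

The three columns of `ord` can be passed to any 3D plotting package for an interactive view; outlier gene trees should sit far from the main cloud of points.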
[R-sig-phylo] get mean support and sd for 100s of gene trees - also node match counting gene trees vs reference
Dear all,

My goal is to read in 352 or more trees and get the mean and sd of the support values for each. Ideally I read in all 352 trees (or reference the files in a folder) and the function returns a list of support values with sd for each tree. For now I am having problems getting the support values to be treated as numbers. Can anyone help? I tried this on a single tree.

#read in tree and calculate mean and sd
tr<-read.tree(file.choose())#bipartout.tree
> tr_sd <- sd(as.numeric(tr$node.label))
> tr_mean <- mean(as.numeric(tr$node.label))
> tr_sd
[1] NA
> tr_mean
[1] NA
#check numbers are OK
> are_numbers<-as.numeric(tr$node.label)
> are_numbers
 [1] NA NA 86 97 58 40 88 100 18 7 7 2 1 4 25 38 1 3 62 1 52 82 1 1 7 8
[27] 64 96 48 32 40 61 72 49 61 67 32 52 54 41 36 93 97 92 70 75 12 9 5 83 100 38
[53] 21 83 70 65 43 66 70 73 91 100 82 88 97 83 88 37 23 99 62 26 87 0 0 1 14 15
[79] 21 62 28 55 76 73 94
#so doesn't this look OK
> mean(are_numbers)
[1] NA
#Not sure why it's not calculating. What is going on here?

Another question: is there a way to use a reference total-evidence tree and get an idea of how many gene trees support the same descendants (tip members) for a node? Then maybe do the same while taking into account that not all gene trees have the same tips, perhaps by pruning the total-evidence tree to the taxa shared with each gene tree?

Hopefully,
Chris Buddenhagen
cbuddenha...@gmail.com
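The printed vector starts with two NA values (nodes without a numeric label, such as the root), and `mean()` and `sd()` return NA whenever any element is NA unless `na.rm = TRUE` is set. A minimal sketch of a per-tree summary follows; the two Newick strings and the helper name `support_summary` are made up for illustration, and in practice the list would be built from the files in a folder.

```r
library(ape)

# Toy trees with support values as node labels; the root has no label.
trees <- list(
  gene1 = read.tree(text = "((A,B)86,(C,D)97);"),
  gene2 = read.tree(text = "((A,C)40,(B,D)60);")
)

# Hypothetical helper: mean and sd of the numeric node labels of one tree.
support_summary <- function(tr) {
  s <- suppressWarnings(as.numeric(tr$node.label))   # "" becomes NA
  c(mean = mean(s, na.rm = TRUE),
    sd   = sd(s, na.rm = TRUE),
    n    = sum(!is.na(s)))
}

res <- t(sapply(trees, support_summary))
res
```

Each row of `res` corresponds to one tree, with the number of labelled nodes in column `n` as a check on how many values went into the mean.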
[R-sig-phylo] Pairwise genetic distances for many genes and taxa
Dear r-sig-phylo folks,

I am thinking of looking for outlier genes per the recent paper:

Xu X., Dunn K.A., Field C. 2015. A Robust ANOVA Approach to Estimating a Phylogeny from Multiple Genes. Molecular Biology and Evolution.

The starting point for that analysis is a matrix of genes by taxa pairs (with cells containing genetic distances for each combination). I have 100s of loci, either as separate phylip files or as a single concatenated phylip file with a separate file specifying locus coordinates. Any ideas how I could quickly generate the matrix?

Sincerely,
Chris Buddenhagen
Florida State University
cbuddenha...@gmail.com
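A possible way to build the genes-by-taxon-pairs matrix is to compute `dist.dna()` per locus and flatten each result into one row. The sketch below assumes ape and uses simulated alignments in place of the phylip files; `sim_locus` and `dist_row` are made-up helper names, and the "raw" (proportion of differing sites) model is chosen only to keep the example simple.

```r
library(ape)

set.seed(42)
taxa <- c("t1", "t2", "t3", "t4")

# Simulated stand-in for one alignment; real loci could be read with
# read.dna(file, format = "sequential") or format = "interleaved".
sim_locus <- function(len) {
  m <- matrix(sample(c("a", "c", "g", "t"), length(taxa) * len, replace = TRUE),
              nrow = length(taxa), dimnames = list(taxa, NULL))
  as.DNAbin(m)
}
loci <- list(locus1 = sim_locus(200), locus2 = sim_locus(300))

pairs_idx  <- t(combn(taxa, 2))   # one row per taxon pair
pair_names <- paste(pairs_idx[, 1], pairs_idx[, 2], sep = "-")

# One row of pairwise distances per locus, in the order of pairs_idx.
dist_row <- function(aln) {
  d <- as.matrix(dist.dna(aln, model = "raw", pairwise.deletion = TRUE))
  d[pairs_idx]
}

gene_by_pair <- t(sapply(loci, dist_row))
colnames(gene_by_pair) <- pair_names
gene_by_pair
```

For a concatenated alignment, each locus could be cut out by column range using the coordinates file before being passed to `dist_row()`.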
[R-sig-phylo] Determining the most informative regions for a selected clade
Given a tree, is there a way to determine which DNA regions best explain clade membership within user-selected clades on that tree? The regions would come from 261 loci, represented either by multiple alignments or by a partitioned concatenation.
[R-sig-phylo] Is there an existing method for tallying relationships between 3 clades for 100s of trees?
Dear all,

I have 100s of trees in nexus or phylip format, estimated from different alignments using sequences taken from an identical set of taxa; the tree has about 10 clades and 50 taxa. All trees are stored in a single directory. I am interested in tallying the relationship (which clade is sister to which) for each locus, and the support value, for 3 clades in particular, i.e. A, B and C. I am pretty confident that membership within clades A, B and C will be maintained, but some taxa within each could occasionally have missing data for a locus (5% or so).

I hope to get out a table with the following fields:

Locus # (taken from file name), Relationship Supported, Support Value for the relationship

Any ideas?

Hopefully,
Chris Buddenhagen
FSU
cbuddenha...@gmail.com
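One possible approach with ape is to test, for each rooted gene tree, which pair of the three clades forms a monophyletic group, and then read the support value from the node label at their common ancestor. In the sketch below the clade memberships, the Newick strings and the helper name `sister_pair` are all made up for illustration; it assumes rooted trees with support stored as node labels.

```r
library(ape)

# Hypothetical clade membership; replace with the real tip labels.
clades <- list(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"))

# Toy rooted gene trees with support values as node labels.
trees <- list(
  locus1 = read.tree(text = "(out,((a1,a2)100,((b1,b2)90,(c1,c2)95)80)99);"),
  locus2 = read.tree(text = "(out,(((a1,a2)100,(b1,b2)90)70,(c1,c2)95)99);")
)

# Hypothetical helper: find the sister pair among the clades and its support.
sister_pair <- function(tr, clades) {
  for (p in combn(names(clades), 2, simplify = FALSE)) {
    # Use only the tips present in this tree, to tolerate missing taxa.
    tips <- intersect(unlist(clades[p]), tr$tip.label)
    if (is.monophyletic(tr, tips)) {
      node <- getMRCA(tr, tips)
      return(data.frame(
        relationship = paste(p, collapse = "+"),
        support = as.numeric(tr$node.label[node - Ntip(tr)]),
        stringsAsFactors = FALSE))
    }
  }
  data.frame(relationship = NA_character_, support = NA_real_,
             stringsAsFactors = FALSE)
}

tally <- do.call(rbind, lapply(trees, sister_pair, clades = clades))
tally <- cbind(locus = names(trees), tally)
tally
```

With trees read from a directory, the list names could be taken from the file names so that the `locus` column matches the requested table.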
[R-sig-phylo] pairwise comparisons of 100s of gene tree topologies
I would like to make pairwise comparisons of topological similarity between all possible combinations of 518 gene trees. The expected output would be a matrix of topological distances from each gene tree to every other tree. Any suggestions?

Also, as an aside, is there a way to automate the estimation of 100s of gene trees from alignments, such that the best model of nucleotide substitution is chosen objectively and a tree is then generated using likelihood or Bayesian methods? Ideally I give the program the folder of 100s of alignments, tell it to go, and wait for the gene trees.

Best,
Chris Buddenhagen
Florida State University
cbuddenha...@gmail.com
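For the distance matrix, a minimal sketch with ape is to read every tree file in a folder into a `multiPhylo` object and pass it to `dist.topo()`. To stay self-contained, the example first writes a few random trees to a temporary folder; the folder name, file extension and number of trees are assumptions, and all trees are assumed to share the same tip labels.

```r
library(ape)

set.seed(7)

# Write a few random trees to a temporary folder as stand-ins for gene trees.
tree_dir <- file.path(tempdir(), "gene_trees")
dir.create(tree_dir, showWarnings = FALSE)
for (i in 1:5) {
  write.tree(rtree(8, rooted = FALSE),
             file.path(tree_dir, sprintf("gene%03d.tre", i)))
}

# Read every tree file in the folder into one multiPhylo object.
files <- list.files(tree_dir, pattern = "\\.tre$", full.names = TRUE)
trees <- lapply(files, read.tree)
names(trees) <- sub("\\.tre$", "", basename(files))
class(trees) <- "multiPhylo"

# Square matrix of pairwise topological distances, labelled by file name.
d <- as.matrix(dist.topo(trees))
dimnames(d) <- list(names(trees), names(trees))
d
```

The same pattern should scale to 518 trees; phangorn's `RF.dist()` also accepts a `multiPhylo` object if that package is preferred.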
[R-sig-phylo] Does possession of a character state affect number of regions occupied or rate of diversification?
Hello all,

Just point me in the right direction please (I hope to figure this out in the next 36 hours for an assignment).

1) I have a dated tree from BEAST, and data for a plant habit character with 3 states (epiphyte, geophyte or other terrestrial). These habit character states are, I think, correlated with dispersibility. I would like to test whether being in any of those states changes the rate of diversification on the tree, and whether it affects the number of geographic regions occupied (a continuous variable). Can you point me in the right direction? I do not have an estimate of the total number of species in each clade, but I have 80 taxa on the tree.

2) I also want to compare diversification rates between two separate dated trees. Is there an accepted way to do that?

3) I also have a matrix of taxa vs 9 regions with 0, 1 or ? indicating occupancy in each region; a few taxa occur in all regions, most in two, some in one. Can I use this data and my time-calibrated BEAST tree to reconstruct ancestral areas in R?

Cheers,
Chris Buddenhagen
Florida State University
[R-sig-phylo] Fully annotated phylogeny using R?
Hi all,

I would really like to see example code for a fully annotated phylogeny done in R, if anyone has it, say one combining symbols, words and related empirical data. Or, alternatively, tell me it's too hard and there are better programs out there.

Hopefully,
Chris Buddenhagen
Florida State University Graduate Student
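As a rough illustration of what base ape plotting can combine, the sketch below draws a tree with coloured tip symbols for a categorical trait, text labels on the nodes, a time axis and a legend. The tree, the trait values and the colours are all made up; it is only a small sample of what is possible, not a full answer.

```r
library(ape)

set.seed(4)
tr <- rcoal(8)   # ultrametric toy tree

# Made-up categorical trait for each tip.
habit <- setNames(sample(c("epiphyte", "geophyte", "terrestrial"), 8,
                         replace = TRUE),
                  tr$tip.label)
cols <- c(epiphyte = "forestgreen", geophyte = "tan4",
          terrestrial = "steelblue")

plot(tr, label.offset = 0.05, cex = 0.9)

# Symbols at the tips, coloured by trait state.
tiplabels(pch = 21, bg = cols[habit[tr$tip.label]], cex = 1.5)

# Text at the internal nodes (here just a running node index).
nodelabels(text = seq_len(Nnode(tr)), frame = "none",
           adj = c(1.2, -0.4), cex = 0.7)

axisPhylo()   # time axis along the bottom
legend("topleft", legend = names(cols), pt.bg = cols, pch = 21, bty = "n")
```

Continuous data can be added in a similar way, for example with `nodelabels(pie = ...)` or `tiplabels(thermo = ...)`.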