Joe,

I agree with what you wrote. To me, this strengthens the case for looking at the distribution of pairwise distances before estimating the tree. I'll modify boot.phylo() so that it randomizes rows by default.
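
As a rough illustration of what looking at the distribution of pairwise distances could involve, here is a minimal Python sketch (not ape code). It assumes a plain uncorrected distance, the number of differing sites, and a made-up five-sequence alignment; the names hamming and distance_table are hypothetical. A pile-up of pairs at distance zero, or many pairs sharing the same small distance, is the warning sign.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_table(seqs):
    """Count how many sequence pairs fall on each pairwise distance."""
    names = sorted(seqs)
    return Counter(hamming(seqs[i], seqs[j]) for i, j in combinations(names, 2))


# Hypothetical toy alignment, for illustration only.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAC",  # identical to A
    "C": "ACGTACGTAT",
    "D": "ACGTACGAAC",
    "E": "TCGAACGTGG",
}

table = distance_table(seqs)
for d in sorted(table):
    print(f"distance {d}: {table[d]} pair(s)")
```

Any distance shared by several pairs is a tie that the tree-making method will have to resolve somehow.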

Besides this, Klaus Schliep and I are working on ways to improve the coding of splits, which should make this faster and will solve the problem of rooted vs. unrooted trees as handled by prop.part().

Best,

Emmanuel

Joe Felsenstein wrote on 09/05/2011 20:34:

Emmanuel wrote:

Is it a problem with ties or with identical sequences? I guess you can
solve the latter easily (e.g., using the haplotype function in pegas), and
this will solve the vast majority of ties. Other cases of ties will
certainly not result in such high bootstrap values (that's my intuition).

My intuition disagrees with this. If several sequences are nearly
identical, but each differs from their consensus at K sites,
then if the tree-making algorithm does not randomize
addition order of species (or otherwise somehow
randomize the resolution of ties), there are likely
to be artificially high bootstrap values even with
nonzero branch lengths.
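
To make the mechanism concrete, here is a minimal Python sketch; it is not how ape or any real tree-building program works. It assumes a toy star-like alignment in which each sequence differs from an all-"A" consensus at K private sites, and it uses "pick the closest pair" as a crude stand-in for the first join of a distance-based method. The names make_star_alignment, closest_pair and bootstrap_pair_counts are hypothetical. The same bootstrap replicates are scored twice: once with ties resolved by fixed input order, once with ties resolved at random.

```python
import random
from collections import Counter
from itertools import combinations


def make_star_alignment(names, k, length):
    """Each sequence equals an all-'A' consensus except at k private sites."""
    seqs = {}
    for i, name in enumerate(names):
        sites = ["A"] * length
        for site in range(i * k, (i + 1) * k):
            sites[site] = "C"
        seqs[name] = "".join(sites)
    return seqs


def closest_pair(seqs, cols, rng=None):
    """Closest pair over the resampled columns.

    rng=None keeps the first minimum in fixed order (no randomization);
    otherwise a tied minimum is chosen at random.
    """
    pairs = list(combinations(sorted(seqs), 2))
    dist = {p: sum(seqs[p[0]][c] != seqs[p[1]][c] for c in cols) for p in pairs}
    best = min(dist.values())
    tied = [p for p in pairs if dist[p] == best]
    return tied[0] if rng is None else rng.choice(tied)


def bootstrap_pair_counts(seqs, n_boot=2000, seed=1):
    """Count how often each pair is joined first under both tie rules."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    fixed, jumbled = Counter(), Counter()
    for _ in range(n_boot):
        # Bootstrap replicate: sample columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        fixed[closest_pair(seqs, cols)] += 1
        jumbled[closest_pair(seqs, cols, rng)] += 1
    return fixed, jumbled


seqs = make_star_alignment(["A", "B", "C"], k=1, length=12)
fixed, jumbled = bootstrap_pair_counts(seqs)
print("fixed order:", dict(fixed))
print("random ties:", dict(jumbled))
```

Since no pair is truly favoured in such a data set, each pair should be picked in roughly a third of the replicates. With fixed-order resolution the first pair (A, B) is picked whenever it is among the tied minima, so its count can never be lower than under random resolution, and none of the branch lengths involved needs to be zero.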

1. Dropping identical sequences is a good thing to do, especially with a distance-based method.

One issue is that after bootstrap sampling, some
sequences that were not identical may become
identical.  So the tree-making method ought to
be able to handle identical sequences.
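
A back-of-the-envelope check of how often this happens, as a hedged Python sketch: if two sequences differ at d of L sites, a bootstrap replicate makes them identical exactly when none of its L draws hits one of those d sites, which has probability (1 - d/L)^L, roughly exp(-d). The function names and the two toy sequences below are hypothetical.

```python
import random


def prob_become_identical(n_diff, length):
    """Chance that a bootstrap replicate misses every differing site."""
    return (1 - n_diff / length) ** length


def simulate(seq_a, seq_b, n_boot=5000, seed=1):
    """Fraction of bootstrap replicates in which seq_a and seq_b coincide."""
    rng = random.Random(seed)
    length = len(seq_a)
    diff_sites = {i for i in range(length) if seq_a[i] != seq_b[i]}
    hits = 0
    for _ in range(n_boot):
        # Sample columns with replacement; identical if no differing site is drawn.
        cols = (rng.randrange(length) for _ in range(length))
        if not any(c in diff_sites for c in cols):
            hits += 1
    return hits / n_boot


# Two hypothetical sequences of 100 sites differing at a single site.
seq_a = "A" * 100
seq_b = "A" * 99 + "C"

expected = prob_become_identical(1, 100)
observed = simulate(seq_a, seq_b)
print(f"expected {expected:.3f}, simulated {observed:.3f}")
```

For a single differing site this is more than a third of all replicates, so removing identical sequences from the original data does not keep them out of the bootstrap samples.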

2. A high bootstrap support (or another form of support) associated with a zero-length branch is an indication that something's wrong there.

Again, it can be a problem even when the branch lengths
are nonzero.

Joe
----
Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA
--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
