Joe,
I agree with what you wrote. To me, this makes an even stronger case
for looking at the distribution of pairwise distances before estimating
the tree. I'll modify boot.phylo() so that it randomizes rows by default.
Besides this, Klaus Schliep and I are working on ways to improve the
coding of splits so that this should be faster, and it will solve the
problem of rooted vs. unrooted trees as handled by prop.part().
Best,
Emmanuel
Joe Felsenstein wrote on 09/05/2011 20:34:
Emmanuel wrote:
Is it a problem with ties or with identical sequences? I guess you can
solve the latter easily (e.g., using the haplotype function in pegas), and
this will solve the vast majority of ties. Other cases of ties will
certainly not result in such high bootstrap values (that's my intuition).
My intuition disagrees with this. If several sequences are nearly
identical, but each differs from their consensus at K sites,
then if the tree-making algorithm does not randomize
addition order of species (or otherwise somehow
randomize the resolution of ties), there are likely
to be artificially high bootstrap values even with
nonzero branch lengths.
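The effect described here can be sketched with a toy simulation. This is only an illustration under stated assumptions, not ape's boot.phylo() or any real tree-building method: the helper names (make_alignment, first_join, bootstrap_support) are hypothetical, the "tree method" is reduced to picking the closest pair by Hamming distance, and the alignment is four sequences that each differ from an all-A consensus at K = 1 private site.

```python
import random
from collections import Counter
from itertools import combinations


def make_alignment(n_seq=4, length=50, k=1):
    """Hypothetical toy data: each sequence differs from an all-'A'
    consensus at k private sites (requires n_seq * k <= length)."""
    seqs = [["A"] * length for _ in range(n_seq)]
    for i in range(n_seq):
        for site in range(i * k, (i + 1) * k):
            seqs[i][site] = "C"
    return seqs


def hamming(a, b):
    """Number of sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def first_join(seqs, rng=None):
    """Return the closest pair of sequences.

    Ties go to the first pair in input order when rng is None,
    and to a randomly chosen tied pair otherwise.
    """
    pairs = list(combinations(range(len(seqs)), 2))
    dists = {p: hamming(seqs[p[0]], seqs[p[1]]) for p in pairs}
    best = min(dists.values())
    tied = [p for p in pairs if dists[p] == best]
    return rng.choice(tied) if rng is not None else tied[0]


def bootstrap_support(seqs, n_boot=500, randomize_ties=False, seed=1):
    """Fraction of bootstrap replicates in which each pair is joined first."""
    rng = random.Random(seed)
    length = len(seqs[0])
    counts = Counter()
    for _ in range(n_boot):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = [[s[c] for c in cols] for s in seqs]
        counts[first_join(resampled, rng if randomize_ties else None)] += 1
    return {pair: n / n_boot for pair, n in counts.items()}


if __name__ == "__main__":
    aln = make_alignment()
    print("input-order ties:", bootstrap_support(aln, randomize_ties=False))
    print("randomized ties: ", bootstrap_support(aln, randomize_ties=True))
```

By construction all six pairs are equally supported by the data, so any pair that is joined first much more often than the others under input-order tie-breaking owes that "support" to the ordering alone. Note that no branch here has zero length: every sequence carries its own private difference.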
1. Dropping identical sequences is a good thing to do, especially with
a distance-based method.
One issue is that after bootstrap sampling, some
sequences that were not identical may become
identical. So the tree-making method ought to
be able to handle identical sequences.
2. A high bootstrap support (or another form of support) associated
with a zero-length branch is an indication that something's wrong there.
Again, it can be a problem even when the branch lengths
are nonzero.
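To put a rough number on the point about sequences becoming identical after bootstrap sampling, here is a minimal sketch. It assumes two sequences of a given length that differ at n_diff sites, with columns resampled with replacement; the function names are hypothetical. The two sequences become identical in a replicate exactly when none of the differing sites is drawn, which happens with probability (1 - n_diff/length)^length, close to exp(-n_diff) for long alignments.

```python
import random


def prob_identical(length, n_diff):
    """Exact chance that none of the n_diff differing sites is drawn
    in `length` draws with replacement."""
    return (1 - n_diff / length) ** length


def simulate_identical(length, n_diff, n_boot=5000, seed=1):
    """Estimate the same probability by simulating bootstrap replicates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        # Treat sites 0 .. n_diff-1 as the ones where the sequences differ;
        # the replicate is "identical" if every drawn site avoids them.
        if all(rng.randrange(length) >= n_diff for _ in range(length)):
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    for d in (1, 2, 5):
        print(d, prob_identical(50, d), simulate_identical(50, d))
```

For sequences that differ at only one or two sites this probability is far from negligible, so removing identical sequences from the original alignment does not guarantee that the tree-making method never sees identical sequences in the replicates.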
Joe
----
Joe Felsenstein, j...@gs.washington.edu
Dept. of Genome Sciences, Univ. of Washington
Box 355065, Seattle, WA 98195-5065 USA
--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/
_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo