[R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples: Again

2011-05-16 Thread Alastair Potts
Hi Emmanuel, I'm wondering why prop.clade always returns 100% for the first node? For example:
library(ape)
a <- as.DNAbin(matrix('a',10,10)) # DNA data with no variation
rownames(a) <- paste('tip',1:10,sep="")
f <- function(x) nj(dist.dna(x[sample(nrow(x)), ]))
tr <- f(a)
o <- boot.phylo(tr, a, f

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-10 Thread Emmanuel Paradis
Joe, I agree with what you wrote. To me, this strengthens the case for looking at the distribution of pairwise distances before estimating the tree. I'll modify boot.phylo() so that it randomizes rows by default. Besides this, Klaus Schliep and I are working on ways to improve cod

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-09 Thread Joe Felsenstein
Emmanuel wrote: Is it a problem with ties or with identical sequences? I guess you can solve the latter easily (e.g., using the haplotype function in pegas), and this will solve the vast majority of ties. Other cases of ties will certainly not result in such high bootstrap values (that's my
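
A minimal sketch of collapsing identical sequences before building a tree, as suggested in the quoted text. It uses only ape and base R rather than the pegas haplotype function named above, and it assumes that exact string matching of rows is an adequate definition of "identical" (ambiguity codes and gaps are not treated specially). The toy alignment and the names `m`, `keep`, and `a_unique` are hypothetical.

```r
library(ape)

# Hypothetical toy alignment: s2 duplicates s1, and s5 duplicates s4
m <- rbind(s1 = c("a", "c", "g", "t"),
           s2 = c("a", "c", "g", "t"),
           s3 = c("a", "c", "g", "a"),
           s4 = c("a", "a", "g", "t"),
           s5 = c("a", "a", "g", "t"),
           s6 = c("c", "c", "g", "t"))
a <- as.DNAbin(m)

# Keep the first occurrence of each distinct row (exact match on characters)
keep <- !duplicated(as.character(a))
a_unique <- a[which(keep), ]

# Raw (uncorrected) distances among the retained sequences
d_unique <- as.vector(dist.dna(a_unique, model = "raw"))

nrow(a_unique)      # number of distinct sequences retained
any(d_unique == 0)  # are any zero distances (ties from identical sequences) left?
```

Ties between distinct but equidistant sequences can still occur after this step; the sketch only removes the ties caused by identical sequences.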

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-09 Thread Emmanuel Paradis
Hi Alastair, Alastair Potts wrote on 08/05/2011 00:07: Hi Emmanuel (Klaus and Joe), The example data was meant to demonstrate that the tie-breaking in nj is affecting the bootstrap results - or rather the lack of any way to deal with tie breaking. I've noticed that a bunch of identical sequen

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-07 Thread Alastair Potts
Hi Emmanuel (Klaus and Joe), The example data was meant to demonstrate that the tie-breaking in nj is affecting the bootstrap results - or rather the lack of any way to deal with tie breaking. I've noticed that a bunch of identical sequences form a 'polytomy' in my real dataset (but obviously

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-06 Thread Emmanuel Paradis
Hi Alastair, Klaus & Joe, Before doing the tree, you should do some preliminary data exploration, such as:
d <- dist.dna(a)
hist(d)
summary(d)
That would show you that any tree estimation procedure (not only NJ) has very little meaning here, just as you do plot(x, y) before doing lm(y ~ x). Best,
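
A small sketch extending the exploration above with two numeric summaries, assuming the no-variation alignment `a` from the example posted elsewhere in this thread. The names `dv`, `prop_zero`, and `n_distinct` are illustrative, not part of ape.

```r
library(ape)

# No-variation alignment from the thread: ten identical sequences
a <- as.DNAbin(matrix("a", 10, 10))
rownames(a) <- paste("tip", 1:10, sep = "")

# Pairwise distances as a plain numeric vector
dv <- as.vector(dist.dna(a))

summary(dv)

# Share of pairwise distances that are exactly zero
prop_zero <- mean(dv == 0)

# Number of distinct distance values; very few distinct values means many ties
n_distinct <- length(unique(dv))

prop_zero
n_distinct
```

A high proportion of zero distances, or very few distinct values, is a warning that most joins in a distance-based tree will be decided by tie-breaking rather than by the data.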

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-05 Thread Alastair Potts
Hi Klaus and Joe, Thanks very much for your responses. From Klaus: it is not that surprising. NJ normally does not produce polytomies, just edge weights of length 0. How these are broken may depend on the input order (from labels in the distance matrix, as in this implementation) or could be

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-05 Thread Joe Felsenstein
Klaus Schliep wrote -- > it is not that surprising. NJ normally does not produce polytomies, > just edge weights of length 0. How these are broken may depend on > the input order (from labels in the distance matrix, as in this > implementation) or could be broken randomly. I added some code b

Re: [R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-05 Thread Klaus Schliep
Hi Alastair, it is not that surprising. NJ normally does not produce polytomies, just edge weights of length 0. How these are broken may depend on the input order (from labels in the distance matrix, as in this implementation) or could be broken randomly. I added some code below to highlight i
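
A minimal sketch of the input-order dependence described above, assuming the ape package and the no-variation alignment used elsewhere in this thread. It is not the code from the original message, which is cut off here. The names `tr_original`, `a_shuffled`, and `tr_shuffled` are illustrative, and the seed is arbitrary.

```r
library(ape)

# Ten identical sequences: every pairwise distance is 0, so every NJ join is a tie
a <- as.DNAbin(matrix("a", 10, 10))
rownames(a) <- paste("tip", 1:10, sep = "")

d <- dist.dna(a)
tr_original <- nj(d)

# Same data with the rows in a different order
set.seed(1)  # arbitrary seed, only for repeatability
a_shuffled <- a[sample(nrow(a)), ]
tr_shuffled <- nj(dist.dna(a_shuffled))

# Topological distance between the two trees; a nonzero value means the
# ties were resolved differently although the data are identical
dist.topo(tr_original, tr_shuffled)
```

If the row order is never shuffled, each bootstrap replicate of such data would be expected to resolve the ties in the same way, which is consistent with the 100% support values reported in this thread.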

[R-sig-phylo] Bootstrap values and NJ when there is no genetic distance between samples

2011-05-05 Thread Alastair Potts
Good day all, I noticed something that I would consider an anomaly when analysing one of my trees with NJ. A 'polytomy' of samples contained many bootstrap values of 100 between samples. I was looking at the total change in bootstrap values for all nodes when I picked this up (as the signal in