Re: [R] hclust, does order of data matter?
I found the problem. For some reason, when I converted the list object holding the data to numeric, the values changed, and that produced different clustering results. Once that was fixed, the clustering was the same for both files.

Thanks for the responses!

On Mon, Nov 15, 2010 at 2:37 PM, Peter Langfelder peter.langfel...@gmail.com wrote:
> On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury rchowdh...@alumni.upenn.edu wrote:
> > Here is the code I am using:
> >
> > m <- read.csv("data_unsorted.csv", header=TRUE)
> > m <- na.omit(m)
> > cs <- hclust(dist(t(m), method="euclidean"), method="complete")
> > ds <- as.dendrogram(cs)
>
> As Christian said, you may want to plot the cs tree (i.e., plot(cs)) in both cases and make sure that the differences do not just stem from equal distances.
>
> Also, check the matrix m to make sure that the first column in data_unsorted.csv is interpreted correctly by the read.csv function. If your first data column is interpreted as row names, the dendrograms may indeed look different.
>
> Other than the ambiguity of equal distances, the dendrogram produced by hclust should not depend on the order of the columns in the input to dist.
>
> Peter

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
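The message above does not say how the values changed during the conversion to numeric. One common way this happens in R is sketched below, purely as an assumption about the cause: a column that was read as a factor (the read.csv default for text columns in R versions before 4.0) is passed to as.numeric(), which returns the internal level codes instead of the values. The vector x and the list lst are made-up illustrations, not the poster's data.

```r
# Hypothetical data: a numeric column that ended up stored as a factor.
x <- factor(c("10.5", "2.25", "10.5", "7"))

# as.numeric() on a factor returns the integer level codes, not the values.
wrong <- as.numeric(x)

# Converting through character first recovers the values from the file.
right <- as.numeric(as.character(x))

print(wrong)
print(right)

# For a list of numeric vectors, unlist() keeps the values unchanged.
lst <- list(a = c(1.5, 2.5), b = c(3.5, 4.5))
v <- unlist(lst, use.names = FALSE)
print(v)
```

Comparing the two inputs directly after aligning their columns by name, for example with all.equal(), is a quick way to confirm that both files really hold the same values before clustering.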
Re: [R] hclust, does order of data matter?
Here is the code I am using:

m <- read.csv("data_unsorted.csv", header=TRUE)
m <- na.omit(m)
cs <- hclust(dist(t(m), method="euclidean"), method="complete")
ds <- as.dendrogram(cs)

In this case, m is a 106x40 matrix of doubles. When I change the order of the columns, I get different results...

Thanks,
RC

On Mon, Nov 15, 2010 at 2:13 PM, Peter Langfelder peter.langfel...@gmail.com wrote:
> On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury rchowdh...@alumni.upenn.edu wrote:
> > Hello,
> >
> > I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results.
> >
> > Does anyone know why this is happening? Does the order of the data matter?
>
> No, order of the data should not matter. However, hclust takes a distance structure, not a matrix, so the problem may be in how you create the distance. Can you provide an example?
>
> Peter
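The claim that column order should not matter can be checked with a small sketch like the one below. It uses simulated data, not the poster's file: the matrix m only mimics the 106x40 shape mentioned in the thread, and the column names s1..s40 are invented. The two trees are compared through their cophenetic distances, which describe the tree itself and not the left-to-right order in which leaves are drawn.

```r
# Simulated stand-in for the data (assumption: no exactly tied distances).
set.seed(1)
m <- matrix(rnorm(106 * 40), nrow = 106, ncol = 40,
            dimnames = list(NULL, paste0("s", 1:40)))

# Same data with the columns in a random order.
perm <- sample(ncol(m))
m_perm <- m[, perm]

# Cluster the columns of both versions exactly as in the thread.
cs1 <- hclust(dist(t(m), method = "euclidean"), method = "complete")
cs2 <- hclust(dist(t(m_perm), method = "euclidean"), method = "complete")

# Cophenetic distances, aligned by label before comparing.
c1 <- as.matrix(cophenetic(cs1))
c2 <- as.matrix(cophenetic(cs2))[rownames(c1), colnames(c1)]

same_tree <- isTRUE(all.equal(c1, c2))
print(same_tree)
```

If a check like this reports a difference on real data, the likely suspects are tied distances or inputs that do not actually contain the same values, not the column order as such.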