Re: [R] hclust, does order of data matter?
I found the problem. For some reason, when I converted the list object holding the data to numeric, the values changed, and that produced the different clustering results. Once that was fixed, the clustering was the same.

Thanks for the responses!

On Mon, Nov 15, 2010 at 2:37 PM, Peter Langfelder peter.langfel...@gmail.com wrote:
> On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury rchowdh...@alumni.upenn.edu wrote:
> > Here is the code I am using:
> >
> >   m <- read.csv("data_unsorted.csv", header=TRUE)
> >   m <- na.omit(m)
> >   cs <- hclust(dist(t(m), method="euclidean"), method="complete")
> >   ds <- as.dendrogram(cs)
>
> As Christian said, you may want to plot the cs tree (i.e., plot(cs)) in both cases and make sure that the differences do not just stem from equal distances. Also, check the matrix m to make sure that the first column in data_unsorted.csv is interpreted correctly by the read.csv function; if your first data column is interpreted as row names, the dendrograms may indeed look different.
>
> Other than the ambiguity of equal distances, the dendrogram produced by hclust should not depend on the order of the columns in input to dist.
>
> Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
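The message above does not say how the conversion went wrong. One common way for values to change during conversion in R, offered here only as an assumption about what may have happened, is that a column was read as a factor (the default for text columns in read.csv before R 4.0.0). Calling as.numeric() on a factor returns the integer level codes rather than the printed values. A minimal sketch with made-up values:

```r
# Hypothetical values; a column like this can arise when read.csv()
# treats a numeric column as text (e.g. because of a stray non-numeric entry).
f <- factor(c("10.5", "2.25", "7"))

codes  <- as.numeric(f)                # integer level codes, not the data
values <- as.numeric(as.character(f))  # the values as printed

print(codes)
print(values)

# For a whole data frame, checking column classes before clustering
# exposes the problem early.
df <- data.frame(a = f, b = c(1.5, 2.5, 3.5))
print(sapply(df, class))
```

If the codes and the values differ, any distance computed from the codes will differ too, which would be enough to change the clustering.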
[R] hclust, does order of data matter?
Hello,

I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results.

Does anyone know why this is happening? Does the order of the data matter?

Thanks,
RC

--
View this message in context: http://r.789695.n4.nabble.com/hclust-does-order-of-data-matter-tp3043896p3043896.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] hclust, does order of data matter?
On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury rchowdh...@alumni.upenn.edu wrote:
> Hello, I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results.
>
> Does anyone know why this is happening? Does the order of the data matter?

No, order of the data should not matter. However, hclust takes a distance structure, not a matrix, so the problem may be in how you create the distance. Can you provide an example?

Peter
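To illustrate the point about hclust taking a distance structure rather than a data matrix, here is a small sketch on simulated data (the matrix x and its row labels are made up for illustration). dist() computes distances between the rows of its input, and a square distance matrix has to go through as.dist() before it can be passed to hclust():

```r
set.seed(1)
# Hypothetical data: 5 observations (rows) with 4 measurements each.
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("obs", 1:5), NULL))

d <- dist(x, method = "euclidean")   # distances between rows of x
print(class(d))

h <- hclust(d, method = "complete")

# A full square distance matrix must be converted back with as.dist().
dm <- as.matrix(d)
h2 <- hclust(as.dist(dm), method = "complete")

print(identical(h$merge, h2$merge))
```

Because dist() works on rows, clustering the columns of a data set means transposing it first.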
Re: [R] hclust, does order of data matter?
I don't know how the hclust function is implemented, but generally in hierarchical clustering the result can be ambiguous if there are several distances of identical value in the dataset (or if identical between-cluster distances occur when aggregating clusters). The role of the order of the data depends on how these ambiguities are resolved. It may well be that when, at some point in building the hierarchy, there are two different ways to merge clusters at the same distance value, what hclust does is determined by the order.

Hope this helps,
Christian

On Mon, 15 Nov 2010, rchowdhury wrote:
> Hello, I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results.
>
> Does anyone know why this is happening? Does the order of the data matter?
>
> Thanks,
> RC

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
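The ambiguity Christian describes can be reproduced with a tiny made-up data set. In this sketch, three points on a line give two equal distances, so the first merge is a tie. The code does not assume how hclust breaks the tie; it only lets you compare the two runs:

```r
# Three hypothetical points on a line: d(a,b) = d(b,c) = 1 is a tie.
x <- matrix(c(0, 1, 2), ncol = 1,
            dimnames = list(c("a", "b", "c"), NULL))
x_rev <- x[c("c", "b", "a"), , drop = FALSE]   # same data, reversed order

h1 <- hclust(dist(x),     method = "complete")
h2 <- hclust(dist(x_rev), method = "complete")

# The merge heights agree in both runs...
print(h1$height)
print(h2$height)

# ...but which pair is joined first may differ, so compare the
# two-cluster memberships with labels aligned.
g1 <- cutree(h1, k = 2)[c("a", "b", "c")]
g2 <- cutree(h2, k = 2)[c("a", "b", "c")]
print(rbind(g1, g2))
```

If the two membership rows group the points differently, the difference comes from tie-breaking alone, not from the data.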
Re: [R] hclust, does order of data matter?
Here is the code I am using:

  m <- read.csv("data_unsorted.csv", header=TRUE)
  m <- na.omit(m)
  cs <- hclust(dist(t(m), method="euclidean"), method="complete")
  ds <- as.dendrogram(cs)

In this case, m is a 106x40 matrix of doubles. When I change the order of the columns, I get different results...

Thanks,
RC

On Mon, Nov 15, 2010 at 2:13 PM, Peter Langfelder peter.langfel...@gmail.com wrote:
> On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury rchowdh...@alumni.upenn.edu wrote:
> > Hello, I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results.
> >
> > Does anyone know why this is happening? Does the order of the data matter?
>
> No, order of the data should not matter. However, hclust takes a distance structure, not a matrix, so the problem may be in how you create the distance. Can you provide an example?
>
> Peter
Re: [R] hclust, does order of data matter?
On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury rchowdh...@alumni.upenn.edu wrote:
> Here is the code I am using:
>
>   m <- read.csv("data_unsorted.csv", header=TRUE)
>   m <- na.omit(m)
>   cs <- hclust(dist(t(m), method="euclidean"), method="complete")
>   ds <- as.dendrogram(cs)

As Christian said, you may want to plot the cs tree (i.e., plot(cs)) in both cases and make sure that the differences do not just stem from equal distances. Also, check the matrix m to make sure that the first column in data_unsorted.csv is interpreted correctly by the read.csv function; if your first data column is interpreted as row names, the dendrograms may indeed look different.

Other than the ambiguity of equal distances, the dendrogram produced by hclust should not depend on the order of the columns in input to dist.

Peter
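The claim that column order should not affect the result can be checked directly. The sketch below uses simulated data as a stand-in for the poster's file (the 106x40 shape comes from the thread; the values and the column names s1..s40 are made up). It clusters the columns in their original and in a shuffled order, then compares the trees through their cophenetic distances:

```r
set.seed(42)
# Simulated stand-in for the poster's data: 106 rows, 40 columns.
m <- matrix(rnorm(106 * 40), nrow = 106,
            dimnames = list(NULL, paste0("s", 1:40)))
m_perm <- m[, sample(ncol(m))]     # same columns, shuffled order

h1 <- hclust(dist(t(m),      method = "euclidean"), method = "complete")
h2 <- hclust(dist(t(m_perm), method = "euclidean"), method = "complete")

# Cophenetic distances summarise the tree; align labels before comparing.
labs <- colnames(m)
c1 <- as.matrix(cophenetic(h1))[labs, labs]
c2 <- as.matrix(cophenetic(h2))[labs, labs]
same_tree <- isTRUE(all.equal(c1, c2))
print(same_tree)

# The other suggested check: confirm every column was parsed as numeric.
# Shown on a data frame built from m, since the original file is not available.
df <- as.data.frame(m)
print(all(sapply(df, is.numeric)))
```

With continuous random values, ties are practically impossible, so the two trees should agree. On real data, running the same comparison after read.csv() helps separate a parsing problem from genuine tie-breaking.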