Re: [R] hclust, does order of data matter?

2010-11-16 Thread Reshmi Chowdhury
I found the problem. For some reason, when I converted the list object with the data in it to numeric, the values changed. This resulted in different clustering results. Once that was fixed, the clustering was the same. Thanks for the responses! On Mon, Nov 15, 2010 at 2:37 PM, Peter
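The message above does not say which conversion changed the values. One common way this happens in R, offered here only as an assumption about the likely cause, is calling as.numeric() on a factor column, which returns the internal level codes rather than the printed numbers. A minimal sketch with made-up values:

```r
# Hypothetical column that was read in as a factor instead of as numbers.
f <- factor(c("10.5", "2.25", "10.5"))

# as.numeric() on a factor returns the integer level codes, not the values.
codes <- as.numeric(f)

# Converting through character first recovers the printed numbers.
vals <- as.numeric(as.character(f))

print(codes)
print(vals)
```

Comparing the converted data from both files (for example with all.equal() after aligning column names) before clustering would show whether a conversion step like this is involved.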

[R] hclust, does order of data matter?

2010-11-15 Thread rchowdhury
Hello, I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results. Does anyone know

Re: [R] hclust, does order of data matter?

2010-11-15 Thread Peter Langfelder
On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury <rchowdh...@alumni.upenn.edu> wrote: Hello, I am using the hclust function to cluster some data. I have two separate files with the same data. The only difference is the order of the data in the file. For some reason, when I run the two files

Re: [R] hclust, does order of data matter?

2010-11-15 Thread Christian Hennig
I don't know how the hclust function is implemented, but generally in hierarchical clustering the result can be ambiguous if there are several distances of identical value in the dataset (or identical between-cluster distances occur when aggregating clusters). The role of the order of the data
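The point about tied distances can be illustrated with a small sketch. The three points and their labels below are made up for illustration. The distances a-b and b-c are both 1, so the first merge is ambiguous, and which pair gets merged first may depend on the order in which the points are supplied.

```r
# Three points on a line; d(a,b) and d(b,c) are tied at 1.
x <- matrix(c(0, 1, 2), ncol = 1, dimnames = list(c("a", "b", "c"), NULL))
d <- dist(x, method = "euclidean")

# Number of distance values that repeat an earlier value.
n_tied <- sum(duplicated(as.vector(d)))

# Cluster the points in the original order and in reversed order.
h1 <- hclust(d, method = "complete")
x_rev <- x[rev(rownames(x)), , drop = FALSE]
h2 <- hclust(dist(x_rev, method = "euclidean"), method = "complete")

# Cut both trees into two groups and compare the partitions by label.
g1 <- cutree(h1, k = 2)
g2 <- cutree(h2, k = 2)[names(g1)]
same_partition <- all(outer(g1, g1, "==") == outer(g2, g2, "=="))

print(n_tied)
print(same_partition)  # may be FALSE when ties are broken differently
```

The merge heights are the same either way; only the choice of which tied pair merges first can differ. Checking n_tied on real data is a quick way to see whether ties could explain an order-dependent result.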

Re: [R] hclust, does order of data matter?

2010-11-15 Thread Reshmi Chowdhury
Here is the code I am using:
  m <- read.csv("data_unsorted.csv", header=TRUE)
  m <- na.omit(m)
  cs <- hclust(dist(t(m), method="euclidean"), method="complete")
  ds <- as.dendrogram(cs)
In this case, m is a 106x40 matrix of doubles. When I change the order of the columns, I get different results... Thanks, RC
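One way to check whether column order alone changes the result is to cluster the same matrix twice, once with the columns permuted, and compare the cophenetic distances after aligning labels. The sketch below uses random data of the same shape as a stand-in for the real file, and the column names s1..s40 are made up. With continuous random values, ties are not expected, so the two trees should agree.

```r
set.seed(1)

# Stand-in for the real data: 106 rows, 40 named columns.
m <- matrix(rnorm(106 * 40), nrow = 106,
            dimnames = list(NULL, paste0("s", 1:40)))

# Same data with the columns in a random order.
perm <- sample(ncol(m))
m_perm <- m[, perm]

# Cluster the columns of both versions, as in the code above.
h_a <- hclust(dist(t(m), method = "euclidean"), method = "complete")
h_b <- hclust(dist(t(m_perm), method = "euclidean"), method = "complete")

# Cophenetic distances describe the tree independently of leaf order.
coph_a <- as.matrix(cophenetic(h_a))
coph_b <- as.matrix(cophenetic(h_b))[rownames(coph_a), colnames(coph_a)]

same_tree <- isTRUE(all.equal(coph_a, coph_b))
print(same_tree)
```

If the same comparison on the two real files reports a difference, either the distance matrix has ties or the two inputs do not hold identical values.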

Re: [R] hclust, does order of data matter?

2010-11-15 Thread Peter Langfelder
On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury <rchowdh...@alumni.upenn.edu> wrote:
  Here is the code I am using:
  m <- read.csv("data_unsorted.csv", header=TRUE)
  m <- na.omit(m)
  cs <- hclust(dist(t(m), method="euclidean"), method="complete")
  ds <- as.dendrogram(cs)
As Christian said, you may want to