Re: [R] hclust, does order of data matter?

2010-11-16 Thread Reshmi Chowdhury
I found the problem.

For some reason, when I converted the list object with the data in it to
numeric, the values changed.  This resulted in different clustering
results.  Once that was fixed, the clustering was the same.
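
The message does not say how the values changed. One common way this happens in R, offered here only as an assumption about the cause, is that a column was read in as a factor and then passed to as.numeric(), which returns the internal level codes rather than the numbers as written. A minimal sketch with a made-up column:

```r
# Hypothetical column that was read in as a factor rather than as numbers.
f <- factor(c("10.5", "2.25", "300"))

# as.numeric() on a factor returns the internal level codes.
wrong <- as.numeric(f)

# Converting through character first recovers the values as written.
right <- as.numeric(as.character(f))

print(wrong)
print(right)
```

Checking str(m) right after read.csv shows whether any column came in as a factor. Before R 4.0.0, read.csv converted character columns to factors by default.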

Thanks for the responses!


On Mon, Nov 15, 2010 at 2:37 PM, Peter Langfelder 
peter.langfel...@gmail.com wrote:

 On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury
 rchowdh...@alumni.upenn.edu wrote:
  Here is the code I am using:
 
  m <- read.csv("data_unsorted.csv", header=TRUE)
  m <- na.omit(m)
  cs <- hclust(dist(t(m), method="euclidean"), method="complete")
  ds <- as.dendrogram(cs)

 As Christian said, you may want to plot the cs tree (i.e., plot(cs))
 in both cases and make sure that the differences do not just stem from
 equal distances. Also, check the matrix m to make sure that the first
 column in data_unsorted.csv is interpreted correctly by the read.csv
 function - if your first data column is interpreted as row names, the
 dendrograms may indeed look different. Other than the ambiguity of
 equal distances, the dendrogram produced by hclust should not depend
 on the order of the columns in input to dist.

 Peter
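
The claim about column order can be checked directly. The sketch below uses simulated data in place of data_unsorted.csv (the matrix, seed and column names are made up for illustration), shuffles the columns, and compares the two trees through their cophenetic distances, which do not depend on the order in which leaves are drawn.

```r
# Hypothetical stand-in for the real data: 106 rows, 40 columns of doubles.
set.seed(1)
m <- matrix(rnorm(106 * 40), nrow = 106, ncol = 40,
            dimnames = list(NULL, paste0("s", 1:40)))

# The same data with the columns in a shuffled order.
perm <- sample(ncol(m))
m2 <- m[, perm]

cs1 <- hclust(dist(t(m),  method = "euclidean"), method = "complete")
cs2 <- hclust(dist(t(m2), method = "euclidean"), method = "complete")

# Cophenetic distances describe the tree itself, not the plotting order.
c1 <- as.matrix(cophenetic(cs1))
c2 <- as.matrix(cophenetic(cs2))

# Align the second matrix by label before comparing.
same_tree <- isTRUE(all.equal(c1, c2[rownames(c1), colnames(c1)]))
print(same_tree)
```

With continuous random data there are no tied distances, so this covers only the tie-free case. If the same check fails on real data that has no ties, the input values themselves probably differ between the two files.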



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] hclust, does order of data matter?

2010-11-15 Thread Reshmi Chowdhury
Here is the code I am using:

m <- read.csv("data_unsorted.csv", header=TRUE)
m <- na.omit(m)
cs <- hclust(dist(t(m), method="euclidean"), method="complete")
ds <- as.dendrogram(cs)

In this case, m is a 106x40 matrix of doubles.  When I change the order of
the columns, I get different results...

Thanks,
RC


On Mon, Nov 15, 2010 at 2:13 PM, Peter Langfelder 
peter.langfel...@gmail.com wrote:

 On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury rchowdh...@alumni.upenn.edu
 wrote:
 
  Hello,
 
  I am using the hclust function to cluster some data.  I have two separate
  files with the same data.  The only difference is the order of the data in
  the file.  For some reason, when I run the two files through the hclust
  function, I get two completely different results.
 
  Does anyone know why this is happening?  Does the order of the data matter?

 No, order of the data should not matter. However, hclust takes a
 distance structure, not a matrix, so the problem may be in how you
 create the distance. Can you provide an example?

 Peter
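
To illustrate the point about distance structures: dist() computes distances between the rows of its argument, so the matrix has to be transposed when the columns are the items to cluster. A small sketch with made-up data (the matrix and its column names are hypothetical):

```r
# Hypothetical small matrix: 6 rows (observations), 4 named columns.
set.seed(2)
m <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))

d_rows <- dist(m)      # distances between the 6 rows
d_cols <- dist(t(m))   # distances between the 4 columns

# The "Size" attribute is the number of items being compared.
print(attr(d_rows, "Size"))
print(attr(d_cols, "Size"))

# hclust takes the "dist" object; the leaves are the column names.
cs <- hclust(d_cols, method = "complete")
print(cs$labels)
```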

