Re: [R] hclust, does order of data matter?

2010-11-16 Thread Reshmi Chowdhury
I found the problem.

For some reason, when I converted the list object with the data in it to
numeric, the values changed.  This resulted in different clustering
results.  Once that was fixed, the clustering was the same.
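
The thread does not show the conversion code, so the following is only a guess at the kind of problem described. One common way for values to change is calling as.numeric() directly on a factor, which returns the internal level codes rather than the numbers as written. The vector x below is hypothetical.

```r
# Hypothetical values; the original data is not shown in the thread.
x <- factor(c("10.5", "2.25", "7"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(x)

# Going through character first recovers the numbers as written.
values <- as.numeric(as.character(x))

print(codes)
print(values)
```

Before R 4.0.0, read.csv() turned text columns into factors by default, so a column containing a stray non-numeric entry could end up in this state. Checking str(m) after reading the file shows the type of each column.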

Thanks for the responses!


On Mon, Nov 15, 2010 at 2:37 PM, Peter Langfelder 
peter.langfel...@gmail.com wrote:

 On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury
 rchowdh...@alumni.upenn.edu wrote:
  Here is the code I am using:
 
  m <- read.csv("data_unsorted.csv", header=TRUE)
  m <- na.omit(m)
  cs <- hclust(dist(t(m), method="euclidean"), method="complete")
  ds <- as.dendrogram(cs)

 As Christian said, you may want to plot the cs tree (i.e., plot(cs))
 in both cases and make sure that the differences do not just stem from
 equal distances. Also, check the matrix m to make sure that the first
 column in data_unsorted.csv is interpreted correctly by the read.csv
 function - if your first data column is interpreted as row names, the
 dendrograms may indeed look different. Other than the ambiguity of
 equal distances, the dendrogram produced by hclust should not depend
 on the order of the columns in input to dist.

 Peter



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] hclust, does order of data matter?

2010-11-15 Thread rchowdhury

Hello,

I am using the hclust function to cluster some data.  I have two separate
files with the same data.  The only difference is the order of the data in
the file.  For some reason, when I run the two files through the hclust
function, I get two completely different results.

Does anyone know why this is happening?  Does the order of the data matter?

Thanks,
RC
-- 
View this message in context: 
http://r.789695.n4.nabble.com/hclust-does-order-of-data-matter-tp3043896p3043896.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] hclust, does order of data matter?

2010-11-15 Thread Peter Langfelder
On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury rchowdh...@alumni.upenn.edu wrote:

 Hello,

 I am using the hclust function to cluster some data.  I have two separate
 files with the same data.  The only difference is the order of the data in
 the file.  For some reason, when I run the two files through the hclust
 function, I get two completely different results.

 Does anyone know why this is happening?  Does the order of the data matter?

No, order of the data should not matter. However, hclust takes a
distance structure, not a matrix, so the problem may be in how you
create the distance. Can you provide an example?
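
As a minimal sketch of this point: dist() computes distances between the rows of its input, so whether the matrix is transposed decides what gets clustered. The small matrix m and the column names s1 to s4 below are hypothetical stand-ins, not the poster's data.

```r
set.seed(1)
# Hypothetical 6 x 4 numeric matrix standing in for the poster's data.
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(NULL, paste0("s", 1:4)))

d_rows <- dist(m)      # distances between the 6 rows
d_cols <- dist(t(m))   # distances between the 4 columns

# hclust() is given the "dist" object, not the raw matrix.
hc <- hclust(d_cols, method = "complete")

print(attr(d_rows, "Size"))
print(attr(d_cols, "Size"))
print(hc$labels)
```

If a square matrix already holds pairwise distances, as.dist() converts it to the structure hclust() expects.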

Peter



Re: [R] hclust, does order of data matter?

2010-11-15 Thread Christian Hennig
I don't know how the hclust function is implemented, but in general the
result of hierarchical clustering can be ambiguous if several distances
in the dataset have identical values (or if identical between-cluster
distances occur when aggregating clusters). The role of the order of the
data depends on how these ambiguities are resolved. It may well be that,
when there are two different ways to merge clusters at the same distance
value at some point in building the hierarchy, what hclust does is
determined by the order.
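
A small sketch of how one might check for such ties, using hypothetical data (four equally spaced points, so several pairwise distances coincide). It makes no claim about how hclust actually breaks ties; it only detects them and compares the groupings obtained from two input orders.

```r
# Hypothetical data: four equally spaced points, so several distances tie.
x <- matrix(c(0, 1, 2, 3), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), NULL))

d <- dist(x)
# TRUE if at least two pairwise distances are exactly equal.
has_ties <- anyDuplicated(as.vector(d)) > 0

hc1 <- hclust(d, method = "complete")
# Same points supplied in reverse order.
hc2 <- hclust(dist(x[4:1, , drop = FALSE]), method = "complete")

# Cut both trees into two groups and match the results by label.
g1 <- cutree(hc1, k = 2)
g2 <- cutree(hc2, k = 2)[names(g1)]

print(has_ties)
print(table(g1, g2))
```

If has_ties is FALSE for the real data, tied distances in the input can be ruled out as the cause, although ties between clusters can in principle still arise during merging.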


Hope this helps,
Christian

On Mon, 15 Nov 2010, rchowdhury wrote:



Hello,

I am using the hclust function to cluster some data.  I have two separate
files with the same data.  The only difference is the order of the data in
the file.  For some reason, when I run the two files through the hclust
function, I get two completely different results.

Does anyone know why this is happening?  Does the order of the data matter?

Thanks,
RC



*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche



Re: [R] hclust, does order of data matter?

2010-11-15 Thread Reshmi Chowdhury
Here is the code I am using:

m <- read.csv("data_unsorted.csv", header=TRUE)
m <- na.omit(m)
cs <- hclust(dist(t(m), method="euclidean"), method="complete")
ds <- as.dendrogram(cs)

In this case, m is a 106x40 matrix of doubles.  When I change the order of
the columns, I get different results...

Thanks,
RC


On Mon, Nov 15, 2010 at 2:13 PM, Peter Langfelder 
peter.langfel...@gmail.com wrote:

 On Mon, Nov 15, 2010 at 2:07 PM, rchowdhury rchowdh...@alumni.upenn.edu
 wrote:
 
  Hello,
 
  I am using the hclust function to cluster some data.  I have two separate
  files with the same data.  The only difference is the order of the data in
  the file.  For some reason, when I run the two files through the hclust
  function, I get two completely different results.
 
  Does anyone know why this is happening?  Does the order of the data matter?

 No, order of the data should not matter. However, hclust takes a
 distance structure, not a matrix, so the problem may be in how you
 create the distance. Can you provide an example?

 Peter




Re: [R] hclust, does order of data matter?

2010-11-15 Thread Peter Langfelder
On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury
rchowdh...@alumni.upenn.edu wrote:
 Here is the code I am using:

 m <- read.csv("data_unsorted.csv", header=TRUE)
 m <- na.omit(m)
 cs <- hclust(dist(t(m), method="euclidean"), method="complete")
 ds <- as.dendrogram(cs)

As Christian said, you may want to plot the cs tree (i.e., plot(cs))
in both cases and make sure that the differences do not just stem from
equal distances. Also, check the matrix m to make sure that the first
column in data_unsorted.csv is interpreted correctly by the read.csv
function - if your first data column is interpreted as row names, the
dendrograms may indeed look different. Other than the ambiguity of
equal distances, the dendrogram produced by hclust should not depend
on the order of the columns in input to dist.
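
The last claim can be checked with a rough sketch like the one below. It uses simulated data with the dimensions mentioned in the thread (106 x 40); the matrix, the column names V1 to V40, and the choice of k = 5 are all assumptions made for illustration. With continuous random values, tied distances are not expected.

```r
set.seed(42)
# Hypothetical stand-in for the 106 x 40 data matrix.
m <- matrix(rnorm(106 * 40), nrow = 106,
            dimnames = list(NULL, paste0("V", 1:40)))

# Same data with the columns in a random order.
perm <- sample(ncol(m))
m2 <- m[, perm]

cs1 <- hclust(dist(t(m),  method = "euclidean"), method = "complete")
cs2 <- hclust(dist(t(m2), method = "euclidean"), method = "complete")

# Merge heights should agree; the leaf order in a plot may still differ.
same_heights <- isTRUE(all.equal(cs1$height, cs2$height))

# Group membership, matched by column name, should give the same partition
# even if the group numbers themselves are assigned differently.
g1 <- cutree(cs1, k = 5)
g2 <- cutree(cs2, k = 5)[names(g1)]
same_partition <- all(rowSums(table(g1, g2) > 0) == 1)

print(same_heights)
print(same_partition)
```

For the real files, running dim(m) and str(m) right after read.csv() is a quick way to confirm that no data column was consumed as row names and that every column is numeric.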

Peter
