[R] cluster analysis

2007-03-06 Thread Vallejo, Roger
Dear R-help,

 

In performing cluster analysis (packages: hopach, cluster, boot, and
many others), I got these errors:

 

> makeoutput(kidney, gene.hobj, bobj, file = "kidney.out",
+            gene.names = gene.acc)

Error: could not find function "makeoutput"

 

> boot2fuzzy(kidney, bobj, gene.hobj, array.hobj, file = "kidneyFzy",
+            gene.names = gene.desc)

Error: could not find function "boot2fuzzy"

 

It seems that I have not loaded the library that contains the
functions makeoutput and boot2fuzzy, or perhaps these function names
are outdated in the newest versions of these packages. I hope they are
not typographical errors in the reference I am following:
Gentleman et al., 2005. Bioinformatics and Computational Biology
Solutions Using R and Bioconductor, pp. 226-227.
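
A quick way to narrow this kind of error down, sketched with base R only: check whether the name is visible on the search path and, for packages that are installed, whether their namespace exports it. The helper name `where_is` and the candidate list are made up for this illustration and are not part of any package.

```r
# Report whether a function name is visible on the search path and which of
# the candidate packages (if installed) export it.
# `where_is` is a hypothetical helper written for this example.
where_is <- function(fn, candidates = c("hopach", "cluster", "boot")) {
  on_path <- exists(fn, mode = "function")
  exported_by <- Filter(function(pkg) {
    requireNamespace(pkg, quietly = TRUE) &&
      fn %in% getNamespaceExports(pkg)
  }, candidates)
  list(on_search_path = on_path, exported_by = unlist(exported_by))
}

# The function from the error message above:
where_is("makeoutput")

# A function that certainly exists, for comparison:
res <- where_is("median", candidates = "stats")
```

If `exported_by` names a package while `on_search_path` is FALSE, the package is installed but has not been attached with library().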

 

Thank you very much for your help.

Roger

 

 

 

Roger L. Vallejo, Ph.D.

Computational Biologist & Geneticist

U.S. Department of Agriculture, ARS

National Center for Cool and Cold Water Aquaculture

11861 Leetown Road

Kearneysville, WV 25430

Voice:(304) 724-8340 Ext. 2141

Email:   [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 

 



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cluster analysis under contiguity constraints with R ?

2007-02-16 Thread Bellanger Lise
Hello,
 
I would like to know whether there is a function in an R package that
performs cluster analysis under contiguity constraints.
 
 
Thank you very much for your answer !

Lise Bellanger

-- 
Lise Bellanger, 
Université de Nantes
Département de Mathématiques, Laboratoire Jean Leray UMR CNRS 6629 
2, Rue de la Houssinière BP 92208 - F-44322 Nantes Cedex 03 
Tél. : (33|0) 2 51 12 59 00 (ou 43) - Fax : (33|0) 2 51 12 59 12 
E-Mail : [EMAIL PROTECTED]
URL : http://www.math.sciences.univ-nantes.fr/



[R] cluster analysis using Dmax

2006-11-01 Thread Kris Lockyear
Dear All,

a long time ago I ran a cluster analysis where the dissimilarity matrix used 
consisted of Dmax (or Kolmogorov-Smirnov distance) values.  In other words 
the maximum difference between two cumulative proportion curves.  This all 
worked very well indeed.  The matrix was calculated using Dbase III+ and 
took a day and a half and the clustering was done using MV-ARCH, with the 
resultant dendrogram converted from HP Plotter language to PostScript 
manually.  As you might guess, I'd like to be able to do this more 
efficiently in R.

I have looked through the various help files and found that some of the 
clustering routines will take a dissimilarity matrix as input (yay!).

My questions (as a very novice R user) are:

a) how would one go about calculating the matrix of Dmax/KS distance values?

b) of the many clustering packages (I'll be doing a simple average link 
hierarchical clustering) is there one where I can ask: If I 'cut' the 
dendrogram at the 0.x dissimilarity level, which items are in which  
clusters? (As my dataset has over 200 items this is non-trivial to work out 
manually).

Many thanks indeed for your help.

Kris Lockyear.



Re: [R] cluster analysis using Dmax

2006-11-01 Thread Christian Hennig
Dear Kris,

> a) how would one go about calculating the matrix of Dmax/KS distance values?

Hmm, I'd implement this directly by comparing the curves on a dense 
sequence of equidistant points over a given value 
range (hope you know a suitable one) and looking for the maximum 
difference...
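
A minimal sketch of this grid-based approach, assuming the raw observations for each item are available as numeric vectors (if the cumulative proportion curves already exist, the `ecdf()` step can be skipped and the curves compared directly). The function name `ks_dist_matrix` and the toy data are invented for illustration.

```r
# Dmax / Kolmogorov-Smirnov distance: the maximum vertical gap between two
# cumulative curves, evaluated on a common dense grid.
# `ks_dist_matrix` is a hypothetical helper, not from any package.
ks_dist_matrix <- function(samples, grid) {
  # One column per item: the empirical CDF evaluated at every grid point
  cdfs <- sapply(samples, function(s) ecdf(s)(grid))
  k <- length(samples)
  d <- matrix(0, k, k, dimnames = list(names(samples), names(samples)))
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      d[a, b] <- max(abs(cdfs[, a] - cdfs[, b]))
    }
  }
  as.dist(d)  # a "dist" object that hclust() accepts directly
}

set.seed(1)
samples <- list(a = rnorm(50), b = rnorm(50, mean = 1), c = rnorm(50, mean = 3))
grid <- seq(-4, 7, length.out = 500)
d <- ks_dist_matrix(samples, grid)
hc <- hclust(d, method = "average")  # average-link clustering on Dmax values
```

The double loop is quadratic in the number of items, which is fine for a few hundred items; a finer grid only adds rows to the `cdfs` matrix.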

> b) of the many clustering packages (I'll be doing a simple average link
> hierarchical clustering) is there one where I can ask: If I 'cut' the
> dendrogram at the 0.x dissimilarity level, which items are in which
> clusters? (As my dataset has over 200 items this is non-trivial to work out
> manually).

?cutree
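
As a small illustration of what cutree() returns, here is a sketch on the built-in USArrests data; the cut height of 50 is an arbitrary choice for this data set and stands in for the "0.x dissimilarity level" of the question.

```r
# Average-link hierarchical clustering, then cut the tree at a given height.
hc <- hclust(dist(USArrests), method = "average")

# h = cut at a dissimilarity level; use k = ... to ask for a number of groups
groups <- cutree(hc, h = 50)

# Named integer vector: item label -> cluster number
head(groups)

# List the members of each cluster
members <- split(names(groups), groups)
```

The `members` list answers "which items are in which clusters" without reading the dendrogram by eye.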

Best,
Christian




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche



[R] cluster analysis of microarray data

2006-07-25 Thread Mahdi Osman
Hi list,

I am interested in cluster analysis of microarray data. The data was generated 
using cDNA method and a loop design.


I was wondering if any one has a suggestion about which package I can use to 
analyse such data.


Many thanks in advance

Mahdi
-- 
---
Mahdi Osman (PhD)
E-mail: [EMAIL PROTECTED]
---



Re: [R] cluster analysis of microarray data

2006-07-25 Thread Seth Falcon
Mahdi Osman [EMAIL PROTECTED] writes:

> Hi list,
>
> I am interested in cluster analysis of microarray data. The data was
> generated using cDNA method and a loop design.
>
> I was wondering if any one has a suggestion about which package I can
> use to analyse such data.

There are many packages within the Bioconductor project that provide
tools for analysis of microarray data.

I would start by taking a look at the Microarray and TwoChannel
BiocViews:

http://bioconductor.org/packages/1.8/Microarray.html
http://bioconductor.org/packages/1.8/TwoChannel.html

+ seth



Re: [R] Cluster Analysis with flexible beta linkage method

2006-07-17 Thread Martin Maechler
>>>>> "Wade" == Wade Wall [EMAIL PROTECTED]
>>>>>     on Fri, 14 Jul 2006 10:10:11 -0400 writes:

    Wade> I am trying to run a cluster analysis using Sorenson
    Wade> (Bray-Curtis) distance measure with flexible beta
    Wade> linkage method.  However, I can't seem to find
    Wade> flexible beta in any of the functions/packages I have
    Wade> looked at.

Could you explain what the above are, so that we do not have to
look up the information?

    Wade> Any help would be appreciated.

    Wade> [[alternative HTML version deleted]]

That line would not appear here had you read and followed
the posting guide:

    Wade> PLEASE do read the posting guide!
    Wade> http://www.R-project.org/posting-guide.html



[R] Cluster Analysis with flexible beta linkage method

2006-07-14 Thread Wade Wall
Hi all,

I am trying to run a cluster analysis using Sorenson (Bray-Curtis) distance
measure with flexible beta linkage method.  However, I can't seem to find
flexible beta in any of the functions/packages I have looked at.

Any help would be appreciated.

[[alternative HTML version deleted]]



[R] Cluster Analysis - Number of Clusters

2006-02-06 Thread John Janmaat
Hello,

I'm playing around with cluster analysis, and am looking for methods to 
select the number of clusters.  I am aware of methods based on a 'pseudo 
F' or a 'pseudo T^2'.  Are there packages in R that will generate these 
statistics, and/or other statistics to aid in cluster number selection?

Thanks,

John.
-- 
===
Dr. John Janmaat   Tel: 902-585-1461
Department of EconomicsFax: 902-585-1070
Acadia University  Email: [EMAIL PROTECTED]
Wolfville, Nova Scotia, Canada.Web: ace.acadiau.ca/~jjanmaat/



Re: [R] Cluster Analysis - Number of Clusters

2006-02-06 Thread P. Olsson
Have you checked the amap package? It has been updated just recently and if
I am not wrong there is a method which indicates the best number of k groups
for your data.

Best wishes,
P. Olsson





Re: [R] Cluster Analysis - Number of Clusters

2006-02-06 Thread Christian Hennig
Hi,

as said before, some statistics to estimate the number of clusters are in 
the cluster.stats function of package fpc. These are distance-based, 
not pseudo F or T^2. They are documented in the book 
of Gordon (1999) Classification (see ?cluster.stats for more references). 
It also includes the average silhouette width of Kaufman and Rousseeuw 
(1990) (exact reference in ?plot.agnes), which is also part of the output 
of some functions in package cluster (pam, agnes,...?).
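
To make the silhouette idea concrete, here is a sketch that computes the average silhouette width from pam() for several candidate numbers of clusters and picks the largest. The simulated data (three well-separated groups) and the range 2:6 are assumptions chosen only for illustration.

```r
library(cluster)

# Toy data: three well-separated groups of 30 points each in two dimensions
set.seed(1)
centers <- c(0, 5, 10)
x <- do.call(rbind, lapply(centers, function(m) {
  matrix(rnorm(60, mean = m, sd = 0.5), ncol = 2)
}))

# Average silhouette width for each candidate number of clusters
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)

# The k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
```

On real data the maximum is often less clear-cut than in this toy case, so it is worth plotting `avg_sil` against `ks` instead of relying on `which.max()` alone.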

An alternative way to estimate the number of clusters is the use of the 
BIC together with a (normal) mixture model, see package mclust.

Best,
Christian




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche



Re: [R] Cluster Analysis - Number of Clusters

2006-02-06 Thread TEMPL Matthias
Dear John,

You can play around with the cluster.stats function in package fpc, e.g. you
can try:

library(fpc)
library(cluster)
data(xclara)
dM <- dist(xclara)
cl <- vector()
for (i in 2:7) {
  # within/between distance ratio for a clara() partition into i clusters
  cl[i] <- cluster.stats(d = dM, clustering = clara(xclara, i)$clustering,
                         silhouette = FALSE)$wb.ratio
}
plot(1:6, cl[2:7], xaxt = "n")
axis(1, at = 1:6, labels = 2:7)

(This takes a few minutes.)
The plot indicates that 3 clusters are optimal for this data.

Best,
Matthias


 


[R] Cluster Analysis

2006-02-05 Thread John Janmaat
Hello,

I'm trying some cluster analysis, using the hclust command.  I am looking for 
some help in selecting the 'best' number of clusters.  Some software reports 
pseudo-F and pseudo-T^2 statistics, for each cluster merge.  Is there any way 
to generate such statistics simply in R?

Thanks,

John.

Dr. John Janmaat   Tel: 902-585-1461
Department of EconomicsFax: 902-585-1461
Acadia University, Email: [EMAIL PROTECTED]
Wolfville, Nova Scotia, Canada.Web: ace.acadiau.ca/~jjanmaat/
B4P 1H5



Re: [R] Cluster Analysis

2006-02-05 Thread Romain Francois

Hi,

The fpc package has things like that.

Romain

-- 
visit the R Graph Gallery : http://addictedtor.free.fr/graphiques
mixmod 1.7 is released : http://www-math.univ-fcomte.fr/mixmod/index.php
+---+
| Romain FRANCOIS - http://francoisromain.free.fr   |
| Doctorant INRIA Futurs / EDF  |
+---+



Re: [R] cluster analysis for 80000 observations

2006-01-27 Thread Martin Maechler
>>>>> "Markus" == Markus Preisetanz [EMAIL PROTECTED]
>>>>>     on Thu, 26 Jan 2006 20:48:29 +0100 writes:

    Markus> Dear R Specialists,
    Markus> when trying to cluster a data.frame with about 80.000 rows and
    Markus> 25 columns I get the above error message. I tried hclust (using
    Markus> dist), agnes (entering the data.frame directly) and pam (entering
    Markus> the data.frame directly). What I actually do not want to do is
    Markus> generate a random sample from the data.

Currently all the above mentioned cluster methods work with
full distance / dissimilarity objects, even if only internally,
i.e. they store all d_{i,j} for 1 <= i < j <= n, i.e. n(n-1)/2 values,
also each of them in double precision, i.e. 8 bytes.

So: no chance with the above functions and n=80'000

    Markus> The machine I run R on is a Windows 2000 Server (Pentium 4)
    Markus> with 2 GB of RAM.

If you ran a machine with a 64-bit version of the OS and R
{typical case today: Linux on AMD Opteron}, you could go up
quite a bit higher than on your Windoze box
{I vaguely remember I could do 'n = a few thousand' on our
 dual Opteron with 16 GBytes}, but 80'000 is definitely too
large.

OTOH, there is clara() in the cluster package, which has been
designed for such situations, 
 CLARA:= [C]lustering [LAR]ge [A]pplications.
It is similar in spirit to pam(),
*does* cluster all 80'000 observations but does so by taking
sub samples to construct the medoids.
(and you can ask it to take many medium size subsamples, instead
 of just 5 small sized ones as it does by default).
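
A small sketch of such a call, on simulated data standing in for the 80'000-row data frame; the values `samples = 50` and `sampsize = 200` are arbitrary illustrations of "many medium size subsamples", not recommendations.

```r
library(cluster)

# Simulated stand-in for a large data set: 20'000 points in two groups
set.seed(42)
x <- rbind(matrix(rnorm(20000, mean = 0), ncol = 2),
           matrix(rnorm(20000, mean = 5), ncol = 2))

# clara() never builds the full n x n dissimilarity matrix: it finds medoids
# on subsamples and then assigns every observation to its nearest medoid.
fit <- clara(x, k = 2, samples = 50, sampsize = 200)

table(fit$clustering)  # cluster sizes over all observations
fit$medoids            # coordinates of the chosen medoids
```

Raising `samples` and `sampsize` trades run time for a better chance that the medoids found on a subsample are good for the whole data set.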

Martin Maechler, ETH Zurich
maintainer of cluster package.



[R] cluster analysis: error in vector(double, length): given vector size is too big {Fehler in vector(double, length) : angegebene Vektorgröße ist zu groß}

2006-01-26 Thread Markus Preisetanz
Dear R Specialists,

 

when trying to cluster a data.frame with about 80.000 rows and 25 columns I get 
the above error message. I tried hclust (using dist), agnes (entering the 
data.frame directly) and pam (entering the data.frame directly). What I 
actually do not want to do is generate a random sample from the data.

 

The machine I run R on is a Windows 2000 Server (Pentium 4) with 2 GB of RAM.

 

Does anybody know what to do?

 

Sincerely

___

Markus Preisetanz

Consultant

 

Client Vela GmbH

Albert-Roßhaupter-Str. 32

81369 München

fon:  +49 (0) 89 742 17-113

fax:  +49 (0) 89 742 17-150

mailto:[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 




RE: [R] cluster analysis: error in vector(double, length): given vector size is too big {Fehler in vector(double, length) : angegebene Vektorgröße ist zu groß}

2006-01-26 Thread Liaw, Andy
Let's do some simple calculation:  The dist object from a data set with
80000 cases would have

  80000 * (80000 - 1) / 2 

elements, each takes 8 bytes to be stored in double precision.  That's over
24GB if my arithmetic isn't too flaky.  You'd have a devil of a time trying
to do this on a 64-bit machine with 32GB RAM, let alone what you are using.
You'd have much better chance sticking with algorithms that do not require
storage of the (dis)similarity matrix.
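
The arithmetic above can be checked directly in R. This sketch only restates the calculation; the variable names are chosen for illustration.

```r
n <- 80000                  # number of cases
n_pairs <- n * (n - 1) / 2  # entries in the lower triangle of the matrix
bytes <- n_pairs * 8        # 8 bytes per double-precision value

n_pairs       # 3199960000 pairwise dissimilarities
bytes / 1e9   # size in decimal gigabytes
bytes / 2^30  # size in binary gigabytes (GiB)
```

Either way the size is roughly 24 to 26 GB depending on which definition of a gigabyte is used, far beyond a 2 GB machine.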

Andy



[R] CLuster analysis with only nominal variables

2006-01-17 Thread Nagu
Hi All,

I am wondering if there is any literature or any prior implementations
of cluster analysis for only nominal (categorical) variables for a
large dataset, approximately 20,000 rows with 15 variables.

I came across one or two such implementations, but they seem to assume
certain data distributions.

Thank you,
Nagu



[R] cluster analysis question

2005-08-10 Thread Antonio Olinto
Hi,

I’m using hclust to make a cluster analysis in Q mode, but I have too many
objects (observations) and it’s difficult to identify them in the plot. I’d like
to get a list with the objects ordered in the same way they appear in the 
cluster.

I have already tried order, labels and merge but I couldn’t get the result I 
want.

Thanks for any help,

Antonio Olinto





Re: [R] Cluster analysis using EM algorithm

2005-01-27 Thread Christian Hennig
Hi!

Take a look at the packages mclust and flexmix!
They use the EM algorithm for mixture modelling, sometimes called model
based cluster analysis.

Best,
Christian

 

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/


[R] Cluster analysis using EM algorithm

2005-01-26 Thread msck9
Hi, 
 I am looking for a package to do the clustering analysis using the
 expectation maximization algorithm. 

 Thanks in advance.

 Ming



[R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Fernando Prass
Hi people,

Does anybody know some Density-Based Method for clustering implemented in R?

Thanks,

Fernando Prass







Re: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Kjetil Brinchmann Halvorsen
Fernando Prass wrote:

> Hi people,
>
> Does anybody know some Density-Based Method for clustering implemented in R?

Have you looked at CRAN package mclust?
 


--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
  --  Mahdi Elmandjra


Re: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Fernando Prass
Yes, but mclust doesn't have a density-based algorithm. Mclust uses BIC,
which makes it a model-based method...

Fernando Prass



Re: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Ingmar Visser
maybe ?kmeans is what you're looking for ...
ingmar



Re: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Fernando Prass
No, kmeans is a partition method. I need a density-based method, like the
DBSCAN or DENCLUE algorithms...

Fernando Prass



RE: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Liaw, Andy
I'm no expert in this, but mclust is `density-based' because it estimates
the density with a mixture of Gaussians.  If this is not what you want, you
should clarify what you mean by `density-based'.  Do you mean an algorithm
based on kernel estimator of the density?

Andy



Re: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Christian Hennig
Dear Fernando,

below you find a DBSCAN function I wrote for my own purposes.
It comes with no warranty and without proper documentation, but I followed
the notation of the original KDD-96 DBSCAN paper.
For large data sets, it may be slow.

Best,
Christian

On Thu, 21 Oct 2004, Fernando Prass wrote:

> No, kmeans is a partition method. I need a density-based method, like the
> DBSCAN or DENCLUE algorithms...
>
> Fernando Prass

distvector <- function(x,data){
  ddata <- t(data)-x
  dv <- apply(ddata^2,2,sum)
}

# data may be nxp or distance matrix
# eps is the dbscan distance cutoff parameter
# MinPts is the minimum size of a cluster
# scale: Should the data be scaled?
# distances: has to be TRUE if data is a distance matrix
# showplot: Should the computation process be visualized? 
# countmode: dbscan gives messages when processing point no. (countmode)
dbscan <- function(data,eps,MinPts=5, scale=FALSE, distances=FALSE,
                   showplot=FALSE,
                   countmode=c(1,2,3,5,10,100,1000,5000,10000,50000)){
  data <- as.matrix(data)
  n <- nrow(data)
  if (scale) data <- scale(data)
  unregpoints <- rep(0,n)
  e2 <- eps^2
  cv <- rep(0,n)
  cn <- 0
  i <- 1
  for (i in 1:n){
    if (i %in% countmode) cat("Processing point ", i," of ",n, ".\n")
    unclass <- cv<1
    if (cv[i]==0){
      if (distances) seeds <- data[i,]<=eps
      else{
        seeds <- rep(FALSE,n)
        seeds[unclass] <- distvector(data[i,],data[unclass,])<=e2
      }
      if (sum(seeds)+unregpoints[i]<MinPts) cv[i] <- (-1)
      else{
        cn <- cn+1
        cv[i] <- cn
        seeds[i] <- unclass[i] <- FALSE
        unregpoints[seeds] <- unregpoints[seeds]+1
        while (sum(seeds)>0){
          if (showplot) plot(data,col=1+cv)
          unclass[seeds] <- FALSE
          cv[seeds] <- cn
          ap <- (1:n)[seeds]
#          print(ap)
          seeds <- rep(FALSE,n)
          for (j in ap){
#            if (showplot) plot(data,col=1+cv)
            jseeds <- rep(FALSE,n)
            if (distances) jseeds[unclass] <- data[j,unclass]<=eps
            else{
              jseeds[unclass] <- distvector(data[j,],data[unclass,])<=e2
            }
            unregpoints[jseeds] <- unregpoints[jseeds]+1
#            if (cn==1)
#              cat(j," sum seeds=",sum(seeds)," unreg=",unregpoints[j],
#                  " newseeds=",sum(cv[jseeds]==0),"\n")
            if (sum(jseeds)+unregpoints[j]>=MinPts){
              seeds[jseeds] <- cv[jseeds]==0
              cv[jseeds & cv<0] <- cn
            }
          } # for j
        } # while sum seeds>0
      } # else (sum seeds + ... >= MinPts)
    } # if cv==0
  } # for i
  if (sum(cv==(-1))>0){
    noisenumber <- cn+1
    cv[cv==(-1)] <- noisenumber
  }
  else
    noisenumber <- FALSE
  out <- list(classification=cv, noisenumber=noisenumber,
              eps=eps, MinPts=MinPts, unregpoints=unregpoints)
  out
} # dbscan
# classification: classification vector
# noisenumber: number in the classification vector indicating noise points
# unregpoints: ignore...
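
For readers who want something shorter to experiment with, the following is a compact, independent sketch of the DBSCAN idea. It is not the function above: it builds the full distance matrix (so it only suits small data), labels noise as 0, and counts a point as part of its own neighbourhood. The name `simple_dbscan` and the toy grid data are assumptions made for this illustration.

```r
# Minimal DBSCAN sketch: 0 = noise, 1, 2, ... = cluster labels.
# `simple_dbscan` is a hypothetical name; eps and min_pts play the roles of
# eps and MinPts in the original KDD-96 paper.
simple_dbscan <- function(x, eps, min_pts = 5) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # eps-neighbourhood of every point (each point is its own neighbour)
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  core <- lengths(neighbors) >= min_pts
  cluster <- integer(n)  # all points start unassigned (0)
  cl <- 0L
  for (i in seq_len(n)) {
    if (!core[i] || cluster[i] != 0L) next
    cl <- cl + 1L          # start a new cluster from an unassigned core point
    cluster[i] <- cl
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (!core[p]) next   # border points join a cluster but do not expand it
      new <- neighbors[[p]][cluster[neighbors[[p]]] == 0L]
      cluster[new] <- cl
      queue <- c(queue, new)
    }
  }
  cluster
}

# Two 5 x 5 grids (spacing 0.5) far apart, plus one isolated point
blob1 <- as.matrix(expand.grid(x = 0:4, y = 0:4)) * 0.5
blob2 <- blob1 + 10
pts <- rbind(blob1, blob2, c(50, 50))
labels <- simple_dbscan(pts, eps = 0.6, min_pts = 4)
table(labels)
```

In this toy example the grid corners are border points (only three neighbours including themselves), yet they still join the cluster of an adjacent core point, while the isolated point stays labelled as noise.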

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/


RE: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Martin Maechler
>>>>> "AndyL" == Liaw, Andy [EMAIL PROTECTED]
>>>>>     on Thu, 21 Oct 2004 09:18:54 -0400 writes:

    AndyL> I'm no expert in this, but mclust is `density-based'
    AndyL> because it estimates the density with a mixture of
    AndyL> Gaussians.  If this is not what you want, you should
    AndyL> clarify what you mean by `density-based'.  Do you
    AndyL> mean an algorithm based on kernel estimator of the density?

yes, kernel or other nonparametric density estimator, is what
is usually meant in these contexts.
[ Of course, many nonparametric estimators can be seen to live
  in finite-dimensional spaces, so the difference to an explicit
  flexible / high dimensional method isn't that big.. ]

Martin

> From: Fernando Prass
>
> Yes, but mclust doesn't have a density-based algorithm. Mclust
> has the BIC algorithm, which is a model-based method...
>
> Fernando Prass
>
>  --- Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote:
> > Fernando Prass wrote:
> >
> > > Hi people,
> > >
> > > Does anybody know of a density-based method for clustering
> > > implemented in R?
> >
> > Have you looked at CRAN package mclust?
> >
> > > Thanks,
> > >
> > > Fernando Prass


RE: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Liaw, Andy
> From: Martin Maechler
>
> >>>>> "AndyL" == Liaw, Andy <[EMAIL PROTECTED]>
> >>>>>     on Thu, 21 Oct 2004 09:18:54 -0400 writes:
>
>     AndyL> I'm no expert in this, but mclust is `density-based'
>     AndyL> because it estimates the density with a mixture of
>     AndyL> Gaussians.  If this is not what you want, you should
>     AndyL> clarify what you mean by `density-based'.  Do you
>     AndyL> mean an algorithm based on kernel estimator of the density?
>
> yes, kernel or other nonparametric density estimator, is what
> is usually meant in these contexts.
> [ Of course, many nonparametric estimators can be seen to live
>   in finite-dimensional spaces, so the difference to an explicit
>   flexible / high dimensional method isn't that big.. ]
>
> Martin


Yes.  However, after reading
ftp://ftp.stat.rice.edu/pub/scottdw/TECH/ipra.ps (David Scott's `From
Kernels to Mixtures' published in Technometrics in 2000, I believe the Tukey
memorial issue) I thought the line between kernel densities and mixture
models is rather gray...
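
One way to see why the line is gray: a Gaussian kernel density estimate is itself an equal-weight mixture with one Gaussian component per observation. The sketch below uses toy data and an arbitrarily chosen bandwidth `h` (both assumptions for illustration) and compares base R's `density()` with the mixture written out by hand:

```r
set.seed(1)
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 4))  # toy data, two groups
h <- 0.5                                          # bandwidth (kernel sd), assumed

# Kernel density estimate from base R (Gaussian kernel, bandwidth h).
kde <- density(x, bw = h, kernel = "gaussian")

# The same estimate written as a mixture: n Gaussian components, one
# centred at each observation, all with sd h and weight 1/n.
mixture <- sapply(kde$x, function(g) mean(dnorm(g, mean = x, sd = h)))

# density() works on a binned approximation, so the two curves agree
# closely but not exactly.
max_diff <- max(abs(kde$y - mixture))
```

A model-based method such as mclust differs mainly in using far fewer components and estimating their means, variances and weights.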

Best,
Andy



RE: [R] Cluster Analysis: Density-Based Method

2004-10-21 Thread Fernando Prass
Andy,

I may be wrong, and I'm no expert either, but density estimation is different
from a density model. MClust is a model-based method because it uses model
statistics from the clustering data (more information in
ftp://ftp.u.washington.edu/public/mclust/tr415R.pdf).

I need some package that implements algorithms like OPTICS, DBSCAN or
DENCLUE...

Fernando Prass

 
 --- Liaw, Andy [EMAIL PROTECTED] wrote:
> I'm no expert in this, but mclust is `density-based' because it estimates
> the density with a mixture of Gaussians.  If this is not what you want, you
> should clarify what you mean by `density-based'.  Do you mean an algorithm
> based on kernel estimator of the density?
>
> Andy
>
> > From: Fernando Prass
> >
> > Yes, but mclust doesn't have a density-based algorithm. Mclust
> > has the BIC algorithm, which is a model-based method...
> >
> > Fernando Prass



Re: [R] cluster analysis

2004-10-15 Thread Christian Hennig
Dear James,

sorry, this is not really an answer.
I use cutree to obtain clusters from an hclust object.
I do not get from the identify help page that identify should do anything
like what you expect it to do... I tried it out and to my surprise it
behaved as you said, i.e., it indeed does something at least similar to what
you want it to do, and that might be useful also for me. However, I wonder
where you got the information that identify could be suitable to obtain the
hclust clusters.

Puzzled,
Christian

PS: It seems that each value is typed twice because classi is named, and
each value is also a name. Try as.vector(classi). (Perhaps a little useful
help in the end?)
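
A minimal sketch of the cutree route and of the as.vector() suggestion. The `drops` data frame here is a random stand-in (hypothetical data), and the choice of k = 3 is arbitrary:

```r
set.seed(1)
# Toy stand-in for the poster's 'drops' data frame (hypothetical data).
drops <- as.data.frame(matrix(rnorm(200), ncol = 10))

clusters <- hclust(dist(drops))

# cutree() returns one cluster label per row, no clicking required.
membership <- cutree(clusters, k = 3)

# Row indices per cluster, similar in spirit to what identify() returns.
by_cluster <- split(seq_len(nrow(drops)), membership)

# A named vector prints names and values; as.vector() drops the names.
v <- c("3" = 3, "7" = 7)
plain <- as.vector(v)
```

Printing `v` shows each value twice (once as name, once as value), while `plain` shows it once.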

On Fri, 15 Oct 2004, James Foadi wrote:

> Hello. I wonder if anyone can help me with this.
>
> I'm performing cluster analysis by using hclust in stats package.
> My data are contained in a data frame with 10 columns, named drops.
>
> First I create a distance matrix using dist:
>
>   distanze <- dist(drops)
>
> Then I perform cluster analysis via hclust:
>
>   clusters <- hclust(distanze)
>
> At this point I want to view the tree plot, and use plot:
>
>   plot(clusters)
>
> Then, once decided which clusters to select, I start identify:
>
>   classi <- identify(clusters)
>
> and click on all clusters to be selected; I then finish by right-clicking.
>
> My understanding is that classi is now a list containing all individual
> data, grouped in clusters. In my case classi contained 10 objects,
> simply named [1], [2], etc.
>
> To obtain all individual data belonging to one object I thought that
> it would have sufficed to type for instance:
>
>   classe_01 <- classi[[1]]
>
> Unfortunately, rather than obtaining a vector, I obtain a numeric where
> each value is typed twice.
>
> Can anyone explain why, or what I've done wrong?
>
> Many thanks,
>
> james
> --
> Dr James Foadi
> Structural Biology Laboratory
> Department of Chemistry
> University of York
> YORK YO10 5YW
> UK

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
I recommend www.boag-online.de



Re: [R] cluster analysis

2004-10-15 Thread Martin Maechler
>>>>> "ChrisH" == Christian Hennig <[EMAIL PROTECTED]>
>>>>>     on Fri, 15 Oct 2004 11:43:53 +0200 (MEST) writes:

    ChrisH> Dear James,
    ChrisH> sorry, this is not really an answer.

nor this.  I'm answering Christian...

    ChrisH> I use cutree to obtain clusters from an hclust
    ChrisH> object.  I do not get from the identify help page
    ChrisH> that identify should do anything like what you
    ChrisH> expect it to do... I tried it out and to my surprise

well,
the reason is simple:
There's been a nice identify.hclust() method for a long time
and this is mentioned (including a link to the page) on the
?hclust page.

    ChrisH> it behaved as you said, i.e., it indeed does
    ChrisH> something at least similar to what you want it to
    ChrisH> do, and that might be useful also for me. However, I
    ChrisH> wonder where you got the information that identify
    ChrisH> could be suitable to obtain the hclust clusters.

(see above) ---
 you see: It *does* pay to read documentation carefully

    ChrisH> Puzzled,
    ChrisH> Christian

    ChrisH> PS: It seems that each value is typed twice because
    ChrisH> classi is named, and each value is also a name. Try
    ChrisH> as.vector(classi). (Perhaps a little useful help in
    ChrisH> the end?)

or unname(classi) -- which is slightly more expressive in this
case and possibly more desirable in other situations.

Martin Maechler, ETH Zurich




Re: [R] cluster analysis

2004-10-15 Thread James Foadi
On Friday 15 Oct 2004 10:43 am, you wrote:

> PS: It seems that each value is typed twice because classi is named, and
> each value is also a name. Try as.vector(classi). (Perhaps a little useful
> help in the end?)

Indeed. I have tried, for example:

as.vector(classi[[1]])

and obtained only one set of values. For some strange reason
each object of list classi is a named vector where the name of each
component is the component itself.

By the way, the cutree function you suggested is even more useful
for what I want to do.

The info on identify() can easily be obtained using help(hclust); you'll
find it at the end of the help page.


Many thanks, Christian !

J

-- 
Dr James Foadi
Structural Biology Laboratory
Department of Chemistry
University of York
YORK YO10 5YW
UK



Re: [R] cluster analysis

2004-10-15 Thread James Foadi
On Friday 15 Oct 2004 11:02 am, you wrote:


> or unname(classi) -- which is slightly more expressive in this
> case and possibly more desirable in other situations.
>
> Martin Maechler, ETH Zurich


Thanks, Martin.
I've tried, like you suggested:

un_classi <- unname(classi)

but nothing changed. By typing, for instance:

un_classi[[1]]

I still obtained the values twice. But, if I type:

un_classe_01 <- unname(classi[[1]])

then un_classe_01 is an unnamed vector.

Cheers,

james
-- 
Dr James Foadi
Structural Biology Laboratory
Department of Chemistry
University of York
YORK YO10 5YW
UK



Re: [R] cluster analysis

2004-10-15 Thread Martin Maechler
>>>>> "James" == James Foadi <[EMAIL PROTECTED]>
>>>>>     on Fri, 15 Oct 2004 11:36:14 +0100 writes:

    James> On Friday 15 Oct 2004 11:02 am, you wrote:
    >>
    >> or unname(classi) -- which is slightly more expressive in this
    >> case and possibly more desirable in other situations.
    >>
    >> Martin Maechler, ETH Zurich
    >>

    James> Thanks, Martin.
    James> I've tried, like you suggested:

    James> un_classi <- unname(classi)

    James> but nothing changed. By typing, for instance:

    James> un_classi[[1]]

of course -- I just chimed in with Christian who proposed as.vector(.)
Since your 'classi' is a list with named vectors as components,
you'd need something like

  un_classi <- lapply(classi, unname)

I'm sorry to have added more confusion.
OTOH, really, I think you should learn a bit more about basic
manipulation of R objects and study something like
An Introduction to R.

Regards, Martin


    James> I still obtained twice the values. But, if I type:

    James> un_classe_01 <- unname(classi[[1]])

    James> the un_classe_01 is an unnamed vector.

(exactly, since it works on the *component* of a list)
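
A small sketch of the difference, using a hand-made list as a stand-in for the result of identify() (the values are hypothetical):

```r
# Hypothetical stand-in for the list returned by identify(): each
# component is a named vector whose names repeat its values.
classi <- list(c("2" = 2, "5" = 5), c("1" = 1, "3" = 3, "4" = 4))

# unname() on the list only removes the names of the list itself ...
un_top <- unname(classi)
# ... whereas lapply() applies unname() to every component.
un_classi <- lapply(classi, unname)
```

Here `un_top[[1]]` still carries its names, while `un_classi[[1]]` is a plain numeric vector.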

    James> Cheers,

    James> james



Re: [R] cluster analysis and null hypothesis testing

2004-09-15 Thread Christian Hennig
Hi,

testing the randomness of a cluster analysis is not a well defined
problem, because it depends crucially on your null model. In fpc, there is
nothing like this. Function prabtest in package prabclus performs such a
test, but this is for a particular data structure, namely presence-absence
data in biogeography. 

In principle, a Monte Carlo test can be constructed (and thus implemented in
R) as follows:

1) You need a null model H_0, from which you generate data.
2) You need a test statistic T.
3) Compute T on your data (call it T_0).
4) Repeat k times:
 a) Generate data from H_0
 b) Compute T on the generated data.
5) The p-value is (K+1)/(k+1), where K is the number of generated datasets
   for which T<=T_0 (given that a small T indicates a tendency towards
   clustering).

Standard choices for H_0 will be a normal or uniform distribution. (In
prabtest, it is a complicated distribution on presence-absence data.)
There are lots of possible choices of T. prabtest uses the ratio between  
the 25% smallest distances in the dataset and the 25% largest distances.
This should be reasonable in fairly general settings. For a discussion of
this and alternative choices (and references on them), you may take a look
into 

C. Hennig and B. Hausdorf:  Distance-based parametric bootstrap tests for
clustering of species ranges,  Computational
Statistics and Data Analysis 45 (2004), 875-896.

A preprint of this can be obtained from my web page.

If you want to test the significance of a solution from a particular cluster
analysis method, you should think about choosing T so that it is somehow
connected to the method. (In the Hennig and Hausdorf paper, there are for
example two alternatives discussed that are connected to Single Linkage.)
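
The five steps above can be sketched as follows. This is only an illustration under stated assumptions: the null model is a uniform distribution on the bounding box of the data, and the statistic is the mean of the 25% smallest pairwise distances divided by the mean of the 25% largest, which is a simplified reading of the ratio described above and not the prabtest implementation. The function names `dist_ratio`, `r_null` and `mc_cluster_test` are made up for this example.

```r
# Test statistic T (assumption): mean of the 25% smallest pairwise
# distances over the mean of the 25% largest; small values suggest clustering.
dist_ratio <- function(x) {
  d <- sort(as.vector(dist(x)))
  m <- max(1, floor(length(d) / 4))
  mean(d[seq_len(m)]) / mean(d[seq(length(d) - m + 1, length(d))])
}

# Null model H_0 (assumption): uniform on the bounding box of the data.
r_null <- function(x) {
  apply(x, 2, function(col) runif(length(col), min(col), max(col)))
}

mc_cluster_test <- function(x, k = 199, stat = dist_ratio, rnull = r_null) {
  x <- as.matrix(x)
  t0 <- stat(x)                           # step 3: T on the observed data
  t_sim <- replicate(k, stat(rnull(x)))   # step 4: T on k simulated datasets
  K <- sum(t_sim <= t0)                   # simulated T at least as extreme
  list(statistic = t0, p.value = (K + 1) / (k + 1))  # step 5
}

# Toy data with two well separated groups.
set.seed(1)
clustered <- rbind(matrix(rnorm(40, 0, 0.3), ncol = 2),
                   matrix(rnorm(40, 5, 0.3), ncol = 2))
res <- mc_cluster_test(clustered, k = 99)
```

Passing `stat` and `rnull` as arguments makes it easy to swap in a statistic tied to a particular clustering method, or a different null model.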

Best,
Christian 

On Wed, 15 Sep 2004, Patrick Giraudoux wrote:

> Hi,
>
> I am wondering whether a Monte Carlo method (or equivalent) exists that
> allows testing the randomness of a cluster analysis (e.g. one obtained with
> hclust()). I went through the package fpc (maybe too superficially) but did
> not find such a method.
>
> Thanks for any hint,
>
> Patrick Giraudoux

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
I recommend www.boag-online.de



[R] cluster analysis and null hypothesis testing

2004-09-14 Thread Patrick Giraudoux
Hi,

I am wondering whether a Monte Carlo method (or equivalent) exists that
allows testing the randomness of a cluster analysis (e.g. one obtained with
hclust()). I went through the package fpc (maybe too superficially) but did
not find such a method.

Thanks for any hint,

Patrick Giraudoux



[R] Cluster Analysis with minimum cluster size?

2004-03-26 Thread Danny Heuman
Hi all,

Is it possible to run kmeans, pam or clara with a constraint such that
no resulting cluster has fewer than X cases?

These kmeans algorithms often find clusters that are too small for my
use.  There are usually a few clusters with 1-10 cases (generally
substantial outliers).  I then have to manually assign the small ones
to other sizable clusters.

If this doesn't exist, is there an algorithm that does this?
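
The manual reassignment described above could be automated along these lines. This is a rough post-processing sketch, not a constrained clustering algorithm: `merge_small_clusters` is a made-up helper that moves every point from an undersized cluster to the nearest centroid of a sufficiently large one, and it assumes integer cluster labels as returned by kmeans.

```r
# Hypothetical helper: reassign points of clusters smaller than min_size
# to the closest centroid among the clusters that are large enough.
merge_small_clusters <- function(x, cluster, min_size) {
  x <- as.matrix(x)
  sizes <- table(cluster)
  big <- as.integer(names(sizes)[sizes >= min_size])
  if (length(big) == 0) stop("no cluster reaches min_size")
  # Centroids of the large clusters, one row per cluster.
  centers <- do.call(rbind, lapply(big, function(g)
    colMeans(x[cluster == g, , drop = FALSE])))
  for (i in which(!(cluster %in% big))) {
    d2 <- rowSums(sweep(centers, 2, x[i, ])^2)  # squared Euclidean distances
    cluster[i] <- big[which.min(d2)]
  }
  cluster
}

# Toy data: two groups plus two outliers.
set.seed(42)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2),
           c(20, 20), c(-15, 10))
km <- kmeans(x, centers = 4, nstart = 10)
new_cl <- merge_small_clusters(x, km$cluster, min_size = 5)
```

The centroids are not recomputed after reassignment, so with strong outliers it may be preferable to treat them as noise instead of forcing them into a cluster.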

Thanks,

Danny



[R] cluster analysis

2003-06-27 Thread xiaoqin huang
Is there anyone who would like to give me some examples of plots or data
frames for cluster analysis?
If so, many thanks in advance!
Files can be sent to my big mailbox at [EMAIL PROTECTED]
I want to perform cluster analysis on a set of data. The data consist of
time-evolution RMS deviations; this is an N-dimensional matrix with N(N-1)
independent components.

thanks!

my name is xiaoqin huang, I am in CSIT of Florida State University.
CSIT=computational science and information technology


Re: [R] Cluster analysis

2003-03-07 Thread Christian Hennig
Hi,

it seems that you mix something up. hclust is for dissimilarity based
hierarchical cluster analysis, which has nothing to do with R squared,
Pseudo F etc.
Informative output about the clustering is given as value of the hclust
object, function cutree may help to extract a concrete clustering at some
level of the hierarchy.
Maybe you do not start with dissimilarity data and you might consider pam
(in library cluster), kmeans or the library mclust for Normal mixtures.
However, the statistics values you are looking for are not the primary
quantities of interest in clustering, regardless of the method.
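
A short sketch of the numerical information stored in an hclust object, on toy data (the data, the linkage method and the cut levels are assumptions for illustration):

```r
set.seed(1)
x <- rbind(matrix(rnorm(20, 0, 0.5), ncol = 2),
           matrix(rnorm(20, 4, 0.5), ncol = 2))   # toy data, 20 points

hc <- hclust(dist(x), method = "average")

# One row per agglomeration step: which two items/clusters were merged
# (negative = single observation, positive = earlier step) and at what height.
steps <- data.frame(step = seq_along(hc$height),
                    left = hc$merge[, 1],
                    right = hc$merge[, 2],
                    height = hc$height)

# Concrete clusterings: a fixed number of clusters, or a cut at a height.
cl_k <- cutree(hc, k = 2)
cl_h <- cutree(hc, h = 2)
```

The `height` column is the dissimilarity at which each merge happens, which is the per-step quantity hclust itself reports.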

Christian Hennig

On Fri, 7 Mar 2003, Pierre-Olivier Chasset wrote:

> Hello,
>
> I would like to calculate a cluster analysis and I use the function 'hclust'.
> I have seen the GRAPHICAL results of this function with 'plot'.
> I would like to analyse this cluster but I don't know how to see the NUMERICAL
> results of each step of this cluster like:
> - R Squared
> - Pseudo F
> - Pseudo t**2
> Thank you for any help,
>
> Pierre-Olivier Chasset
>
> Pierre-Olivier Chasset
> 41, rue de la course
> F-67000 Strasbourg
> Phone: +33 3 88 32 06 42

-- 
***
Christian Hennig
Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently)
and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://stat.ethz.ch/~hennig/
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
###
I recommend www.boag.de
