[R] cluster analysis
Dear R-help,

In performing cluster analysis (packages: hopach, cluster, boot, and many others), I got these errors:

    makeoutput(kidney, gene.hobj, bobj, file = "kidney.out", gene.names = gene.acc)
    Error: could not find function "makeoutput"

    boot2fuzzy(kidney, bobj, gene.hobj, array.hobj, file = "kidneyFzy", gene.names = gene.desc)
    Error: could not find function "boot2fuzzy"

It seems that I have not loaded a library that contains the functions makeoutput and boot2fuzzy, or perhaps these function names are outdated in the newest versions of these packages. I hope they are not typographical errors in the reference that I am trying to use: Gentleman et al., 2005. Bioinformatics and Computational Biology Solutions Using R and Bioconductor, pp. 226-227.

Thank you very much for your help.

Roger

Roger L. Vallejo, Ph.D.
Computational Biologist & Geneticist
U.S. Department of Agriculture, ARS
National Center for Cool & Cold Water Aquaculture
11861 Leetown Road, Kearneysville, WV 25430
Voice: (304) 724-8340 Ext. 2141
Email: [EMAIL PROTECTED]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] cluster analysis under contiguity constraints with R ?
Hello,

I would like to know whether there is a function in an R package that performs cluster analysis under contiguity constraints.

Thank you very much for your answer!

Lise Bellanger

--
Lise Bellanger, Université de Nantes
Département de Mathématiques, Laboratoire Jean Leray UMR CNRS 6629
2, Rue de la Houssinière BP 92208 - F-44322 Nantes Cedex 03
Tel.: (33|0) 2 51 12 59 00 (or 43) - Fax: (33|0) 2 51 12 59 12
E-Mail: [EMAIL PROTECTED]
URL: http://www.math.sciences.univ-nantes.fr/
[R] cluster analysis using Dmax
Dear All,

a long time ago I ran a cluster analysis where the dissimilarity matrix used consisted of Dmax (or Kolmogorov-Smirnov distance) values, in other words the maximum difference between two cumulative proportion curves. This all worked very well indeed. The matrix was calculated using dBase III+ and took a day and a half, and the clustering was done using MV-ARCH, with the resultant dendrogram converted from HP Plotter language to PostScript manually.

As you might guess, I'd like to be able to do this more efficiently in R. I have looked through the various help files and found that some of the clustering routines will take a dissimilarity matrix as input (yay!). My questions (as a very novice R user) are:

a) how would one go about calculating the matrix of Dmax/KS distance values?

b) of the many clustering packages (I'll be doing a simple average link hierarchical clustering) is there one where I can ask: If I 'cut' the dendrogram at the 0.x dissimilarity level, which items are in which clusters? (As my dataset has over 200 items this is non-trivial to work out manually.)

Many thanks indeed for your help.

Kris Lockyear.
Re: [R] cluster analysis using Dmax
Dear Kris,

> a) how would one go about calculating the matrix of Dmax/KS distance values?

Hmm, I'd implement this directly by comparing the curves on a dense sequence of equidistant points over a given value range (I hope you know a suitable one) and looking for the maximum difference...

> b) of the many clustering packages (I'll be doing a simple average link hierarchical clustering) is there one where I can ask: If I 'cut' the dendrogram at the 0.x dissimilarity level, which items are in which clusters? (As my dataset has over 200 items this is non-trivial to work out manually.)

?cutree

Best,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
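The two answers above (a maximum-difference matrix, then ?cutree) can be sketched as follows. This is a minimal sketch under assumptions that are not in the original posts: the `counts` matrix is made-up data, and every item is assumed to be observed on the same ordered set of categories, so that Dmax between two cumulative proportion curves equals the "maximum" distance between the corresponding rows.

```r
# Hypothetical data: each row is one item, each column an ordered category;
# entries are counts. This matrix is an assumption, not from the thread.
set.seed(1)
counts <- matrix(rpois(6 * 10, lambda = 20), nrow = 6,
                 dimnames = list(paste0("item", 1:6), NULL))

# Cumulative proportion curve for every item (one curve per row).
cum_prop <- t(apply(counts, 1, function(r) cumsum(r) / sum(r)))

# Dmax between two curves is the largest absolute difference, i.e. the
# "maximum" (Chebyshev) distance between the two rows.
d_max <- dist(cum_prop, method = "maximum")

# Average-link hierarchical clustering on the dissimilarities.
hc <- hclust(d_max, method = "average")

# Cut the dendrogram at dissimilarity level 0.1 and list cluster membership.
groups <- cutree(hc, h = 0.1)
split(names(groups), groups)
```

If the curves come from raw samples rather than binned counts, the statistic returned by ks.test() for each pair would be the analogous quantity; the cut level 0.1 is only a placeholder for the "0.x" level in the question.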
[R] cluster analysis of microarray data
Hi list,

I am interested in cluster analysis of microarray data. The data were generated using the cDNA method and a loop design. I was wondering if anyone has a suggestion about which package I can use to analyse such data.

Many thanks in advance,
Mahdi

--
Mahdi Osman (PhD)
E-mail: [EMAIL PROTECTED]
Re: [R] cluster analysis of microarray data
Mahdi Osman [EMAIL PROTECTED] writes:

> I am interested in cluster analysis of microarray data. The data was generated using cDNA method and a loop design. I was wondering if any one has a suggestion about which package I can use to analyse such data.

There are many packages within the Bioconductor project that provide tools for analysis of microarray data. I would start by taking a look at the Microarray and TwoChannel BiocViews:

http://bioconductor.org/packages/1.8/Microarray.html
http://bioconductor.org/packages/1.8/TwoChannel.html

+ seth
Re: [R] Cluster Analysis with flexible beta linkage method
"Wade" == Wade Wall [EMAIL PROTECTED] on Fri, 14 Jul 2006 10:10:11 -0400 writes:

    Wade> I am trying to run a cluster analysis using Sorenson
    Wade> (Bray-Curtis) distance measure with flexible beta
    Wade> linkage method. However, I can't seem to find
    Wade> flexible beta in any of the functions/packages I have
    Wade> looked at.

Maybe you could explain what the above are, rather than making us look up the information?

    Wade> Any help would be appreciated.

    Wade> [[alternative HTML version deleted]]

would not appear here, had you read and followed the posting guide:

    Wade> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Cluster Analysis with flexible beta linkage method
Hi all, I am trying to run a cluster analysis using Sorenson (Bray-Curtis) distance measure with flexible beta linkage method. However, I can't seem to find flexible beta in any of the functions/packages I have looked at. Any help would be appreciated.
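For readers with the same question, here is a hedged sketch of one possible route: agnes() in the cluster package offers method = "flexible", where a single par.method value a sets alpha1 = alpha2 = a and beta = 1 - 2a. The `abund` matrix and the hand-written `bray_curtis()` helper are assumptions made for illustration; neither comes from the thread, and the thread itself does not confirm this as the answer.

```r
library(cluster)  # recommended package shipped with R

# Hypothetical community matrix: rows are sites, columns are species counts.
set.seed(42)
abund <- matrix(rpois(8 * 5, lambda = 4) + 1, nrow = 8,
                dimnames = list(paste0("site", 1:8), paste0("sp", 1:5)))

# Bray-Curtis dissimilarity computed by hand (illustrative helper):
# sum(|x_i - x_j|) / sum(x_i + x_j) for each pair of rows.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}
d_bc <- bray_curtis(abund)

# Flexible linkage: a = 0.625 gives beta = 1 - 2 * 0.625 = -0.25.
ag <- agnes(d_bc, diss = TRUE, method = "flexible", par.method = 0.625)

# Cluster membership for a three-group solution.
membership <- cutree(as.hclust(ag), k = 3)
membership
```

The value beta = -0.25 is a commonly used setting for flexible beta, chosen here only as an example; other values of par.method give other members of the family.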
[R] Cluster Analysis - Number of Clusters
Hello,

I'm playing around with cluster analysis, and am looking for methods to select the number of clusters. I am aware of methods based on a 'pseudo F' or a 'pseudo T^2'. Are there packages in R that will generate these statistics, and/or other statistics to aid in cluster number selection?

Thanks,
John.

--
Dr. John Janmaat
Department of Economics, Acadia University
Wolfville, Nova Scotia, Canada.
Tel: 902-585-1461  Fax: 902-585-1070
Email: [EMAIL PROTECTED]  Web: ace.acadiau.ca/~jjanmaat/
Re: [R] Cluster Analysis - Number of Clusters
Have you checked the amap package? It was updated just recently and, if I am not wrong, it has a method that indicates the best number of groups k for your data.

Best wishes,
P. Olsson

2006/2/5, John Janmaat [EMAIL PROTECTED]:

> I'm playing around with cluster analysis, and am looking for methods to select the number of clusters. I am aware of methods based on a 'pseudo F' or a 'pseudo T^2'. Are there packages in R that will generate these statistics, and/or other statistics to aid in cluster number selection?
Re: [R] Cluster Analysis - Number of Clusters
Hi,

as said before, some statistics to estimate the number of clusters are in the cluster.stats function of package fpc. These are distance-based, not pseudo F or T^2. They are documented in the book by Gordon (1999), Classification (see ?cluster.stats for more references). The output also includes the average silhouette width of Kaufman and Rousseeuw (1990) (exact reference in ?plot.agnes), which is also part of the output of some functions in package cluster (pam, agnes, ...?).

An alternative way to estimate the number of clusters is to use the BIC together with a (normal) mixture model; see package mclust.

Best,
Christian

On Sun, 5 Feb 2006, John Janmaat wrote:

> I'm playing around with cluster analysis, and am looking for methods to select the number of clusters. I am aware of methods based on a 'pseudo F' or a 'pseudo T^2'. Are there packages in R that will generate these statistics, and/or other statistics to aid in cluster number selection?

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
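The average silhouette width mentioned above can be used to compare candidate numbers of clusters. The following is a minimal sketch, assuming made-up data (`x`, three Gaussian blobs) and pam() from package cluster; the choice of pam() and the range of k are illustrative assumptions, not something the thread prescribes.

```r
library(cluster)  # recommended package shipped with R

# Hypothetical data: three well separated Gaussian blobs in two dimensions.
set.seed(123)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2))

# Average silhouette width of the pam() solution for k = 2, ..., 6.
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(avg_sil) <- ks

# The k with the largest average silhouette width is the suggested choice.
best_k <- ks[which.max(avg_sil)]
avg_sil
best_k
```

On real data the maximum is often much less pronounced than on separated blobs, so the whole profile of `avg_sil` is usually worth inspecting rather than only `best_k`.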
Re: [R] Cluster Analysis - Number of Clusters
Dear John,

You can play around with the cluster.stats function in package fpc, e.g. you can try:

    library(fpc)
    library(cluster)
    data(xclara)
    dM <- dist(xclara)
    cl <- vector()
    for (i in 2:7) {
      # within/between distance ratio for a clara() partition into i clusters
      cl[i] <- cluster.stats(d = dM, clustering = clara(xclara, i)$clustering,
                             silhouette = FALSE)$wb.ratio
    }
    plot(1:6, cl[2:7], xaxt = "n")
    axis(1, at = 1:6, labels = 2:7)

(This takes a few minutes.) The plot indicates that 3 clusters are optimal for these data.

Best,
Matthias

> I'm playing around with cluster analysis, and am looking for methods to select the number of clusters. I am aware of methods based on a 'pseudo F' or a 'pseudo T^2'. Are there packages in R that will generate these statistics, and/or other statistics to aid in cluster number selection?
[R] Cluster Analysis
Hello,

I'm trying some cluster analysis, using the hclust command. I am looking for some help in selecting the 'best' number of clusters. Some software reports pseudo-F and pseudo-T^2 statistics for each cluster merge. Is there any way to generate such statistics simply in R?

Thanks,
John.

Dr. John Janmaat
Department of Economics, Acadia University
Wolfville, Nova Scotia, Canada. B4P 1H5
Tel: 902-585-1461  Fax: 902-585-1461
Email: [EMAIL PROTECTED]  Web: ace.acadiau.ca/~jjanmaat/
Re: [R] Cluster Analysis
On 05.02.2006 17:50, John Janmaat wrote:

> I'm trying some cluster analysis, using the hclust command. I am looking for some help in selecting the 'best' number of clusters. Some software reports pseudo-F and pseudo-T^2 statistics, for each cluster merge. Is there any way to generate such statistics simply in R?

Hi,

The package fpc has functions for this kind of thing.

Romain

--
Romain FRANCOIS - http://francoisromain.free.fr
PhD student, INRIA Futurs / EDF
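The pseudo-F statistic asked about here (often called the Calinski-Harabasz index) can also be computed by hand from an hclust tree. This is a minimal sketch under stated assumptions: the data `x` are made up, `pseudo_f()` is an illustrative helper written for this note (not a function from fpc or any other package), and the pseudo-T^2 statistic is not covered.

```r
# Hypothetical data: three Gaussian blobs in two dimensions.
set.seed(7)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.4), ncol = 2),
           matrix(rnorm(40, mean = 4, sd = 0.4), ncol = 2),
           matrix(rnorm(40, mean = 8, sd = 0.4), ncol = 2))

hc <- hclust(dist(x), method = "average")

# Pseudo-F for a partition: (B / (k - 1)) / (W / (n - k)), where W is the
# pooled within-cluster sum of squares and B = total SS - W.
pseudo_f <- function(x, cl) {
  n <- nrow(x)
  k <- length(unique(cl))
  total_ss <- sum(scale(x, scale = FALSE)^2)
  within_ss <- sum(sapply(split(as.data.frame(x), cl), function(g) {
    sum(scale(as.matrix(g), scale = FALSE)^2)
  }))
  ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))
}

# Evaluate the statistic for cuts of the tree into 2, ..., 6 clusters.
ks <- 2:6
f_stat <- sapply(ks, function(k) pseudo_f(x, cutree(hc, k = k)))
names(f_stat) <- ks
f_stat
```

Larger values indicate tighter, better separated partitions, so a peak in `f_stat` is the usual heuristic for picking k.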
Re: [R] cluster analysis for 80000 observations
"Markus" == Markus Preisetanz [EMAIL PROTECTED] on Thu, 26 Jan 2006 20:48:29 +0100 writes:

    Markus> Dear R Specialists,
    Markus> when trying to cluster a data.frame with about 80.000 rows and 25 columns I get the above error message. I tried hclust (using dist), agnes (entering the data.frame directly) and pam (entering the data.frame directly). What I actually do not want to do is generate a random sample from the data.

Currently all the above mentioned cluster methods work with full distance / dissimilarity objects, even if only internally, i.e. they store all d_{i,j} for 1 <= i < j <= n, i.e. n(n-1)/2 values, each of them in double precision, i.e. 8 bytes.

So: no chance with the above functions and n = 80'000.

    Markus> The machine I run R on is a Windows 2000 Server (Pentium 4) with 2 GB of RAM.

If you ran a machine with a 64-bit version of OS and R {typical case today: Linux on AMD Opteron}, you could go quite a bit higher than on your Windows box {I vaguely remember I could do 'n = a few thousand' on our dual Opteron with 16 GBytes}, but 80'000 is definitely too large.

OTOH, there is clara() in the cluster package, which has been designed for such situations: CLARA := [C]lustering [LAR]ge [A]pplications. It is similar in spirit to pam() and *does* cluster all 80'000 observations, but does so by taking subsamples to construct the medoids (and you can ask it to take many medium-sized subsamples, instead of just 5 small ones as it does by default).

Martin Maechler, ETH Zurich
maintainer of the cluster package.
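The clara() suggestion above, including the advice to use many medium-sized subsamples, might look like this. It is a sketch only: the data frame `big` is simulated, its size is reduced from the 80,000 x 25 of the thread to keep the run short, and the values chosen for `k`, `samples` and `sampsize` are placeholders.

```r
library(cluster)  # recommended package shipped with R

# Hypothetical stand-in for a large data frame.
set.seed(2006)
n <- 20000
big <- data.frame(matrix(rnorm(n * 5), ncol = 5))
big[1:(n / 2), ] <- big[1:(n / 2), ] + 5   # shift half the rows: two groups

# clara() never builds the full n x n dissimilarity matrix: it runs pam()
# on `samples` subsamples of size `sampsize`, keeps the best set of medoids
# and then assigns every observation to its nearest medoid.
fit <- clara(big, k = 2, samples = 50, sampsize = 200)

# Cluster sizes over all n observations.
table(fit$clustering)
```

Raising `samples` and `sampsize` trades run time for a better chance that the chosen medoids represent the whole data set.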
[R] cluster analysis: error in vector(double, length): given vector size is too big {Fehler in vector(double, length) : angegebene Vektorgröße ist zu groß}
Dear R Specialists,

when trying to cluster a data.frame with about 80,000 rows and 25 columns I get the above error message. I tried hclust (using dist), agnes (entering the data.frame directly) and pam (entering the data.frame directly). What I actually do not want to do is generate a random sample from the data.

The machine I run R on is a Windows 2000 Server (Pentium 4) with 2 GB of RAM. Does anybody know what to do?

Sincerely,

Markus Preisetanz
Consultant
Client Vela GmbH
Albert-Roßhaupter-Str. 32, 81369 München
fon: +49 (0) 89 742 17-113, fax: +49 (0) 89 742 17-150
mailto:[EMAIL PROTECTED]
RE: [R] cluster analysis: error in vector(double, length): given vector size is too big {Fehler in vector(double, length) : angegebene Vektorgröße ist zu groß}
Let's do a simple calculation: the dist object from a data set with 80000 cases would have 80000 * (80000 - 1) / 2 elements, each of which takes 8 bytes to store in double precision. That's over 24GB if my arithmetic isn't too flaky. You'd have a devil of a time trying to do this on a 64-bit machine with 32GB RAM, let alone what you are using.

You'd have a much better chance sticking with algorithms that do not require storage of the (dis)similarity matrix.

Andy

From: Markus Preisetanz

> when trying to cluster a data.frame with about 80.000 rows and 25 columns I get the above error message. I tried hclust (using dist), agnes (entering the data.frame directly) and pam (entering the data.frame directly). What I actually do not want to do is generate a random sample from the data. The machine I run R on is a Windows 2000 Server (Pentium 4) with 2 GB of RAM. Does anybody know what to do?
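The arithmetic in this reply can be checked directly in R. The sketch assumes n = 80000, the number of rows given in the question, and 8 bytes per stored distance.

```r
n <- 80000

# Number of pairwise distances stored in a "dist" object: n(n-1)/2.
n_pairs <- n * (n - 1) / 2

# Each distance is a double, i.e. 8 bytes.
bytes <- n_pairs * 8

bytes / 1e9    # size in GB (decimal units)
bytes / 2^30   # size in GiB (binary units)
```

Either way of counting puts the object at roughly 24 to 26 gigabytes before any working copies are made, which is why the full-matrix methods fail on a 2 GB machine.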
[R] Cluster analysis with only nominal variables
Hi All,

I am wondering if there is any literature on, or any prior implementation of, cluster analysis for purely nominal (categorical) variables on a large dataset: approximately 20,000 rows with 15 variables. I came across one or two such implementations, but they seem to assume certain data distributions.

Thank you,
Nagu
[R] cluster analysis question
Hi,

I'm using hclust to do a cluster analysis in Q mode, but I have too many objects (observations) and it's difficult to identify them in the plot. I'd like to get a list with the objects ordered in the same way they appear in the dendrogram. I have already tried order, labels and merge, but I couldn't get the result I want.

Thanks for any help,
Antonio Olinto
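One way to get the objects in dendrogram order is to index the labels component of the hclust result by its order component. This is a minimal sketch with made-up data; the matrix `x` and its row names are assumptions for illustration.

```r
# Hypothetical data with row names standing in for the object labels.
set.seed(3)
x <- matrix(rnorm(20 * 4), nrow = 20,
            dimnames = list(paste0("obj", 1:20), NULL))

hc <- hclust(dist(x))

# hc$order is the left-to-right permutation of the observations used by
# plot(hc); indexing the labels with it lists objects in dendrogram order.
ordered_labels <- hc$labels[hc$order]
ordered_labels
```

If the data have no row names, `hc$labels` is NULL and `hc$order` itself gives the row numbers in plotting order.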
Re: [R] Cluster analysis using EM algorithm
Hi!

Take a look at the packages mclust and flexmix! They use the EM algorithm for mixture modelling, sometimes called model-based cluster analysis.

Best,
Christian

On Wed, 26 Jan 2005 [EMAIL PROTECTED] wrote:

> I am looking for a package to do the clustering analysis using the expectation maximization algorithm.

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
[R] Cluster analysis using EM algorithm
Hi, I am looking for a package to do the clustering analysis using the expectation maximization algorithm. Thanks in advance. Ming
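To make the idea behind EM-based (mixture model) clustering concrete, here is a hand-written sketch for the simplest case: two Gaussian components in one dimension. It is not how mclust or flexmix are implemented; the data, the starting values and the fixed number of iterations are all assumptions chosen for illustration.

```r
# Hypothetical one-dimensional data from two Gaussian groups.
set.seed(11)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 6, sd = 1))

# Crude starting values (an assumption of this sketch, not a recommendation).
p  <- 0.5                      # mixing proportion of component 1
mu <- c(min(x), max(x))        # component means
s  <- rep(sd(x), 2)            # component standard deviations

for (iter in 1:200) {
  # E-step: posterior probability that each point belongs to component 1.
  d1 <- p * dnorm(x, mu[1], s[1])
  d2 <- (1 - p) * dnorm(x, mu[2], s[2])
  r  <- d1 / (d1 + d2)

  # M-step: weighted updates of proportion, means and standard deviations.
  p  <- mean(r)
  mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
  s  <- c(sqrt(sum(r * (x - mu[1])^2) / sum(r)),
          sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r)))
}

# Hard cluster assignment from the final posterior probabilities.
cluster <- ifelse(r > 0.5, 1, 2)
table(cluster)
```

A package implementation would add a convergence check on the log-likelihood, multivariate components and model selection (for example via BIC); the loop above only shows the alternation between the two steps.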
[R] Cluster Analysis: Density-Based Method
Hi people,

Does anybody know of a density-based clustering method implemented in R?

Thanks,
Fernando Prass
Re: [R] Cluster Analysis: Density-Based Method
Fernando Prass wrote:

> Does anybody know some Density-Based Method for clustering implemented in R?

Have you looked at CRAN package mclust?

--
Kjetil Halvorsen.
Re: [R] Cluster Analysis: Density-Based Method
Yes, but mclust doesn't have a density-based algorithm. Mclust selects its models with BIC; that is a model-based method...

Fernando Prass

--- Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote:

> Have you looked at CRAN package mclust?
Re: [R] Cluster Analysis: Density-Based Method
Maybe ?kmeans is what you're looking for...

ingmar

On 10/21/04 2:47 PM, Fernando Prass [EMAIL PROTECTED] wrote:

> Yes, but mclust don't have a density-based algorithm. Mclust have the algorithm BIC, that is a model-based method...
Re: [R] Cluster Analysis: Density-Based Method
No, kmeans is a partitioning method. I need a density-based method, like the DBSCAN or DENCLUE algorithm...

Fernando Prass

--- Ingmar Visser [EMAIL PROTECTED] wrote:

> maybe ?kmeans is what you're looking for ...
RE: [R] Cluster Analysis: Density-Based Method
I'm no expert in this, but mclust is `density-based' because it estimates the density with a mixture of Gaussians. If this is not what you want, you should clarify what you mean by `density-based'. Do you mean an algorithm based on a kernel estimator of the density?

Andy

From: Fernando Prass

> Yes, but mclust don't have a density-based algorithm. Mclust have the algorithm BIC, that is a model-based method...
Re: [R] Cluster Analysis: Density-Based Method
Dear Fernando,

below you find a DBSCAN function I wrote for my own purposes. It comes with no warranty and without proper documentation, but I followed the notation of the original KDD-96 DBSCAN paper. For large data sets, it may be slow.

Best,
Christian

On Thu, 21 Oct 2004, Fernando Prass wrote:

> No, kmeans is a partition method. I need a density-based method, like DBSCAN or DENCLUE algorithm...

    # squared Euclidean distances between the point x and all rows of data
    distvector <- function(x, data) {
      ddata <- t(data) - x
      dv <- apply(ddata^2, 2, sum)
    }

    # data may be nxp or distance matrix
    # eps is the dbscan distance cutoff parameter
    # MinPts is the minimum size of a cluster
    # scale: Should the data be scaled?
    # distances: has to be TRUE if data is a distance matrix
    # showplot: Should the computation process be visualized?
    # countmode: dbscan gives messages when processing point no. (countmode)
    dbscan <- function(data, eps, MinPts = 5, scale = FALSE, distances = FALSE,
                       showplot = FALSE,
                       countmode = c(1, 2, 3, 5, 10, 100, 1000, 5000, 10000, 50000)) {
      data <- as.matrix(data)
      n <- nrow(data)
      if (scale) data <- scale(data)
      unregpoints <- rep(0, n)
      e2 <- eps^2
      cv <- rep(0, n)   # 0 = unprocessed, -1 = noise candidate, >0 = cluster no.
      cn <- 0           # current cluster number
      i <- 1
      for (i in 1:n) {
        if (i %in% countmode) cat("Processing point ", i, " of ", n, ".\n")
        unclass <- cv < 1
        if (cv[i] == 0) {
          if (distances) seeds <- data[i, ] <= eps
          else {
            seeds <- rep(FALSE, n)
            seeds[unclass] <- distvector(data[i, ], data[unclass, ]) <= e2
          }
          if (sum(seeds) + unregpoints[i] < MinPts) cv[i] <- (-1)
          else {
            cn <- cn + 1
            cv[i] <- cn
            seeds[i] <- unclass[i] <- FALSE
            unregpoints[seeds] <- unregpoints[seeds] + 1
            while (sum(seeds) > 0) {
              if (showplot) plot(data, col = 1 + cv)
              unclass[seeds] <- FALSE
              cv[seeds] <- cn
              ap <- (1:n)[seeds]
              # print(ap)
              seeds <- rep(FALSE, n)
              for (j in ap) {
                # if (showplot) plot(data, col = 1 + cv)
                jseeds <- rep(FALSE, n)
                if (distances) jseeds[unclass] <- data[j, unclass] <= eps
                else {
                  jseeds[unclass] <- distvector(data[j, ], data[unclass, ]) <= e2
                }
                unregpoints[jseeds] <- unregpoints[jseeds] + 1
                # if (cn == 1)
                #   cat(j, " sum seeds=", sum(seeds), " unreg=", unregpoints[j],
                #       " newseeds=", sum(cv[jseeds] == 0), "\n")
                if (sum(jseeds) + unregpoints[j] >= MinPts) {
                  seeds[jseeds] <- cv[jseeds] == 0
                  cv[jseeds & cv < 0] <- cn
                }
              } # for j
            } # while sum seeds > 0
          } # else (sum seeds + ... >= MinPts)
        } # if cv == 0
      } # for i
      if (sum(cv == (-1)) > 0) {
        noisenumber <- cn + 1
        cv[cv == (-1)] <- noisenumber
      }
      else
        noisenumber <- FALSE
      out <- list(classification = cv, noisenumber = noisenumber,
                  eps = eps, MinPts = MinPts, unregpoints = unregpoints)
      out
    } # dbscan

    # classification: classification vector
    # noisenumber: number in the classification vector indicating noise points
    # unregpoints: ignore...

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
RE: [R] Cluster Analysis: Density-Based Method
"AndyL" == Liaw, Andy [EMAIL PROTECTED] on Thu, 21 Oct 2004 09:18:54 -0400 writes:

    AndyL> I'm no expert in this, but mclust is `density-based'
    AndyL> because it estimates the density with a mixture of
    AndyL> Gaussians. If this is not what you want, you should
    AndyL> clarify what you mean by `density-based'. Do you
    AndyL> mean an algorithm based on kernel estimator of the density?

Yes, a kernel or other nonparametric density estimator is what is usually meant in these contexts.

[ Of course, many nonparametric estimators can be seen to live in finite-dimensional spaces, so the difference from an explicit flexible / high-dimensional method isn't that big. ]

Martin
RE: [R] Cluster Analysis: Density-Based Method
From: Martin Maechler
> AndyL == Liaw, Andy [EMAIL PROTECTED] on Thu, 21 Oct 2004 09:18:54 -0400 writes:
>
> AndyL> I'm no expert in this, but mclust is `density-based'
> AndyL> because it estimates the density with a mixture of
> AndyL> Gaussians. If this is not what you want, you should
> AndyL> clarify what you mean by `density-based'. Do you
> AndyL> mean an algorithm based on kernel estimator of the density?
>
> yes, kernel or other nonparametric density estimator, is what is usually meant in these contexts.
>
> [ Of course, many nonparametric estimators can be seen to live in finite-dimensional spaces, so the difference to an explicit flexible / high dimensional method isn't that big.. ]
>
> Martin

Yes. However, after reading ftp://ftp.stat.rice.edu/pub/scottdw/TECH/ipra.ps (David Scott's `From Kernels to Mixtures', published in Technometrics in 2000, I believe in the Tukey memorial issue), I thought the line between kernel densities and mixture models is rather gray...

Best,
Andy
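The "gray line" between kernel densities and mixture models can be illustrated in a few lines of base R: a Gaussian kernel density estimate is itself an equal-weight mixture of n Gaussians, one centred at each observation. The following is only a sketch; the toy data, the bandwidth rule (bw.nrd0) and the helper name kde_as_mixture are assumptions made for illustration and do not come from the thread.

```r
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 4))   # toy bimodal sample (assumed data)
h <- bw.nrd0(x)                                    # rule-of-thumb bandwidth

# Kernel estimate written explicitly as a mixture of n Gaussians,
# each with weight 1/n, mean x[i] and standard deviation h.
# kde_as_mixture is a hypothetical helper name.
kde_as_mixture <- function(x0, data, h) {
  sapply(x0, function(z) mean(dnorm(z, mean = data, sd = h)))
}

grid  <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)
f_mix <- kde_as_mixture(grid, x, h)

# density() evaluates the same estimator on the same grid,
# using a fast binned approximation internally
f_kde <- density(x, bw = h, kernel = "gaussian",
                 from = min(grid), to = max(grid), n = 512)

max_abs_diff <- max(abs(f_mix - f_kde$y))   # small: both are the same estimator
```

The difference from mclust is then one of degree: mclust fits a few Gaussian components with estimated means, covariances and weights, whereas the kernel estimate uses n components with fixed locations and a common width.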
RE: [R] Cluster Analysis: Density-Based Method
Andy,

I may be wrong, and I'm no expert either, but density estimation is different from a density model. MClust is a model-based method because it uses statistical models to cluster the data (more information in ftp://ftp.u.washington.edu/public/mclust/tr415R.pdf). I need a package that implements algorithms like OPTICS, DBSCAN or DENCLUE...

Fernando Prass

--- Liaw, Andy [EMAIL PROTECTED] wrote:
> I'm no expert in this, but mclust is `density-based' because it estimates the density with a mixture of Gaussians. If this is not what you want, you should clarify what you mean by `density-based'. Do you mean an algorithm based on kernel estimator of the density?
>
> Andy
>
> From: Fernando Prass
>> Yes, but mclust doesn't have a density-based algorithm. Mclust uses BIC, which is a model-based method...
>>
>> Fernando Prass
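To make concrete what "density-based" means in the DBSCAN sense, here is a minimal base-R sketch of the idea: points with at least min_pts neighbours within distance eps are "core" points, clusters grow outward from core points, and anything unreachable is noise. The function name dbscan_sketch, the parameter names and the toy data are assumptions for illustration only; this is not code from any package mentioned in the thread, and it uses O(n^2) memory.

```r
# Minimal DBSCAN-style sketch (illustrative only; dbscan_sketch is a hypothetical name)
dbscan_sketch <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # eps-neighbourhood of every point (each point counts as its own neighbour)
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbours) >= min_pts
  cluster <- integer(n)          # 0 means noise / not assigned
  current <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next
    current <- current + 1L      # start a new cluster at an unassigned core point
    cluster[i] <- current
    queue <- neighbours[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] != 0L) next
      cluster[j] <- current
      # only core points extend the cluster further
      if (is_core[j]) queue <- c(queue, neighbours[[j]])
    }
  }
  cluster
}

set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),   # dense blob near (0, 0)
  matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2),   # dense blob near (5, 5)
  c(2.5, 20)                                          # one isolated point
)
cl <- dbscan_sketch(x, eps = 1, min_pts = 5)
table(cl)   # label 0 marks noise
```

Unlike mclust, no parametric model is fitted and the number of clusters is not chosen in advance; it follows from eps and min_pts. For real work a tested implementation (for example the CRAN package dbscan, which postdates this thread) is preferable to a sketch like this.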
Re: [R] cluster analysis
Dear James,

sorry, this is not really an answer. I use cutree to obtain clusters from an hclust object. I do not get from the identify help page that identify should do anything like what you expect it to do... I tried it out and to my surprise it behaved as you said, i.e., it does indeed do something at least similar to what you want, and that might be useful for me as well. However, I wonder where you got the information that identify could be suitable for obtaining the hclust clusters.

Puzzled,
Christian

PS: It seems that each value is typed twice because classi is named, and each value is also a name. Try as.vector(classi). (Perhaps a little useful help in the end?)

On Fri, 15 Oct 2004, James Foadi wrote:
> Hello. I wonder if anyone can help me with this. I'm performing cluster analysis using hclust in the stats package. My data are contained in a data frame with 10 columns, named drops. First I create a distance matrix using dist:
>
>   distanze <- dist(drops)
>
> Then I perform cluster analysis via hclust:
>
>   clusters <- hclust(distanze)
>
> At this point I want to view the tree plot, and use plot:
>
>   plot(clusters)
>
> Then, once I have decided which clusters to select, I start identify:
>
>   classi <- identify(clusters)
>
> and click on all clusters to be selected; I then finish by right-clicking. My understanding is that classi is now a list containing all individual data, grouped in clusters. In my case classi contained 10 objects, simply named [1], [2], etc. To obtain all individual data belonging to one object I thought it would suffice to type, for instance:
>
>   classe_01 <- classi[[1]]
>
> Unfortunately, rather than obtaining a vector, I obtain a numeric where each value is typed twice. Can anyone explain why, or what I've done wrong?
>
> Many thanks,
> james
> --
> Dr James Foadi
> Structural Biology Laboratory, Department of Chemistry
> University of York, YORK YO10 5YW, UK

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
### I recommend www.boag-online.de
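Christian's cutree suggestion can be sketched without any clicking. The object names drops, distanze and clusters follow James's message; the simulated data frame, the choice k = 3 and the name gruppi are assumptions made only so the example runs on its own.

```r
set.seed(1)
# Stand-in for the poster's data frame with 10 columns (assumed data)
drops <- as.data.frame(matrix(rnorm(30 * 10), ncol = 10))

distanze <- dist(drops)
clusters <- hclust(distanze)

# Cut the dendrogram into a fixed number of groups
gruppi <- cutree(clusters, k = 3)   # one integer label per row of drops

# All rows of the original data that belong to cluster 1
classe_01 <- drops[gruppi == 1, ]

table(gruppi)
```

cutree(clusters, h = ...) cuts at a given height instead of a given number of groups, which corresponds more closely to picking branches by eye on the plot.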
Re: [R] cluster analysis
ChrisH == Christian Hennig [EMAIL PROTECTED] on Fri, 15 Oct 2004 11:43:53 +0200 (MEST) writes:

ChrisH> Dear James,
ChrisH> sorry, this is not really an answer.

Nor is this. I'm answering Christian...

ChrisH> I use cutree to obtain clusters from an hclust
ChrisH> object. I do not get from the identify help page
ChrisH> that identify should do anything like what you
ChrisH> expect it to do... I tried it out and to my surprise

Well, the reason is simple: there's been a nice identify.hclust() method for a long time, and this is mentioned (including a link to the page) on the ?hclust page.

ChrisH> it behaved as you said, i.e., it indeed does
ChrisH> something at least similar to what you want it to
ChrisH> do, and that might be useful also for me. However, I
ChrisH> wonder where you got the information that identify
ChrisH> could be suitable to obtain the hclust clusters.

(see above) --- you see: it *does* pay to read documentation carefully.

ChrisH> Puzzled,
ChrisH> Christian

ChrisH> PS: It seems that each value is typed twice because
ChrisH> classi is named, and each value is also a name. Try
ChrisH> as.vector(classi). (Perhaps a little useful help in
ChrisH> the end?)

or unname(classi) -- which is slightly more expressive in this case and possibly more desirable in other situations.

Martin Maechler, ETH Zurich
Re: [R] cluster analysis
On Friday 15 Oct 2004 10:43 am, you wrote:
> PS: It seems that each value is typed twice because classi is named, and each value is also a name. Try as.vector(classi). (Perhaps a little useful help in the end?)

Indeed. I have tried, for example:

  as.vector(classi[[1]])

and obtained only one set of values. For some strange reason each object of list classi is a named vector where the name of each component is the component itself.

By the way, the cutree function you suggested is even more useful for what I want to do. The info on identify() can easily be obtained using help(hclust); you'll find it at the end of the help page.

Many thanks, Christian!

J

--
Dr James Foadi
Structural Biology Laboratory, Department of Chemistry
University of York, YORK YO10 5YW, UK
Re: [R] cluster analysis
On Friday 15 Oct 2004 11:02 am, you wrote:
> or unname(classi) -- which is slightly more expressive in this case and possibly more desirable in other situations.
>
> Martin Maechler, ETH Zurich

Thanks, Martin. I've tried, as you suggested:

  un_classi <- unname(classi)

but nothing changed. By typing, for instance:

  un_classi[[1]]

I still obtained the values twice. But if I type:

  un_classe_01 <- unname(classi[[1]])

then un_classe_01 is an unnamed vector.

Cheers,
james

--
Dr James Foadi
Structural Biology Laboratory, Department of Chemistry
University of York, YORK YO10 5YW, UK
Re: [R] cluster analysis
James == James Foadi [EMAIL PROTECTED] on Fri, 15 Oct 2004 11:36:14 +0100 writes:

James> On Friday 15 Oct 2004 11:02 am, you wrote:
>> or unname(classi) -- which is slightly more expressive in this case and possibly more desirable in other situations.
>> Martin Maechler, ETH Zurich

James> Thanks, Martin.
James> I've tried, like you suggested:
James>   un_classi <- unname(classi)
James> but nothing changed. By typing, for instance:
James>   un_classi[[1]]

Of course -- I just chimed in with Christian, who proposed as.vector(.). Since your 'classi' is a list with named vectors as components, you'd need something like

  un_classi <- lapply(classi, unname)

I'm sorry to have added more confusion. OTOH, really, I think you should learn a bit more about basic manipulation of R objects and study something like "An Introduction to R".

Regards,
Martin

James> I still obtained twice the values. But, if I type:
James>   un_classe_01 <- unname(classi[[1]])
James> the un_classe_01 is an unnamed vector.

(exactly, since it works on the *component* of a list)

James> Cheers,
James> james
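The difference between unname() on the list and unname() on each component can be reproduced without an interactive identify() session. In the sketch below, classi is a hand-built stand-in for what identify() returned in the thread (a list of vectors whose names duplicate their values); the concrete numbers and the names un_top and un_each are assumptions for illustration.

```r
# Stand-in for the list from identify(): each element is a named vector
# whose names are the same as its values (assumed structure, per the thread)
classi <- list(c("3" = 3, "7" = 7, "9" = 9),
               c("1" = 1, "4" = 4))

un_top  <- unname(classi)           # removes names of the list itself only
un_each <- lapply(classi, unname)   # removes names inside every component

names(un_top[[1]])    # still "3" "7" "9"
names(un_each[[1]])   # NULL
```

This is why printing un_classi[[1]] still showed every value twice: the first row of the printout is the names, the second the values.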
Re: [R] cluster analysis and null hypothesis testing
Hi,

testing the randomness of a cluster analysis is not a well-defined problem, because it depends crucially on your null model. In fpc, there is nothing like this. Function prabtest in package prabclus performs such a test, but this is for a particular data structure, namely presence-absence data in biogeography.

In principle, a Monte Carlo test can be constructed (and thus implemented in R) as follows:

1) You need a null model H_0, from which you generate data.
2) You need a test statistic T.
3) Compute T on your data (call it T_0).
4) Repeat k times:
   a) Generate data from H_0.
   b) Compute T on the generated data.
5) The p-value is (K+1)/(k+1), where K is the number of generated datasets for which T <= T_0 (given that a small T indicates a tendency towards clustering).

Standard choices for H_0 will be a normal or uniform distribution. (In prabtest, it is a complicated distribution on presence-absence data.) There are lots of possible choices of T. prabtest uses the ratio between the 25% smallest distances in the dataset and the 25% largest distances. This should be reasonable in fairly general settings. For a discussion of this and alternative choices (and references on them), you may take a look at C. Hennig and B. Hausdorf: Distance-based parametric bootstrap tests for clustering of species ranges, Computational Statistics and Data Analysis 45 (2004), 875-896. A preprint of this can be obtained from my web page.

If you want to test the significance of a solution from a particular cluster analysis method, you should think about choosing T so that it is somehow connected to the method. (In the Hennig and Hausdorf paper, there are for example two alternatives discussed that are connected to Single Linkage.)

Best,
Christian

On Wed, 15 Sep 2004, Patrick Giraudoux wrote:
> Hi, I am wondering whether a Monte Carlo method (or equivalent) exists that would allow testing the randomness of a cluster analysis (e.g. one obtained by hclust()). I went through the package fpc (maybe too superficially) but did not find such a method.
>
> Thanks for any hint,
> Patrick Giraudoux

***
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
### I recommend www.boag-online.de
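The Monte Carlo recipe in Christian's message can be sketched in base R as follows. Several choices here are assumptions, not taken from prabclus: the statistic is read as "mean of the 25% smallest pairwise distances divided by the mean of the 25% largest", the null model H_0 is a uniform distribution on the bounding box of the data, and the names stat_T, r_null and mc_cluster_test are hypothetical.

```r
# Test statistic T (assumed reading): small values suggest clustering
stat_T <- function(x) {
  d <- sort(as.vector(dist(x)))
  q <- ceiling(length(d) / 4)
  mean(head(d, q)) / mean(tail(d, q))
}

# Null model H_0 (assumption): independent uniforms over each column's range
r_null <- function(x) {
  apply(x, 2, function(col) runif(length(col), min(col), max(col)))
}

# Steps 3 to 5 of the recipe: observed T_0, k simulated values, p = (K+1)/(k+1)
mc_cluster_test <- function(x, k = 199) {
  T0    <- stat_T(x)
  T_sim <- replicate(k, stat_T(r_null(x)))
  K     <- sum(T_sim <= T0)
  list(statistic = T0, p.value = (K + 1) / (k + 1))
}

set.seed(123)
# Toy data with two tight, well-separated groups (assumed data)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.2), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.2), ncol = 2))
res <- mc_cluster_test(x)
res$p.value
```

With k = 199 the smallest attainable p-value is 1/200. As the message stresses, the result is only meaningful relative to the chosen H_0; a uniform box is just one convenient option.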
[R] cluster analysis and null hypothesis testing
Hi,

I am wondering whether a Monte Carlo method (or equivalent) exists that would allow testing the randomness of a cluster analysis (e.g. one obtained by hclust()). I went through the package fpc (maybe too superficially) but did not find such a method.

Thanks for any hint,
Patrick Giraudoux
[R] Cluster Analysis with minimum cluster size?
Hi all,

Is it possible to run kmeans, pam or clara with a constraint such that no resulting cluster has fewer than X cases? These kmeans algorithms often find clusters that are too small for my use. There are usually a few clusters with 1-10 cases (generally substantial outliers). I then have to manually assign the small ones to other sizable clusters. If this doesn't exist, is there another algorithm that does this?

Thanks,
Danny
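The manual step Danny describes (moving cases from undersized clusters into sizable ones) can at least be automated after the fact. The sketch below is not a size-constrained clustering algorithm and is not something proposed in the thread; it simply reassigns each case in a too-small cluster to the nearest centre of a large-enough cluster. The function name merge_small_clusters, the threshold min_size and the toy data are assumptions.

```r
# Reassign cases from clusters smaller than min_size to the nearest
# large-enough cluster centre (hypothetical helper; x must be a numeric matrix)
merge_small_clusters <- function(x, cluster, centers, min_size) {
  sizes <- table(factor(cluster, levels = seq_len(nrow(centers))))
  big   <- which(sizes >= min_size)
  stopifnot(length(big) > 0)               # need at least one sizable cluster
  small_pts <- which(!(cluster %in% big))
  for (i in small_pts) {
    # squared Euclidean distance from case i to every sizable centre
    d2 <- colSums((t(centers[big, , drop = FALSE]) - x[i, ])^2)
    cluster[i] <- big[which.min(d2)]
  }
  cluster
}

set.seed(7)
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 6), ncol = 2),
           matrix(c(20, 20, 21, 19), ncol = 2, byrow = TRUE))  # two outliers
km     <- kmeans(x, centers = 3, nstart = 10)
new_cl <- merge_small_clusters(x, km$cluster, km$centers, min_size = 10)
table(new_cl)
```

Note that the centres are not recomputed after reassignment, so outliers that get merged will pull the true cluster means; whether that is acceptable depends on the application.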
[R] cluster analysis
Would anyone be willing to give me some examples of plots or data frames for cluster analysis? If so, many thanks in advance! Files can be sent to my large mailbox at [EMAIL PROTECTED]

I want to perform cluster analysis on a set of data. The data consist of time-evolution RMS deviations; this is an N-dimensional matrix with N(N-1) independent components.

Thanks! My name is Xiaoqin Huang; I am in CSIT (Computational Science and Information Technology) at Florida State University.
Re: [R] Cluster analysis
Hi,

it seems that you are mixing something up. hclust is for dissimilarity-based hierarchical cluster analysis, which has nothing to do with R squared, Pseudo F, and so on. Informative output about the clustering is given in the value of the hclust object; the function cutree may help to extract a concrete clustering at some level of the hierarchy.

Maybe you do not start with dissimilarity data, in which case you might consider pam (in library cluster), kmeans, or the library mclust for Normal mixtures. However, the statistics you are looking for are not the primary quantities of interest in clustering, regardless of the method.

Christian Hennig

On Fri, 7 Mar 2003, Pierre-Olivier Chasset wrote:
> Hello,
>
> I would like to calculate a cluster analysis and I use the function 'hclust'. I have seen the GRAPHICAL results of this function with 'plot'. I would like to analyse this cluster but I don't know how to see the NUMERICAL results of each step of this cluster, like:
> - R Squared
> - Pseudo F
> - Pseudo t**2
>
> Thank you for any help,
> Pierre-Olivier Chasset

--
***
Christian Hennig
Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently)
and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
[EMAIL PROTECTED], http://stat.ethz.ch/~hennig/
[EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/
### I recommend www.boag.de
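The numerical output that Christian refers to lives in the components of the object returned by hclust, chiefly merge (which items or clusters were joined at each step) and height (the dissimilarity at which they were joined). A small sketch, with assumed toy data and average linkage chosen only for illustration:

```r
set.seed(2)
x  <- matrix(rnorm(40), ncol = 2)          # 20 toy observations (assumed data)
hc <- hclust(dist(x), method = "average")

# One row per agglomeration step. In hc$merge, negative entries are single
# observations and positive entries refer to clusters formed at earlier steps.
steps <- data.frame(step   = seq_along(hc$height),
                    left   = hc$merge[, 1],
                    right  = hc$merge[, 2],
                    height = hc$height)
head(steps)

# A concrete partition at some level of the hierarchy
groups <- cutree(hc, k = 4)
table(groups)
```

Statistics such as R squared or pseudo F are not part of this object; if they are really wanted, they would have to be computed separately from the partition returned by cutree.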