Re: [R] Cluster analysis

2019-03-31 Thread Sarah Goslee
Hi, R has a vast array of tools for cluster analysis. There's even a task view: https://cran.r-project.org/web/views/Cluster.html Which method is best for your needs is going to require you spending some time working to understand the pros and cons, and possibly consulting with a local

[R] Cluster analysis

2019-03-31 Thread bienvenidoz...@gmail.com
Hi, I have data from farmers with different variables. I would like to classify them according to some variables. Can you help me with "R" to find the best variables to classify them and how to classify them with "R". Some variables are numerical others are ordinal. Best regards, Bienvenue
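For readers landing on this thread, one common starting point for data that mixes numerical and ordinal variables is a Gower dissimilarity followed by partitioning around medoids. The sketch below is only an illustration, not the method recommended in the replies: the `farmers` data frame, its column names and the choice of k = 2 are all made up for the example.

```r
library(cluster)  # recommended package that ships with R

# Hypothetical stand-in for the farm survey: two numeric and one ordinal variable.
set.seed(42)
farmers <- data.frame(
  farm_size = c(rnorm(15, 2, 0.5), rnorm(15, 10, 1)),
  herd      = c(rpois(15, 3), rpois(15, 30)),
  education = factor(sample(c("none", "primary", "secondary"), 30, replace = TRUE),
                     levels = c("none", "primary", "secondary"), ordered = TRUE)
)

# Gower dissimilarity copes with mixed types; ordered factors are treated as ordinal.
d <- daisy(farmers, metric = "gower")

# Partition around medoids; k = 2 is only an illustrative choice.
fit <- pam(d, k = 2)
table(fit$clustering)

# The average silhouette width is one common aid when comparing values of k.
fit$silinfo$avg.width
```

Which variables to include remains a subject-matter decision, as the reply above points out.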

Re: [R] cluster samples using self organizing map in R

2018-10-10 Thread Sarah Goslee
Hi Tina, What's wrong with what you did? The output object of som() contains the classification of each sample. You probably do need to read more about self-organizing maps, since you specified you wanted the samples classified into nine groups, and that's unlikely to be your actual intent. I

Re: [R] cluster samples using self organizing map in R

2018-10-10 Thread Bert Gunter
Search! the rseek.org site gives many hits for "self organizing maps", including the som package among others. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic

[R] cluster samples using self organizing map in R

2018-10-10 Thread A DNA RNA
Dear All, How can I use Self Organizing Map (SOM) results to cluster samples? I have tried the following but this gives me only the clustering of grids, while I want to cluster (150) samples: library(kohonen) iris.sc <- scale(iris[, 1:4]) iris.som <- som(iris.sc, grid=somgrid(xdim = 3, ydim=3,

Re: [R] cluster data in lattice dotplot and show stdev

2017-02-16 Thread Duncan Mackay
: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Luigi Marongiu Sent: Friday, 17 February 2017 02:31 To: r-help Subject: [R] cluster data in lattice dotplot and show stdev dear all, i have a set of data that is separated in the variables: cluster (two runs), type (blank, negative a

Re: [R] cluster data in lattice dotplot and show stdev

2017-02-16 Thread Jim Lemon
Hi Luigi, Are you looking for something like this? library(plotrix) ylim=c(0,1.7) png("lmplot.png",width=600,height=300) par(mfrow=c(1,2)) brkdn.plot(value~type,data=my.data[my.data$target=="A",], main="Run 1",ylab="Value",xlab="",xaxlab="target",ylim=ylim,

[R] cluster data in lattice dotplot and show stdev

2017-02-16 Thread Luigi Marongiu
dear all, i have a set of data that is separated in the variables: cluster (two runs), type (blank, negative and positive) and target (A and B), each duplicated. I am plotting it with lattice and the result is a 2x2 matrix plot in which the top two cells (or panels) are relative to run 2, the

[R] Cluster analysis with Weighted attribute

2016-06-03 Thread Ahreum Lee
Hi! All. I'm not much familiar with R. So I tried to find a R function or packages that could work with my problems. What I wonder is, Whether there is any R function or package that includes the cluster analysis considering with the weighted attribute. I saw several papers that dealt

Re: [R] cluster analysis

2015-06-17 Thread PIKAL Petr
Hi -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Venky Sent: Wednesday, June 17, 2015 8:43 AM To: R Help R Subject: [R] cluster analysis Hi friends, I have data like this In R or elsewhere? Group Employee size WOE Employee size2

[R] cluster analysis

2015-06-17 Thread Venky
Hi friends, I have data like this Group Employee size WOE Employee size2 Weight of Evidence 1081680995 0 0.12875537 0.128755 -0.30761 1007079896 1 0.48380133 -0.46544 -0.70464 1000507407 2 0.26029825 -0.46544 0.070221 1006400720 3 0.12875537 0.128755 0.151385 1006916029 4 0.12875537 -0.05955

Re: [R] Cluster analysis using term frequencies

2015-03-24 Thread Christian Hennig
Dear Sun Shine, dtes <- dist(tes.df, method = 'euclidean') dtesFreq <- hclust(dtes, method = 'ward.D') plot(dtesFreq, labels = names(tes.df)) However, I get an error message when trying to plot this: Error in graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid dendrogram

[R] Cluster analysis using term frequencies

2015-03-24 Thread Sun Shine
Hi list I am using the 'tm' package to review meeting notes at a school to identify terms frequently associated with 'learning', 'sports', and 'extra-mural' activities, and then to sort any terms according to these three headers in a way that could be supported statistically (as opposed to,

Re: [R] Cluster mapping data

2015-03-08 Thread Leask, Graham
Bert, Thank you for the suggestion but I am familiar with the clustering routines in R. My issue is how to carry out a grouping analysis on multi variate data that includes postcode shape file data as a variable. Rather than obtain clusters spread across the map I am looking to limit the

[R] Cluster mapping data

2015-03-08 Thread Leask, Graham
I am looking to cluster some data including a postcode shape file but need to ensure that the resulting groups are contiguous. How do I accomplish this using R? Kind Regards Dr Graham Leask Economics Strategy Group Aston University Aston Triangle Birmingham B4 7ET Tel: 0121 204 3150

Re: [R] Cluster mapping data

2015-03-08 Thread Bert Gunter
Have you looked at the Cluster task view on CRAN? http://cran.r-project.org/web/views/ -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. Clifford Stoll On Sun, Mar 8, 2015 at

Re: [R] cluster + tt terms in coxph

2014-11-06 Thread Henric Winell
On 2014-11-05 14:50, Therneau, Terry M., Ph.D. wrote: This is fixed in version 2.37-8 of the survival package, which has been in my send to CRAN real-soon-now queue for 6 months. Your note is a prod to get it done. I've been updating and adding vignettes. Is your fixed code publicly

Re: [R] cluster + tt terms in coxph

2014-11-05 Thread Therneau, Terry M., Ph.D.
This is fixed in version 2.37-8 of the survival package, which has been in my send to CRAN real-soon-now queue for 6 months. Your note is a prod to get it done. I've been updating and adding vignettes. Terry Therneau On 11/05/2014 05:00 AM, r-help-requ...@r-project.org wrote: I am

[R] Cluster -- Agnes function

2014-09-24 Thread Sohail Khan
Dear All, I have clustered a patient data set by agnes. I want to extract information for each cluster, I.E. all row ids belonging to each cluster. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list

Re: [R] Cluster -- Agnes function

2014-09-24 Thread Bart Kastermans
On 24/09/14 16:13, Sohail Khan wrote: Dear All, I have clustered a patient data set by agnes. I want to extract information for each cluster, I.E. all row ids belonging to each cluster. Fascinating, thank you for sharing. Best, Bart __

Re: [R] Cluster -- Agnes function

2014-09-24 Thread David L Carlson
of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sohail Khan Sent: Wednesday, September 24, 2014 9:14 AM To: r-help@r-project.org Subject: [R] Cluster -- Agnes function Dear
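A minimal sketch of how the row ids per cluster can be pulled out of an agnes fit, assuming the tree is cut into a chosen number of groups. The data (a slice of the built-in USArrests), the linkage and k = 3 are placeholders for illustration, not taken from the thread.

```r
library(cluster)

# Placeholder for the patient data; the row names play the role of row ids.
patients <- USArrests[1:12, ]

ag <- agnes(patients, method = "average")

# Convert to an hclust object and cut the tree into k groups.
groups <- cutree(as.hclust(ag), k = 3)

# One character vector of row ids per cluster.
ids_by_cluster <- split(rownames(patients), groups)
ids_by_cluster
```

Cutting at a height instead of a group count works the same way with `cutree(as.hclust(ag), h = ...)`.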

[R] Error creating daisy matrix in R cluster package - Cannot allocate vector of size 66.0 Gb

2014-06-21 Thread Scott Davis
My purpose involves creating a dissimilarity matrix using the daisy function in R before applying k-medoid clustering for customer segmentation. The dataset has 133,153 observations of 35 variables in a data.frame with numerical, categorical, blank cells and missing values. Missing values refer to
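The 66.0 Gb in the subject line follows from the size of the data: a dissimilarity object stores one double per pair of observations. A quick back-of-the-envelope check, assuming 8 bytes per value and binary gigabytes:

```r
n <- 133153                      # observations reported in the post

n_pairs <- n * (n - 1) / 2       # lower triangle of the dissimilarity matrix
bytes   <- n_pairs * 8           # one double (8 bytes) per pair
gib     <- bytes / 2^30          # convert to binary gigabytes

round(gib, 1)
```

The requirement grows quadratically with n, so halving the number of rows cuts the memory needed to roughly a quarter.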

[R] cluster option in stata for random intercept model in the R language?

2013-10-15 Thread Martin Batholdy
Dear R-list, I am currently working on a dataset with a colleague who uses stata. We fit a random intercept model to the data (decisions clustered in participants) and get closely the same results in stata (using xtreg re) and R (using the lme4 or multilevel package). Now in stata, there is

Re: [R] cluster option in stata for random intercept model in the R language?

2013-10-15 Thread David Winsemius
On Oct 15, 2013, at 3:32 AM, Martin Batholdy wrote: Dear R-list, I am currently working on a dataset with a colleague who uses stata. We fit a random intercept model to the data (decisions clustered in participants) and get closely the same results in stata (using xtreg re) and R (using

[R] cluster package - Installation problems

2013-07-15 Thread David Stevens
Group - I'm having problems with the 'cluster' package. Installation appears successful but attempts to load it with either library() or require() result in the error message Error in library(cluster) : there is no package called ‘cluster’ All that appears to be installed is cluster.dll in

Re: [R] cluster package - Installation problems

2013-07-15 Thread Uwe Ligges
On 15.07.2013 23:51, David Stevens wrote: Group - I'm having problems with the 'cluster' package. Installation appears successful but attempts to load it with either library() or require() result in the error message Error in library(cluster) : there is no package called ‘cluster’ All that

[R] cluster analysis

2013-07-04 Thread Ekele Alih
I want to do Agglomerative Hierarchical clustering using complete linkage method in R using the function agnes or hclust. 1. Can I do a cluster analysis of h=(n+p+1)/2 out of n observations? Note that p=number of variables (dependent and independent) 2. Can I plot the dendrogram and get the

Re: [R] cluster gene list

2013-04-22 Thread Catarina Maia
Hello, I'm just a beginner and probably there is a better way to do it but here it goes: #cluster analysis Euclidean_Distance <- dist(mydata, method="euclidean", diag=FALSE, upper=FALSE, p=2) data <- hclust(Euclidean_Distance, method="ward", members=NULL) plot(data, hang=-1) #K=4 # i chose to

[R] cluster gene list

2013-04-21 Thread Sudhir Singh
Hi, I have created a heatmap using heatmap.2 having 7 clusters. I would like to extract the list of genes that are in these 7 clusters. Is there any function that can be used to extract genes for each cluster? Cheers, Sudhir -- __
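A rough sketch of one way to get the gene lists, assuming the row dendrogram of the heat map came from the usual `hclust(dist(x))` combination; if a different distance or linkage was used for the plot, the same functions would have to be used here. The expression matrix below is simulated and the gene and sample names are invented.

```r
# Simulated expression matrix: 40 hypothetical genes by 6 samples.
set.seed(1)
expr <- matrix(rnorm(40 * 6), nrow = 40,
               dimnames = list(paste0("gene", 1:40), paste0("s", 1:6)))

# Rebuild the row clustering outside the plotting function.
row_hc <- hclust(dist(expr))

# Cut the row tree into 7 groups and list the genes in each.
gene_cluster     <- cutree(row_hc, k = 7)
genes_by_cluster <- split(names(gene_cluster), gene_cluster)
genes_by_cluster
```

The cluster numbers returned by `cutree()` follow the order of the rows in the matrix, not the left-to-right order of the dendrogram.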

[R] Cluster analysis

2013-04-11 Thread ravanlou
I am doing cluster analysis of my SNPs data. I have 2 questions: 1. I draw the cluster in hclust using the following codes.change direction to vertical. data <- read.table(as.matrix(file.choose()), header=T, row.names = 1, sep="\t") plot(hclust(as.dist(data), method="complete")) it is horizontal,

[R] Cluster analysis on weighted survey data with continuous and categorical variables

2013-03-19 Thread Emma Gibson
I am trying to perform cluster analysis on survey data where each respondent has answered several questions, some of which have categorical answers (blue pink green etc) and some of which have scale answers (rating from 1 to 10 etc). My problem is that certain age groups were over-sampled and I

Re: [R] Cluster analysis on weighted survey data with continuous and categorical variables

2013-03-19 Thread Thomas Lumley
On Wed, Mar 20, 2013 at 3:55 AM, Emma Gibson waterbab...@hotmail.com wrote: I am trying to perform cluster analysis on survey data where each respondent has answered several questions, some of which have categorical answers (blue pink green etc) and some of which have scale answers (rating

[R] Cluster analysis in the setting of repeated measures

2013-03-10 Thread John Sorkin
Does R have any function for performing cluster analysis when each subject contributes more than one observation to the analysis, i.e. a repeated measures cluster analysis? I prefer an agglomerative clustering, but would certainly be happy with a K-mean or other clustering technique. To the

Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analyis

2013-02-23 Thread Uwe Ligges
On 22.02.2013 11:41, Bob Green wrote: Hello, In SPSS the cluster analysis output includes an agglomerations schedule, which details the stages when cases are joined. Is it possible to obtain such output when performing cluster analysis in R? If so, I'd appreciate advice regarding how to

Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analyis

2013-02-23 Thread Bob Green
Hello Uwes, Thanks. Re-reading the hclust pages I found that using the hclust 'USArrests' data that the command plot (hc1) will generate the order in which cases joined. however, I still can't see how to obtain the respective height at which each case joined each cluster or the height

Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analyis

2013-02-23 Thread William Dunlap
To: Uwe Ligges Cc: r-help@r-project.org Subject: Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analyis Hello Uwes, Thanks. Re-reading the hclust pages I found that using the hclust 'USArrests' data that the command plot (hc1) will generate the order

Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analyis

2013-02-23 Thread Bob Green
tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bob Green Sent: Saturday, February 23, 2013 12:49 PM To: Uwe Ligges Cc: r-help@r-project.org Subject: Re: [R] Is it possible to obtain an agglomeration schedule with R

[R] Is it possible to obtain an agglomeration schedule with R cluster analyis

2013-02-22 Thread Bob Green
Hello, In SPSS the cluster analysis output includes an agglomerations schedule, which details the stages when cases are joined. Is it possible to obtain such output when performing cluster analysis in R? If so, I'd appreciate advice regarding how to obtain this information. Any
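For reference, an hclust object already carries the information of an agglomeration schedule in its `merge` and `height` components. A minimal sketch using the USArrests data mentioned in the replies; the average linkage and the column names of the table are just choices made for this example.

```r
hc1 <- hclust(dist(USArrests), method = "average")

# One row per merge step. Negative entries are single observations
# (row indices of the data); positive entries refer to the cluster
# formed at that earlier stage.
schedule <- data.frame(
  stage  = seq_len(nrow(hc1$merge)),
  item1  = hc1$merge[, 1],
  item2  = hc1$merge[, 2],
  height = hc1$height
)
head(schedule)

# Row names for the negative indices, if labels are needed.
hc1$labels[1:5]
```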

[R] Cluster Analysis and PCoA (mixt variables)

2013-01-19 Thread Julien Mvdb
Hello everyone, I mail you because of my lack of knowledge regarding statistics. I'm using the CA and PCoA (but maybe should I use some other techniques) to determine the differences and similarities between a large sample of plants using different kind of traits through matrix of mixed

[R] cluster analysis error - mclust package

2012-11-26 Thread KitKat
I am following instructions online for cluster analysis using the mclust package, and keep getting errors. http://www.statmethods.net/advstats/cluster.html These are the instructions (there is no sample dataset unfortunately): # Model Based Clustering library(mclust) fit <- Mclust(mydata)

Re: [R] cluster analysis in R

2012-11-22 Thread Ingmar Visser
It's hard to answer these questions without knowing what the errors are and how they can be reproduced. Best, Ingmar On Thu, Nov 22, 2012 at 1:03 AM, KitKat katherinewri...@trentu.ca wrote: Thanks, I have been trying that site and another one (http://www.statmethods.net/advstats/cluster.html)

Re: [R] cluster analysis in R

2012-11-22 Thread KitKat
These are the errors I've been having. I have been trying 3 different things 1- Mclust: This is the example I have been following: # Model Based Clustering library(mclust) fit <- Mclust(mydata) plot(fit, mydata) # plot results print(fit) # display the best model What I have done: fit <-

Re: [R] cluster analysis in R

2012-11-21 Thread KitKat
Thank you for replying! I made a new post asking if there are any websites or files on how to download package mclust (or other Bayesian cluster analysis packages) and the appropriate R functions? Sorry I don't know how this forum works yet -- View this message in context:

Re: [R] cluster analysis in R

2012-11-21 Thread Brian Feeny
http://cran.r-project.org/web/views/Cluster.html might be a good start Brian On Nov 21, 2012, at 1:36 PM, KitKat wrote: Thank you for replying! I made a new post asking if there are any websites or files on how to download package mclust (or other Bayesian cluster analysis packages) and

Re: [R] cluster analysis in R

2012-11-21 Thread KitKat
Thanks, I have been trying that site and another one (http://www.statmethods.net/advstats/cluster.html) I don't know if I should be doing mclust or mcclust, but either way, the codes are not working. I am following the guidelines online at: mcclust -

Re: [R] cluster analysis in R

2012-11-16 Thread Hennig, Christian
, www.homepages.ucl.ac.uk/~ucakche From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of KitKat [katherinewri...@trentu.ca] Sent: 15 November 2012 18:14 To: r-help@r-project.org Subject: [R] cluster analysis in R I have two issues. 1-I am trying

[R] cluster analysis in R

2012-11-15 Thread KitKat
I have two issues. 1-I am trying to use morphology to identify gender. I have 9 variables, both continuous and categorical. I was using two-step cluster analysis in SPSS because two-step could deal with different types of variables. But the output tells me that an animal is in cluster 1 or 2, it

Re: [R] cluster analysis in R

2012-11-15 Thread Ingmar Visser
Dear KitKat, After installing R and reading some introductory material on getting started with R you may want to check the CRAN task view on cluster analysis: http://cran.r-project.org/web/views/Cluster.html which has many useful references to all kinds and flavors of clustering techniques,

Re: [R] cluster analysis in R

2012-11-15 Thread Jose Iparraguirre
Have a look at the package mclust. Jose From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Ingmar Visser [i.vis...@uva.nl] Sent: 15 November 2012 21:10 To: KitKat Cc: r-help@r-project.org Subject: Re: [R] cluster analysis in R

Re: [R] cluster of points

2012-07-31 Thread Jean V Adams
Frederico, This is not exactly what you're after, but perhaps it will help. In this example I fit a cluster analysis to the data, then I cut the tree at a height of 3 (you would do this with your data at a height of 40). It's not a perfect solution, but it might be good enough, depending on

[R] cluster of points

2012-07-30 Thread Frederico Mestre
Hello: What I want to do is quite simple, but I can't find a way. I have a data frame with several points (x and y coords). I want to add another column with cluster membership. For example aggregate all the points that stand within a distance of 40 from each other. I've tried using
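The tree-cutting idea from the reply can be sketched as follows, assuming single linkage is acceptable. With single linkage, cutting at height 40 groups points that are connected through a chain of neighbours no more than 40 apart, so two members of a group can be further apart than 40; complete linkage would instead keep every pair within a group at most 40 apart. The coordinates are invented for the example.

```r
# Hypothetical points with x and y coordinates.
pts <- data.frame(x = c(0, 10, 30, 200, 220, 500),
                  y = c(0,  0,  0,   0,  10, 500))

# Single-linkage tree on Euclidean distances, cut at a height of 40.
hc <- hclust(dist(pts), method = "single")
pts$cluster <- cutree(hc, h = 40)
pts
```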

Re: [R] cluster algorithm with fixed cluster size

2012-06-07 Thread Martin Gütlein
Hi, okay, and which algorithm is it? I had a closer look at the manual and could not find it, but there is quite a number of methods in there, maybe I missed it. Thanks, Martin -- View this message in context:

[R] cluster algorithm with fixed cluster size

2012-06-06 Thread Martin Guetlein
Hi all, Does anyone know a cluster algorithm in R that allows to set the cluster size (not the number of clusters) to a fixed value? With best regards, Martin -- Dipl-Inf. Martin Gütlein Phone: +49 (0)761 203 7633 (office) +49 (0)177 623 9499 (mobile) Email:

Re: [R] cluster algorithm with fixed cluster size

2012-06-06 Thread Özgür Asar
Hi, See the package cluster in R. Ozgur -- View this message in context: http://r.789695.n4.nabble.com/cluster-algorithm-with-fixed-cluster-size-tp4632523p4632540.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org

Re: [R] cluster with mahalanobis distance

2012-05-31 Thread David L Carlson
-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Maria Froes Sent: Wednesday, May 30, 2012 6:42 PM To: r-help@r-project.org Subject: Re: [R] cluster with mahalanobis distance How can I perform cluster analysis using the mahalanobis distance instead

Re: [R] cluster with mahalanobis distance

2012-05-30 Thread Maria Froes
How can I perform cluster analysis using the mahalanobis distance instead of the euclidean distance? Thank you Maria Froes [[alternative HTML version deleted]] __ R-help@r-project.org mailing list
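One way to get pairwise Mahalanobis distances with base R is to transform the data with the Cholesky factor of the covariance matrix; Euclidean distances on the transformed data then equal Mahalanobis distances on the original. This is a sketch under the assumption that the covariance of the whole data set is the one wanted (it is not necessarily the within-cluster covariance), and the iris measurements and average linkage are used purely for illustration.

```r
x <- as.matrix(iris[, 1:4])
S <- cov(x)

# chol(S) returns upper-triangular R with t(R) %*% R == S.
# Rows of z are the observations after "whitening" by S.
z <- x %*% solve(chol(S))

# Euclidean distance on z is Mahalanobis distance on x.
d_mahal <- dist(z)
hc <- hclust(d_mahal, method = "average")

# Check against stats::mahalanobis(), which returns the squared distance.
all.equal(as.numeric(as.matrix(d_mahal)[1, 2]^2),
          as.numeric(mahalanobis(x[1, ], x[2, ], S)))
```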

Re: [R] Cluster Analysis

2012-04-19 Thread Alekseiy Beloshitskiy
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Taisa Brown [taisa.br...@unb.ca] Sent: 15 April 2012 03:28 To: r-help@r-project.org Subject: [R] Cluster Analysis Hi, I was wondering what the best equivalent to SAS's FASTCLUS

Re: [R] Cluster Analysis

2012-04-16 Thread David L Carlson
Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Taisa Brown Sent: Saturday, April 14, 2012 7:29 PM To: r-help@r-project.org Subject: [R] Cluster

[R] Cluster Analysis

2012-04-14 Thread Taisa Brown
Hi, I was wondering what the best equivalent to SAS's FASTCLUS and PROC CLUSTER would be. I need to be able to test the significance of the clusters by comparing the probability of obtaining an equal or greater pseudo F to the Bonferroni-corrected level. I will also need to plot r squared

[R] cluster analysis with pairwise data

2012-04-04 Thread paladini
Hello, I want to do a cluster analysis with my data. The problem is that the variables don't consist of single values but the entries are pairs of values. That looks like this: Variable 1:Variable2: Variable3: ... (1,2) (1,5) (4,2) (7,8) (3,88)

Re: [R] cluster analysis with pairwise data

2012-04-04 Thread David L Carlson
Sent: Wednesday, April 04, 2012 6:32 AM To: r-help@r-project.org Subject: [R] cluster analysis with pairwise data Hello, I want to do a cluster analysis with my data. The problem is, that the variables dont't consist of single value but the entries are pairs of values. That lokks like

Re: [R] cluster analysis with pairwise data

2012-04-04 Thread Petr Savicky
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote: Hello, I want to do a cluster analysis with my data. The problem is, that the variables dont't consist of single value but the entries are pairs of values. That lokks like this: Variable 1:Variable2: Variable3: ..

Re: [R] cluster analysis with pairwise data

2012-04-04 Thread ilai
On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky savi...@cs.cas.cz wrote: On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote: Var1 <- c("(1,2)", "(7,8)", "(4,7)") Var2 <- c("(1,5)", "(3,88)", "(12,4)") Var3 <- c("(4,2)", "(6,5)", "(4,4)") DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE) If you

[R] Cluster GUI package worth publishing/enhancing?

2012-02-28 Thread Todd Gillette
For a school course I and a partner developed a GUI in R designed to enable exploration of data via visualization of hierarchical clustering and correlation of cluster partitions with external metadata. The key features were the ability to load in a distance matrix (most GUI-based clustering

[R] cluster by unique value

2011-07-18 Thread Alfredo Alessandrini
Hi, I need to make a cluster classification by the unique values of the data frame. I explain the problem. I need to classify this table, and assign to the same cluster each row that has the same combination of value: data1 layer_1 layer_2 layer_3 [1,] 0.246000 2

Re: [R] cluster by unique value

2011-07-18 Thread Sarah Goslee
Your data1 and your data1_class file differ in the first three columns. Assuming that's an error, here's one way to do it: data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),layer2=c(2,3,2,2,1,2,3,2,2,2), layer3=c(1,1,1,1,1,1,1,1,1,4)) data1 <- cbind(data1,

Re: [R] cluster by unique value

2011-07-18 Thread jim holtman
Also read FAQ 7.31 before using 'numerics' as grouping factors. On Mon, Jul 18, 2011 at 6:36 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Your data1 and your data1_class file differ in the first three columns. Assuming that's an error, here's one way to do it: data1 <-

Re: [R] cluster by unique value

2011-07-18 Thread Petr Savicky
On Mon, Jul 18, 2011 at 06:36:13AM -0400, Sarah Goslee wrote: Your data1 and your data1_class file differ in the first three columns. Assuming that's an error, here's one way to do it: data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),layer2=c(2,3,2,2,1,2,3,2,2,2),
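Since the replies are cut off in this listing, here is one possible way to label rows by their unique combination of values. It is not necessarily the approach posted in the thread: it builds a text key per row and numbers the keys in order of first appearance. The `key` and `class` names are made up for the example, and the FAQ 7.31 caution applies, because numbers that print the same are treated as equal while tiny floating-point differences beyond the printed digits are ignored.

```r
data1 <- data.frame(layer1 = c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),
                    layer2 = c(2, 3, 2, 2, 1, 2, 3, 2, 2, 2),
                    layer3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4))

# One text key per row, e.g. "0.2_2_1".
key <- do.call(paste, c(data1, sep = "_"))

# Number each distinct key in order of first appearance.
data1$class <- match(key, unique(key))
data1
```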

Re: [R] cluster() or frailty() in coxph

2011-06-27 Thread Terry Therneau
Addition of a cluster() term fits a Generalized Estimating Equations (GEE) type of model, addition of frailty() fits a random effects model (Mixed Effect or ME). In glm analysis (linear regression, logistic regression, etc) the arguments about the advantages/disadvantages of GEE vs ME would

[R] cluster() or frailty() in coxph

2011-06-26 Thread Ehsan Karim
Dear List, Can anyone please explain the difference between cluster() and frailty() in a coxph? I am a bit puzzled about it. Would appreciate any useful reference or direction. cheers, Ehsan marginal.model <- coxph(Surv(time, status) ~ rx + cluster(litter), rats) frailty.model <-

Re: [R] cluster() or frailty() in coxph

2011-06-26 Thread Joshua Wiley
Hi Ehsan, My understanding (hopefully someone will jump in if this is wrong) is that cluster() identifies a variable that is an indicator for correlated observations (rats in a litter, children in a classroom, etc.). The relative risk from treatment (rx) is for a random sample of rats.
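To see the two approaches side by side, the models from the original question can be fitted to the `rats` data that ships with the survival package. The frailty formula below is an assumed completion of the truncated line in the question, and the comparison of standard errors is only a sketch of what to look at, not a recommendation of either model.

```r
library(survival)

# Marginal (GEE-type) model: the usual coefficient with a robust,
# cluster-adjusted variance.
marginal.model <- coxph(Surv(time, status) ~ rx + cluster(litter), data = rats)

# Random-effects model: a frailty term per litter (gamma by default).
# This formula is assumed, since the original post is truncated.
frailty.model <- coxph(Surv(time, status) ~ rx + frailty(litter), data = rats)

# Treatment coefficient under each model.
c(marginal = unname(coef(marginal.model)["rx"]),
  frailty  = unname(coef(frailty.model)["rx"]))

# Naive versus robust standard error in the marginal model.
c(naive  = sqrt(diag(marginal.model$naive.var)),
  robust = sqrt(diag(marginal.model$var)))
```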

[R] cluster analysis on extreme event

2011-05-27 Thread FMH
Dear all, I'm modelling extreme rainfall, particularly events that lie above a threshold, and was searching for a suitable package in R which may enable a cluster analysis on those extreme events. I would really appreciate any suggestions. Thanks, Fir

[R] Cluster analysis, factor variables, large data set

2011-03-31 Thread Hans Ekbrand
Dear R helpers, I have a large data set with 36 variables and about 50.000 cases. The variables represent labour market status during 36 months, there are 8 different variable values (e.g. Full-time Employment, Student,...) Only cases with at least one change in labour market status are included

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31 Thread Christian Hennig
Dear Hans, clara doesn't require a distance matrix as input (and therefore doesn't require you to run daisy), it will work with the raw data matrix using Euclidean distances implicitly. I can't tell you whether Euclidean distances are appropriate in this situation (this depends on the

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31 Thread Hans Ekbrand
On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote: Dear Hans, clara doesn't require a distance matrix as input (and therefore doesn't require you to run daisy), it will work with the raw data matrix using Euclidean distances implicitly. I can't tell you whether Euclidean

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31 Thread Hans Ekbrand
On Thu, Mar 31, 2011 at 08:48:02PM +0200, Hans Ekbrand wrote: On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote: Dear Hans, clara doesn't require a distance matrix as input (and therefore doesn't require you to run daisy), it will work with the raw data matrix using

Re: [R] Cluster analysis, factor variables, large data set

2011-03-31 Thread Peter Langfelder
On Thu, Mar 31, 2011 at 11:48 AM, Hans Ekbrand h...@sociologi.cjb.net wrote: The variables are unordered factors, stored as integers 1:9, where 1 means Full-time employment 2 means Part-time employment 3 means Student 4 means Full-time self-employee ... Does euclidean distances make
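On the question of Euclidean distance with unordered factors: the integer codes 1:9 carry no meaning as numbers, so distances computed on them directly are arbitrary. One possible workaround, which is not taken from the replies, is to recode every month into 0/1 indicator columns; the squared Euclidean distance between two cases is then twice the number of months in which their status differs, and clara() can work on the indicator matrix without a full distance matrix. The data, status labels and k = 3 below are invented.

```r
library(cluster)

# Simulated stand-in: 200 cases, 6 months, 4 unordered statuses.
set.seed(7)
status_levels <- c("full-time", "part-time", "student", "self-employed")
m <- replicate(6, sample(status_levels, 200, replace = TRUE))

# 0/1 indicator columns for one month (one column per status).
one_hot <- function(f, levels) outer(as.character(f), levels, "==") * 1

X <- do.call(cbind, lapply(seq_len(ncol(m)),
                           function(j) one_hot(m[, j], status_levels)))

# Squared Euclidean distance equals 2 x number of mismatching months.
sum((X[1, ] - X[2, ])^2) == 2 * sum(m[1, ] != m[2, ])

# clara() clusters the indicator matrix by sub-sampling.
fit <- clara(X, k = 3, samples = 20)
table(fit$clustering)
```

Whether simple matching is a sensible dissimilarity for monthly status sequences is a separate question; it ignores the ordering of the months.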

Re: [R] cluster analysis: predefined clusters

2010-12-01 Thread deriK2000
Peter Langfelder wrote: On Fri, Nov 26, 2010 at 6:55 AM, Derik Burgert derik2...@yahoo.de wrote: Dear list, running a hierarchical cluster analysis I want to define a number of objects that build a cluster already. In other words: I want to force some of the cases to be in the same

[R] cluster analysis: predefined clusters

2010-11-26 Thread Derik Burgert
Dear list, running a hierarchical cluster analysis I want to define a number of objects that build a cluster already. In other words: I want to force some of the cases to be in the same cluster from the start of the algorithm. Any hints? Thanks in advance! Derik [[alternative HTML

Re: [R] cluster analysis: predefined clusters

2010-11-26 Thread Peter Langfelder
On Fri, Nov 26, 2010 at 6:55 AM, Derik Burgert derik2...@yahoo.de wrote: Dear list, running a hierarchical cluster analysis I want to define a number of objects that build a cluster already. In other words: I want to force some of the cases to be in the same cluster from the start of the
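The reply is truncated here, so the following is only one possible trick and not necessarily what was suggested: set the dissimilarities among the pre-defined cases to zero before calling hclust(), so that they are merged first and stay together at every later cut. The data set, the three state names and the linkage are arbitrary choices for illustration, and the modified matrix is no longer a proper metric.

```r
# Distances on standardised USArrests, as a full matrix so it can be edited.
d <- as.matrix(dist(scale(USArrests)))

# Cases that must start out in the same cluster (arbitrary example).
forced <- c("Alaska", "Texas", "Vermont")
d[forced, forced] <- 0

hc  <- hclust(as.dist(d), method = "complete")
grp <- cutree(hc, k = 4)

# The forced cases share one cluster label.
grp[forced]
```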

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-09-27 Thread abanero
Hi Ulrich, I'm studying the principles of Affinity Propagation and I'm really glad to use your package (apcluster) in order to cluster my data. I have just an issue to solve.. If I apply the function: apcluster(sim) where sim is the matrix of dissimilarities, sometimes I encounter the warning

Re: [R] Cluster analysis

2010-07-27 Thread Jim Porzak
Pablo, we've had success using http://mephisto.unige.ch/traminer/preview.shtml to look at marketing paths. Question would be how many distinct case step descriptions are there? HTH, Jim On Jul 26, 2010 9:44 AM, Pablo Cerdeira pablo.cerde...@gmail.com wrote: Hi all, I have no idea if this

Re: [R] Cluster analysis

2010-07-27 Thread Pablo Cerdeira
Hi Allan, It helps a lot. I'll try to read more about it. But, as you asked me, here goes a brief explanation about the necessary columns of the sample data pasted at the end: id_processo: identify a legal case, it is its primary key. ordem_andamento: is the step number inside a legal case

Re: [R] Cluster analysis

2010-07-27 Thread Pablo Cerdeira
Hi Jim, Ow! Very nice job at http://mephisto.unige.ch/traminer/preview.shtml I'm going to read more about it. I have a lot of different steps, in a sequence. Actually, 586 different possible steps, but I have 4269 legal cases, with a maximum of 379 steps each one. If you want, I can send this

[R] Cluster analysis

2010-07-26 Thread Pablo Cerdeira
Hi all, I have no idea if this question is too easy to be answered, but I'm starting with R. So, here we go. I have a large dataset with a lot of steps of a judicial case. A sample is attached. I'd like to do a cluster analysis to try to understand which one is the most usual path followed by this

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
abanero wrote: Do you know something like “knn1” that works with categorical variables too? Do you have any suggestion? There are surely plenty of clustering algorithms around that do not require a vector space structure on the inputs (like KNN does). I think agglomerative clustering would

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Christian Hennig
Dear abanero, In principle, k nearest neighbours classification can be computed on any dissimilarity matrix. Unfortunately, knn and knn1 seem to assume Euclidean vectors as input, which restricts their use. I'd probably compute an appropriate dissimilarity between points (have a look at

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread abanero
Hi, thank you Joris and Ulrich for your answers. Joris Meys wrote: see the library randomForest for example I'm trying to find some example in randomForest with categorical variables but I haven't found anything. Do you know any example with both categorical and numerical variables? Anyway I

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Joris Meys
Hi Abanero, first, I have to correct myself. Knn1 is a supervised learning algorithm, so my comment wasn't completely correct. In any case, if you want to do a clustering prior to a supervised classification, the function daisy() can handle any kind of variable. The resulting distance matrix can

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Joris Meys
@r-project.org Subject: Re: [R] cluster analysis and supervised classification: an alternative to knn1? 05/27/2010 07:56 AM Hi

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
I had a look at the documentation of the package apcluster. That's interesting but do you have any example using it with both categorical and numerical variables? I'd like to test it with a large dataset.. Your posting has opened my eyes: problems where both numerical and categorical

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
Sorry, Joris, I overlooked that you already mentioned daisy() in your posting. I should have credited your recommendation in my previous message. Cheers, Ulrich -- View this message in context:

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread abanero
Ulrich wrote: Affinity propagation produces quite a number of clusters. I tried with q=0 and produces 17 clusters. Anyway that's a good idea, thanks. I'm looking to test it with my dataset. So I'll probably use daisy() to compute an appropriate dissimilarity then apcluster() or another

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Christian Hennig
Christian wrote: and then implement nearest neighbours classification myself if I needed it. It should be pretty straightforward to implement. Do you intend to modify the code of the knn1() function by yourself? No; if you understand what the nearest neighbours method does, it's not very

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread Ulrich Bodenhofer
What do you suggest in order to assign a new observation to a determined cluster? As I mentioned already, I would simply assign the new observation to the cluster to whose exemplar the new observation is most similar (in a knn1-like fashion). To compute these similarities, you can use the
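The assign-to-the-nearest-exemplar idea can be sketched with the cluster package alone, using pam() medoids in place of affinity-propagation exemplars (a substitution made for this example, since apcluster is not used here). The training data, the new observation and k = 2 are invented. Note that the Gower dissimilarity is recomputed on the combined data, so its variable ranges can shift slightly if the new observation lies outside the training range.

```r
library(cluster)

# Invented training data with one numeric and one categorical attribute.
set.seed(1)
train <- data.frame(num = c(rnorm(20, 0), rnorm(20, 5)),
                    cat = factor(rep(c("a", "b"), each = 20)))

# Cluster on a Gower dissimilarity; the medoids act as exemplars.
fit <- pam(daisy(train, metric = "gower"), k = 2)

# New observation with the same attributes.
new_obs <- data.frame(num = 4.8,
                      cat = factor("b", levels = levels(train$cat)))

# Dissimilarities between the new observation (last row) and every medoid.
d <- as.matrix(daisy(rbind(train, new_obs), metric = "gower"))
d_to_medoids <- d[nrow(d), fit$id.med]

# Index of the nearest medoid, which is also the cluster number.
assigned <- which.min(d_to_medoids)
assigned
```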

[R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-26 Thread abanero
Hi, I have 1.000 observations with 10 attributes (of different types: numeric, dichotomous, categorical etc.) and a measure M. I need to cluster these observations in order to assign a new observation (with the same 10 attributes but not the measure) to a cluster. I want to calculate for

Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-26 Thread Joris Meys
Not a direct answer, but from your description it looks like you are better off with supervised classification algorithms instead of unsupervised clustering. see the library randomForest for example. Alternatively, you can try a logistic regression or a multinomial regression approach, but these

Re: [R] Cluster procedure using geographical neighborhood

2010-05-07 Thread Martin Maechler
Dear Dario Sacco, DS == Dario Sacco dario.sa...@unito.it on Thu, 06 May 2010 17:45:30 +0200 writes: DS Dear Dr. Maechler, DS I am an agronomist and a researcher at the University of Turin. I am DS also teaching Applied statistics, then I have some knowledge in DS

[R] Cluster analysis: dissimilar results between R and SPSS

2010-04-26 Thread Jeoffrey Gaspard
Hello everyone! My data is composed of 277 individuals measured on 8 binary variables (1=yes, 2=no). I did two similar cluster analyses, one on SPSS 18.0 and one on R 2.9.2. The objective is to have the means for each variable per retained cluster. 1) the R analysis ran as follows: call

Re: [R] Cluster analysis: dissimilar results between R and SPSS

2010-04-26 Thread Tal Galili
Hi Jeoffrey, How stable are the results in general ? If you repeat the analysis in R several times, does it yield the same results ? Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me:
