Hi,
R has a vast array of tools for cluster analysis. There's even a task
view: https://cran.r-project.org/web/views/Cluster.html
Deciding which method best suits your needs will require spending
some time understanding the pros and cons, and possibly
consulting with a local
Hi,
I have data from farmers with different variables. I would like to classify
them according to some of these variables. Can you help me use R to find the
best variables for classifying them, and to do the classification? Some
variables are numerical, others are ordinal.
Best regards,
Bienvenue
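A minimal sketch for mixed numeric and ordinal variables, assuming the farmer data sit in a data frame with the ordinal variables stored as ordered factors (the column names and values below are hypothetical): daisy() from the recommended cluster package computes Gower's dissimilarity across both types, and the result can be passed to hclust(). It does not answer which variables are best; that still needs subject-matter judgement.

```r
library(cluster)  # recommended package shipped with R

# Hypothetical farmer data: two numeric variables and one ordinal variable
set.seed(1)
farmers <- data.frame(
  farm_size_ha = round(runif(20, 1, 50), 1),
  herd_size    = rpois(20, 15),
  education    = factor(sample(c("none", "primary", "secondary"), 20, replace = TRUE),
                        levels = c("none", "primary", "secondary"), ordered = TRUE)
)

# Gower dissimilarity combines numeric and ordered-factor columns
d <- daisy(farmers, metric = "gower")

# Agglomerative clustering on the dissimilarities, cut into 3 groups
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
table(groups)
```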
Hi all,
I'm not very familiar with R,
so I tried to find an R function or package that could handle my problem.
What I wonder is
whether there is any R function or package that performs cluster analysis
with weighted attributes.
I saw several papers that dealt
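One way to weight attributes, sketched under the assumption that all attributes are numeric: a weighted Euclidean distance, sqrt(sum_j w_j (x_ij - x_kj)^2), can be obtained from plain dist() by multiplying each standardised column by the square root of its weight. The data and the weights w are hypothetical. Recent versions of cluster::daisy() also accept a weights argument, which extends the idea to mixed variable types (see ?daisy).

```r
# Hypothetical data: 30 objects described by 3 numeric attributes
set.seed(2)
x <- matrix(rnorm(90), ncol = 3,
            dimnames = list(NULL, c("attr1", "attr2", "attr3")))

# Hypothetical attribute weights (larger = more influence on the distance)
w <- c(attr1 = 3, attr2 = 1, attr3 = 0.5)

# Weighted Euclidean distance: multiplying column j by sqrt(w_j)
# lets plain dist() compute sqrt(sum_j w_j * (x_ij - x_kj)^2)
xs <- scale(x)                      # common scale for all attributes first
xw <- sweep(xs, 2, sqrt(w), `*`)
d  <- dist(xw, method = "euclidean")

hc <- hclust(d, method = "complete")
cutree(hc, k = 4)
```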
Hi
-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Venky
Sent: Wednesday, June 17, 2015 8:43 AM
To: R Help R
Subject: [R] cluster analysis
Hi friends,
I have data like this
In R or elsewhere?
Hi friends,
I have data like this
Group       Employee size  WOE         Employee size2  Weight of Evidence
1081680995  0              0.12875537  0.128755        -0.30761
1007079896  1              0.48380133  -0.46544        -0.70464
1000507407  2              0.26029825  -0.46544        0.070221
1006400720  3              0.12875537  0.128755        0.151385
1006916029  4              0.12875537  -0.05955
Dear Sun Shine,
dtes <- dist(tes.df, method = 'euclidean')
dtesFreq <- hclust(dtes, method = 'ward.D')
plot(dtesFreq, labels = names(tes.df))
However, I get an error message when trying to plot this: Error in
graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid
dendrogram
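One thing worth checking, sketched with hypothetical data: dist() on a data frame measures distances between rows, so the dendrogram has one leaf per row of tes.df, whereas names(tes.df) returns the column names. The labels passed to plot() should therefore have one entry per row, for example rownames(tes.df). Whether this is the cause of the error above is an assumption.

```r
# Hypothetical stand-in for tes.df: 8 rows (objects) x 3 numeric columns
set.seed(3)
tes.df <- data.frame(a = rnorm(8), b = rnorm(8), c = rnorm(8),
                     row.names = paste0("obj", 1:8))

dtes <- dist(tes.df, method = "euclidean")   # distances between ROWS
stopifnot(!anyNA(dtes))                      # NA distances would break hclust()

dtesFreq <- hclust(dtes, method = "ward.D")

# One label per leaf, i.e. per row of tes.df
plot(dtesFreq, labels = rownames(tes.df))
```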
Hi list
I am using the 'tm' package to review meeting notes at a school to
identify terms frequently associated with 'learning', 'sports', and
'extra-mural' activities, and then to sort any terms according to these
three headers in a way that could be supported statistically (as opposed
to,
I want to do agglomerative hierarchical clustering using the complete linkage
method in R, with the function agnes or hclust.
1. Can I do a cluster analysis of h = (n+p+1)/2 out of n observations? Note that
p = number of variables (dependent and independent).
2. Can I plot the dendrogram and get the
I am doing cluster analysis of my SNP data. I have 2 questions:
1. I draw the dendrogram with hclust using the following code. How can I change
its direction to vertical?
data <- read.table(file.choose(), header = TRUE, row.names = 1,
sep = "\t")
plot(hclust(as.dist(as.matrix(data)), method = "complete"))
it is horizontal,
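A short sketch of the two orientations, using a hypothetical symmetric distance matrix in place of the SNP file: plot() on an hclust object draws the leaves along the bottom, while converting the tree with as.dendrogram() and passing horiz = TRUE to plot() lays it on its side. Which of the two the poster calls vertical isn't clear from the truncated message, so both are shown.

```r
# Hypothetical stand-in for the SNP distance matrix read from file
set.seed(4)
m <- matrix(runif(36), 6, 6,
            dimnames = list(paste0("s", 1:6), paste0("s", 1:6)))
m <- (m + t(m)) / 2   # make it symmetric
diag(m) <- 0          # zero self-distances

hc <- hclust(as.dist(m), method = "complete")

plot(hc)                               # root at the top, leaves along the bottom
plot(as.dendrogram(hc), horiz = TRUE)  # tree on its side, leaves down the right
```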
I am trying to perform cluster analysis on survey data where each respondent
has answered several questions, some of which have categorical answers (blue,
pink, green, etc.) and some of which have scale answers (ratings from 1 to 10,
etc.). My problem is that certain age groups were over-sampled and I
On Wed, Mar 20, 2013 at 3:55 AM, Emma Gibson waterbab...@hotmail.com wrote:
I am trying to perform cluster analysis on survey data where each
respondent has answered several questions, some of which have categorical
answers (blue pink green etc) and some of which have scale answers
(rating
Does R have any function for performing cluster analysis when each subject
contributes more than one observation to the analysis, i.e. a repeated measures
cluster analysis? I prefer agglomerative clustering, but would certainly be
happy with k-means or another clustering technique. To the
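One simple approach, sketched under the assumption that every subject is measured on the same occasions: reshape the data so that each subject becomes one row and each occasion one column, then cluster the subjects on their whole profile. The data below are hypothetical; unequal or irregular measurement times would need a different method.

```r
# Hypothetical long-format data: 10 subjects, each measured at 4 occasions
set.seed(5)
long <- data.frame(
  subject  = rep(1:10, each = 4),
  occasion = rep(1:4, times = 10),
  y        = rnorm(40)
)

# One row per subject, one column per occasion (y.1 ... y.4)
wide <- reshape(long, idvar = "subject", timevar = "occasion",
                direction = "wide")

# Agglomerative clustering of subjects on their full response profile
X <- as.matrix(wide[, -1])
rownames(X) <- wide$subject
hc <- hclust(dist(X), method = "average")
cutree(hc, k = 3)
```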
Hello everyone,
I am writing because of my lack of knowledge of statistics.
I'm using CA and PCoA (but maybe I should use some other technique) to
determine the differences and similarities among a large sample of plants,
using different kinds of traits through a matrix of mixed
I am following instructions online for cluster analysis using the mclust
package, and keep getting errors.
http://www.statmethods.net/advstats/cluster.html
These are the instructions (there is no sample dataset unfortunately):
# Model Based Clustering
library(mclust)
fit <- Mclust(mydata)
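A hedged sketch of the same steps, assuming the mclust package is installed from CRAN and using the numeric iris measurements as a stand-in for mydata: Mclust() expects a numeric matrix or data frame without missing values, so factor columns or NAs in mydata are a plausible (but unconfirmed) source of errors. In current mclust versions the plot method for an Mclust fit takes a what argument (see ?plot.Mclust), which may also be why plot(fit, mydata) fails.

```r
library(mclust)  # assumes mclust is installed from CRAN

# Numeric columns only, no missing values; iris stands in for 'mydata'
mydata <- na.omit(iris[, 1:4])

fit <- Mclust(mydata)      # fits several models and numbers of clusters
summary(fit)               # best model according to BIC
plot(fit, what = "BIC")    # BIC across models and numbers of clusters
table(fit$classification)  # cluster sizes
```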
It's hard to answer these questions without knowing what the errors are and
how they can be reproduced.
Best, Ingmar
On Thu, Nov 22, 2012 at 1:03 AM, KitKat katherinewri...@trentu.ca wrote:
Thanks, I have been trying that site and another one
(http://www.statmethods.net/advstats/cluster.html)
These are the errors I've been having. I have been trying 3 different things:
1- Mclust:
This is the example I have been following:
# Model Based Clustering
library(mclust)
fit <- Mclust(mydata)
plot(fit, mydata) # plot results
print(fit) # display the best model
What I have done:
fit <-
Thank you for replying!
I made a new post asking if there are any websites or files on how to
download the mclust package (or other Bayesian cluster analysis packages) and
use the appropriate R functions. Sorry, I don't know how this forum works yet.
http://cran.r-project.org/web/views/Cluster.html
might be a good start
Brian
On Nov 21, 2012, at 1:36 PM, KitKat wrote:
Thank you for replying!
I made a new post asking if there are any websites or files on how to
download package mclust (or other Bayesian cluster analysis packages) and
Thanks, I have been trying that site and another one
(http://www.statmethods.net/advstats/cluster.html)
I don't know if I should be using mclust or mcclust, but either way, the
code is not working. I am following the guidelines online at:
mcclust -
, www.homepages.ucl.ac.uk/~ucakche
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
KitKat [katherinewri...@trentu.ca]
Sent: 15 November 2012 18:14
To: r-help@r-project.org
Subject: [R] cluster analysis in R
I have two issues.
1. I am trying to use morphology to identify gender. I have 9 variables, both
continuous and categorical. I was using two-step cluster analysis in SPSS
because two-step could deal with different types of variables. But the
output tells me that an animal is in cluster 1 or 2, it
Dear KitKat,
After installing R and reading some introductory material on getting
started with R you may want to check the CRAN task view on cluster analysis:
http://cran.r-project.org/web/views/Cluster.html
which has many useful references to all kinds and flavors of clustering
techniques,
Have a look at the package mclust.
Jose
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of
Ingmar Visser [i.vis...@uva.nl]
Sent: 15 November 2012 21:10
To: KitKat
Cc: r-help@r-project.org
Subject: Re: [R] cluster analysis in R
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of
Taisa Brown [taisa.br...@unb.ca]
Sent: 15 April 2012 03:28
To: r-help@r-project.org
Subject: [R] Cluster Analysis
Hi,
I was wondering what the best equivalent to SAS's FASTCLUS
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Taisa Brown
Sent: Saturday, April 14, 2012 7:29 PM
To: r-help@r-project.org
Subject: [R] Cluster
Hi,
I was wondering what the best equivalent to SAS's FASTCLUS and PROC CLUSTER
would be. I need to be able to test the significance of the clusters by
comparing the probability of obtaining an equal or greater pseudo F to the
Bonferroni-corrected level. I will also need to plot r squared
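A sketch of the two statistics with base R's kmeans(), which is roughly comparable to FASTCLUS (an assumption about equivalence, not a one-to-one port): the pseudo F (Calinski-Harabasz) statistic is [B/(k-1)] / [W/(n-k)] and R-squared is B/T, where B, W and T are the between-cluster, within-cluster and total sums of squares. The data (iris) and the range of k are illustrative; the Bonferroni-corrected significance test is not reproduced here.

```r
# Pseudo F (Calinski-Harabasz) and R-squared for k-means solutions
set.seed(6)
x <- scale(iris[, 1:4])   # illustrative data
n <- nrow(x)

res <- t(sapply(2:6, function(k) {
  km <- kmeans(x, centers = k, nstart = 25)
  B  <- km$betweenss      # between-cluster sum of squares
  W  <- km$tot.withinss   # total within-cluster sum of squares
  c(k        = k,
    pseudo_F = (B / (k - 1)) / (W / (n - k)),
    R2       = B / km$totss)
}))
res

# R-squared against the number of clusters
plot(res[, "k"], res[, "R2"], type = "b",
     xlab = "Number of clusters", ylab = "R-squared")
```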
Hello,
I want to do a cluster analysis with my data. The problem is that the
variables don't consist of single values; instead, the entries are pairs of
values.
It looks like this:
Variable1: Variable2: Variable3: ...
(1,2) (1,5) (4,2)
(7,8) (3,88)
Sent: Wednesday, April 04, 2012 6:32 AM
To: r-help@r-project.org
Subject: [R] cluster analysis with pairwise data
On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky savi...@cs.cas.cz wrote:
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
Var1 <- c("(1,2)", "(7,8)", "(4,7)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)")
DF <- data.frame(Var1, Var2, Var3, stringsAsFactors = FALSE)
If you
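Assuming the pair values are stored as strings like "(1,2)", as in the data frame above, one simple option is to split each pair into two numeric columns and cluster on those. Treating the two halves of a pair as separate variables is an assumption; if pairs should be compared as units, a custom distance would be needed instead. The helper split_pair() and the column suffixes _a and _b are hypothetical.

```r
DF <- data.frame(
  Var1 = c("(1,2)", "(7,8)", "(4,7)"),
  Var2 = c("(1,5)", "(3,88)", "(12,4)"),
  Var3 = c("(4,2)", "(6,5)", "(4,4)"),
  stringsAsFactors = FALSE
)

# Turn a vector of "(a,b)" strings into a two-column numeric matrix
split_pair <- function(s) {
  parts <- strsplit(gsub("[()]", "", s), ",")
  do.call(rbind, lapply(parts, as.numeric))
}

# Two numeric columns per variable, e.g. Var1_a and Var1_b
num <- do.call(cbind, lapply(names(DF), function(v) {
  m <- split_pair(DF[[v]])
  colnames(m) <- paste0(v, c("_a", "_b"))
  m
}))
num

hc <- hclust(dist(num), method = "average")
```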
Dear all,
I'm modelling extreme rainfall, particularly events that lie above a threshold, and
was searching for a suitable package in R that would enable a cluster
analysis of those extreme events. I would really appreciate any
suggestions.
Thanks,
Fir
Dear R helpers,
I have a large data set with 36 variables and about 50,000 cases. The
variables represent labour market status during 36 months; there are 8
different variable values (e.g. Full-time Employment, Student, ...).
Only cases with at least one change in labour market status are
included
Dear Hans,
clara doesn't require a distance matrix as input (and therefore doesn't
require you to run daisy), it will work with the raw data matrix using
Euclidean distances implicitly.
I can't tell you whether Euclidean distances are appropriate in this
situation (this depends on the
On Thu, Mar 31, 2011 at 08:48:02PM +0200, Hans Ekbrand wrote:
On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
Dear Hans,
clara doesn't require a distance matrix as input (and therefore
doesn't require you to run daisy), it will work with the raw data
matrix using
On Thu, Mar 31, 2011 at 11:48 AM, Hans Ekbrand h...@sociologi.cjb.net wrote:
The variables are unordered factors, stored as integers 1:9, where
1 means Full-time employment
2 means Part-time employment
3 means Student
4 means Full-time self-employee
...
Do Euclidean distances make
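A sketch of why integer codes can mislead, with hypothetical data: Euclidean distance on codes 1:9 treats 'Student' (3) as closer to 'Part-time employment' (2) than to 'Full-time employment' (1). Coding each month as a set of 0/1 indicators removes that artificial order; for such indicators the squared Euclidean distance equals twice the number of months in which two people differ, and clara() can then be run on the indicator matrix. Note that clara() still minimises summed (unsquared) Euclidean distances, so this is monotone in, not identical to, a plain mismatch count.

```r
library(cluster)

# Hypothetical labour-market histories: 500 people x 12 months, each month
# an unordered status code 1..8 (e.g. 1 = full-time employment)
set.seed(7)
n_people <- 500; n_months <- 12; n_states <- 8
codes <- matrix(sample(1:n_states, n_people * n_months, replace = TRUE),
                n_people, n_months)

# One 0/1 indicator column per (month, status) pair, so no order is implied
one_hot <- do.call(cbind, lapply(seq_len(n_months), function(j) {
  sapply(seq_len(n_states), function(s) as.numeric(codes[, j] == s))
}))

# Squared Euclidean distance = 2 x number of months with different statuses
sum((one_hot[1, ] - one_hot[2, ])^2) == 2 * sum(codes[1, ] != codes[2, ])

# clara() works on the raw indicator matrix, sampling to stay fast on large n
cl <- clara(one_hot, k = 4, metric = "euclidean", samples = 10)
table(cl$clustering)
```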
Peter Langfelder wrote:
On Fri, Nov 26, 2010 at 6:55 AM, Derik Burgert derik2...@yahoo.de wrote:
Dear list,
running a hierarchical cluster analysis I want to define a number of objects
that build a cluster already. In other words: I want to force some of the cases
to be in the same cluster from the start of the algorithm.
Any hints? Thanks in advance!
Derik
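As far as I know, hclust() has no option for such constraints; a common workaround, sketched here with hypothetical data, is to set the dissimilarities among the objects that must stay together to 0. With complete (or average) linkage they are then merged first, at height 0, before any other objects. The indices in forced are illustrative.

```r
# Hypothetical data: 12 objects; objects 1, 5 and 9 must start as one cluster
set.seed(8)
x <- matrix(rnorm(24), ncol = 2)
forced <- c(1, 5, 9)

D <- as.matrix(dist(x))
D[forced, forced] <- 0   # zero dissimilarity inside the forced group

hc <- hclust(as.dist(D), method = "complete")

# The forced objects merge first (at height 0), so they share a cluster
# as soon as those merges have happened
groups <- cutree(hc, k = nrow(x) - (length(forced) - 1))
groups[forced]
```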
Hi Ulrich,
I'm studying the principles of Affinity Propagation and I'm really glad to
use your package (apcluster) to cluster my data. I have just one
issue to solve.
If I apply the function apcluster(sim),
where sim is the matrix of dissimilarities, sometimes I encounter the
warning
Pablo, we've had success using
http://mephisto.unige.ch/traminer/preview.shtml to look at marketing paths.
The question would be how many distinct case-step descriptions there are.
HTH, Jim
On Jul 26, 2010 9:44 AM, Pablo Cerdeira pablo.cerde...@gmail.com wrote:
Hi all,
I have no idea if this
Hi Allan,
It helps a lot. I'll try to read more about it.
But, as you asked, here is a brief explanation of the necessary
columns of the sample data pasted at the end:
id_processo: identifies a legal case; it is its primary key.
ordem_andamento: the step number within a legal case
Hi Jim,
Wow! Very nice work at http://mephisto.unige.ch/traminer/preview.shtml
I'm going to read more about it.
I have a lot of different steps in a sequence: 586 different
possible steps in total, and 4,269 legal cases with a maximum of 379 steps
each.
If you want, I can send this
Hi all,
I have no idea whether this question is too easy, but I'm just starting
with R. So, here we go.
I have a large dataset with the steps of many judicial cases. A sample is
attached.
I'd like to do a cluster analysis to try to understand which is the most
usual path followed by this
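A minimal sketch, assuming each case is stored as an integer vector of step codes (the cases below are hypothetical): the Levenshtein (edit) distance counts the insertions, deletions and substitutions needed to turn one path into another, and the resulting matrix can be clustered with hclust(). This is a simplified stand-in for the optimal-matching distances offered by TraMineR; the plain R loop is far too slow for 4,269 cases with up to 379 steps, so it only illustrates the idea.

```r
# Hypothetical legal cases, each a sequence of integer step codes
cases <- list(
  case_a = c(10, 25, 25, 40),
  case_b = c(10, 25, 40),
  case_c = c(300, 12, 7, 7, 500),
  case_d = c(300, 12, 7, 500)
)

# Levenshtein distance between two step sequences (dynamic programming)
lev <- function(a, b) {
  d <- matrix(0, length(a) + 1, length(b) + 1)
  d[, 1] <- 0:length(a)
  d[1, ] <- 0:length(b)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      cost <- if (a[i] == b[j]) 0 else 1
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,   # delete a step
                             d[i + 1, j] + 1,   # insert a step
                             d[i, j] + cost)    # keep or substitute a step
    }
  }
  d[length(a) + 1, length(b) + 1]
}

# Pairwise distance matrix between all cases
n <- length(cases)
D <- matrix(0, n, n, dimnames = list(names(cases), names(cases)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    D[i, j] <- D[j, i] <- lev(cases[[i]], cases[[j]])
  }
}

hc <- hclust(as.dist(D), method = "average")
cutree(hc, k = 2)
```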
abanero wrote:
Do you know something like “knn1” that works with categorical variables
too?
Do you have any suggestion?
There are surely plenty of clustering algorithms around that do not require
a vector space structure on the inputs (like KNN does). I think
agglomerative clustering would
Dear abanero,
In principle, k nearest neighbours classification can be computed on
any dissimilarity matrix. Unfortunately, knn and knn1 seem to assume
Euclidean vectors as input, which restricts their use.
I'd probably compute an appropriate dissimilarity between points (have a
look at
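A sketch of 1-nearest-neighbour classification on an arbitrary dissimilarity matrix, here Gower dissimilarities from cluster::daisy() so that numeric and categorical attributes can be mixed. The training data, classes and new observations are hypothetical; the new rows are appended before calling daisy() so that all rows are scaled the same way.

```r
library(cluster)

# Hypothetical training data (numeric + categorical) with known classes
train <- data.frame(size  = c(1.0, 1.2, 5.0, 5.3, 9.1, 8.8),
                    color = factor(c("red", "red", "blue", "blue", "green", "green")))
cls <- c("A", "A", "B", "B", "C", "C")

# Hypothetical new observations to classify
new <- data.frame(size  = c(1.1, 8.9),
                  color = factor(c("red", "green"), levels = levels(train$color)))

# Gower dissimilarities among all rows (training rows first, then new rows)
D <- as.matrix(daisy(rbind(train, new), metric = "gower"))
ntr <- nrow(train)
D_new <- D[-(1:ntr), 1:ntr, drop = FALSE]   # new rows x training rows

# 1-NN rule: take the class of the most similar training row
pred <- cls[apply(D_new, 1, which.min)]
pred
```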
Hi,
thank you Joris and Ulrich for your answers.
Joris Meys wrote:
see the library randomForest for example
I'm trying to find an example in randomForest with categorical variables,
but I haven't found anything. Do you know any example with both categorical
and numerical variables? Anyway I
Hi Abanero,
first, I have to correct myself. Knn1 is a supervised learning algorithm, so
my comment wasn't completely correct. In any case, if you want to do a
clustering prior to a supervised classification, the function daisy() can
handle any kind of variable. The resulting distance matrix can
@r-project.org
Subject: Re: [R] cluster analysis and supervised classification: an alternative to knn1?
05/27/2010 07:56 AM
Hi
I had a look at the documentation of the package apcluster.
That's interesting, but do you have any example of using it with both
categorical and numerical variables? I'd like to test it with a large dataset.
Your posting has opened my eyes: problems where both numerical and
categorical
Sorry, Joris, I overlooked that you already mentioned daisy() in your
posting. I should have credited your recommendation in my previous message.
Cheers, Ulrich
Ulrich wrote:
Affinity propagation produces quite a number of clusters.
I tried q=0 and it produced 17 clusters. Anyway, that's a good idea,
thanks. I'm looking to test it with my dataset.
So I'll probably use daisy() to compute an appropriate dissimilarity then
apcluster() or another
Christian wrote:
and then implement
nearest neighbours classification myself if I needed it.
It should be pretty straightforward to implement.
Do you intend to modify the code of the knn1() function yourself?
No; if you understand what the nearest neighbours method does, it's not
very
What do you suggest for assigning a new observation to a particular
cluster?
As I mentioned already, I would simply assign the new observation to the
cluster whose exemplar it is most similar to (in a
knn1-like fashion). To compute these similarities, you can use the
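A sketch of the exemplar-based assignment described above, using pam() from the cluster package as the exemplar (medoid) method in place of apcluster, with Gower dissimilarities for the mixed attribute types. All data are hypothetical, and assigning to the nearest medoid is one simple rule, not the only one.

```r
library(cluster)

# Hypothetical mixed-type data: numeric x, binary flag, categorical kind
set.seed(9)
obs <- data.frame(
  x    = c(rnorm(15, 0), rnorm(15, 5)),
  flag = factor(rep(c("yes", "no"), each = 15)),
  kind = factor(sample(c("a", "b", "c"), 30, replace = TRUE), levels = c("a", "b", "c"))
)
new_obs <- data.frame(x    = 4.8,
                      flag = factor("no", levels = levels(obs$flag)),
                      kind = factor("b",  levels = levels(obs$kind)))

# Gower dissimilarities for all rows; the new observation is the last row
Dm <- as.matrix(daisy(rbind(obs, new_obs), metric = "gower"))
n  <- nrow(obs)

# Cluster only the existing observations around k = 2 medoids (exemplars)
fit <- pam(as.dist(Dm[1:n, 1:n]), k = 2)

# Assign the new observation to the cluster of its most similar medoid
medoid_rows <- fit$id.med
nearest <- which.min(Dm[n + 1, medoid_rows])
assigned_cluster <- fit$clustering[medoid_rows[nearest]]
assigned_cluster
```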
Hi,
I have 1,000 observations with 10 attributes (of different types: numeric,
dichotomous, categorical, etc.) and a measure M.
I need to cluster these observations in order to assign a new observation
(with the same 10 attributes but not the measure) to a cluster.
I want to calculate for
Not a direct answer, but from your description it looks like you are better
off with supervised classification algorithms instead of unsupervised
clustering. See the library randomForest, for example. Alternatively, you can
try a logistic regression or a multinomial regression approach, but these
Hello everyone!
My data is composed of 277 individuals measured on 8 binary variables
(1=yes, 2=no).
I did two similar cluster analyses, one in SPSS 18.0 and one in R 2.9.2. The
objective is to obtain the means of each variable per retained cluster.
1) The R analysis ran as follows:
call
Hi Jeoffrey,
How stable are the results in general?
If you repeat the analysis in R several times, does it yield the same
results ?
Tal
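A sketch with hypothetical data of one detail that matters for yes/no items coded 1 = yes, 2 = no: dist(method = "binary") treats every non-zero value as present, so with 1/2 coding every binary distance is 0. Recoding to 1/0 first gives meaningful distances; the cluster means per item are then the proportions of yes answers. Whether this explains the SPSS/R difference depends on the distance actually used, which the truncated message does not show.

```r
# Hypothetical survey: 40 respondents x 4 yes/no items, coded 1 = yes, 2 = no
set.seed(10)
raw <- matrix(sample(1:2, 160, replace = TRUE), ncol = 4,
              dimnames = list(NULL, paste0("item", 1:4)))

# With 1/2 coding every value is non-zero, so all binary distances are 0
summary(as.vector(dist(raw, method = "binary")))

# Recode to 1 = yes, 0 = no before using binary distances
yes <- ifelse(raw == 1, 1, 0)
hc  <- hclust(dist(yes, method = "binary"), method = "average")
cl  <- cutree(hc, k = 3)

# Proportion of "yes" answers per item within each retained cluster
aggregate(as.data.frame(yes), by = list(cluster = cl), FUN = mean)
```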
I'm not sure why you'd expect Euclidean distance and squared Euclidean
distance to
give the same results.
Euclidean distance is the square root of the sum of squared
differences for each variable, and that's exactly what dist() returns.
http://en.wikipedia.org/wiki/Euclidean_distance
On a map,
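A short sketch with hypothetical data showing that dist() returns plain Euclidean distances, and that squaring them can change the clustering. For Ward's method specifically, ?hclust documents that method = "ward.D2" squares the dissimilarities before updating, so hclust(d, "ward.D2") should give the same partitions as hclust(d^2, "ward.D"); "ward.D2" is only available in R 3.1.0 and later.

```r
set.seed(11)
x <- matrix(rnorm(40), ncol = 2)

d  <- dist(x)   # Euclidean: square root of the sum of squared differences
d2 <- d^2       # squared Euclidean distances (still a "dist" object)

# Check one pair by hand
all.equal(as.matrix(d)[1, 2], sqrt(sum((x[1, ] - x[2, ])^2)))

# Average linkage on d and on d^2 need not give the same partition
table(plain   = cutree(hclust(d,  "average"), 4),
      squared = cutree(hclust(d2, "average"), 4))

# Ward: "ward.D2" squares internally, so it should match "ward.D" on d^2
identical(cutree(hclust(d, "ward.D2"), 4), cutree(hclust(d2, "ward.D"), 4))
```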
Hi,
how can I do cluster analysis on spatial data (longitude, latitude)?
I used the function clust of the clustTool package and it didn't work
at all:
cl <- clust(dv, 3, method = "hclustAverage", distMethod = "euclidean")
thanks a lot
Karine HEERAH
Master 2 , océanographie et
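A sketch, assuming positions in decimal degrees and a spherical Earth of radius 6371 km: Euclidean distance on raw longitude/latitude distorts distances away from the equator, so one option is to compute great-circle (haversine) distances and pass them to hclust(). The coordinates and the helper haversine() are hypothetical; clustTool is not used.

```r
# Hypothetical positions in decimal degrees
pts <- data.frame(lon = c(55.5, 55.6, 57.5, 57.6, 70.0),
                  lat = c(-21.0, -21.1, -20.2, -20.3, -49.3))

# Great-circle (haversine) distance in km, assuming a spherical Earth
haversine <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Full distance matrix between all positions
n <- nrow(pts)
D <- outer(seq_len(n), seq_len(n), function(i, j)
  haversine(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j]))

hc <- hclust(as.dist(D), method = "average")
cutree(hc, k = 3)
```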
Hi Samantha,
Did you check out the help for plclust? There's a labels argument that
is used to label the leaves of your dendrogram. By default, the rownames
of your dataframe are used.
Sarah
On Wed, Mar 10, 2010 at 9:01 PM, Samantha samantha.fra...@gmail.com wrote:
Hi,
I am clustering data
Hi Samantha,
You can check out the graph and source code on this page:
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=79
best, Xian
Hi,
I am clustering data based on three numeric variables. I have a fourth
variable that is categorical (site) which I would like to use to label the
leaves of my dendrogram, so I can see how the different sites are grouped
throughout the tree, but I do NOT want to use this variable in the
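A short sketch with hypothetical data: cluster on the three numeric variables only, then pass the site variable as the labels argument when plotting. In current R versions plot() on an hclust object takes the same labels argument that plclust() did.

```r
# Hypothetical data: three numeric variables plus a 'site' label
set.seed(12)
dat <- data.frame(v1 = rnorm(9), v2 = rnorm(9), v3 = rnorm(9),
                  site = rep(c("North", "South", "East"), each = 3))

# Cluster on the numeric variables only
hc <- hclust(dist(dat[, c("v1", "v2", "v3")]), method = "average")

# Use 'site' only to label the leaves (one label per row, in row order)
plot(hc, labels = dat$site)
```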
Hi Folks,
I want to apply cluster analysis to a categorical data set. Could you
recommend some R packages or suggestions?
Thanks!
Dong
Subject: [R] cluster analysis
02/18/2010 04:54
On 16.11.2009 19:13, Charles C. Berry wrote:
The question: Can this be accomplished in the *dendrogram plot*
by manipulating the resulting hclust data structure or by some
other means, and if yes, how?
Yes, you need to study
?hclust
particularly the part about 'Value' from which
Original Message
Subject: Re: [R] Cluster analysis: hclust manipulation possible?
Date: Mon, 16 Nov 2009 19:22:54 -0800
From: Charles C. Berry cbe...@tajo.ucsd.edu
To: Jopi Harri jopi.ha...@utu.fi
References: 4b016237.7050...@utu.fi
pine.lnx.4.64.0911160906420.27
On 17.11.2009 5:22, Charles C. Berry wrote:
Once you get the hang of it, you'll be in a position to modify an existing
hclust object.
I believe that I managed to solve the problem. (The code may not
be too refined, and my R is perhaps a bit dialectal. The function
may fail especially if the
I am doing cluster analysis [hclust(Dist, method=average)] on
data that potentially contains redundant objects. As expected,
the inclusion of redundant objects affects the clustering result,
i.e., the data a1, = a2, = a3, b, c, d, e1, = e2 is likely to
cluster differently from the same data
On Mon, 16 Nov 2009, Jopi Harri wrote:
I am doing cluster analysis [hclust(Dist, method=average)] on
data that potentially contains redundant objects. As expected,
the inclusion of redundant objects affects the clustering result,
i.e., the data a1, = a2, = a3, b, c, d, e1, = e2 is likely to
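One way to keep redundant objects from weighting the result, sketched with hypothetical data matching the a1 = a2 = a3, e1 = e2 example: cluster only the distinct rows, then map each original object back to the cluster of its distinct copy. This is a simpler alternative to editing the hclust object, not the approach described above.

```r
# Hypothetical data with redundant objects: a1 = a2 = a3 and e1 = e2
x <- rbind(a1 = c(1, 1), a2 = c(1, 1), a3 = c(1, 1),
           b  = c(2, 3), c  = c(5, 5), d  = c(6, 4),
           e1 = c(9, 9), e2 = c(9, 9))

# Cluster only the distinct objects so duplicates carry no extra weight
keep <- !duplicated(x)
xu   <- x[keep, , drop = FALSE]
hc   <- hclust(dist(xu), method = "average")
cl_u <- cutree(hc, k = 3)

# Map every original object to the cluster of its first distinct copy
key <- apply(x, 1, paste, collapse = "_")
cl  <- cl_u[match(key, key[keep])]
names(cl) <- rownames(x)
cl
```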
Hi folks,
I tried hclust for the first time. Unfortunately, with missing data in my
data file, it doesn't seem
to work. I found no information about how to handle missing data.
Omitting all cases with missing values is not really an option, as I would lose too many
cases.
Thanks in advance
Holger
...@r-project.org] On Behalf Of
Hollix [holger.steinm...@web.de]
Sent: 14 July 2009 16:42
To: r-help@r-project.org
Subject: [R] Cluster analysis with missing data
Hi folks,
I tried for the first time hclust. Unfortunately, with missing data in my
data file, it doesn't seem
to work. I found
On Mon, 2009-07-13 at 23:42 -0700, Hollix wrote:
Hi folks,
I tried for the first time hclust. Unfortunately, with missing data in my
data file, it doesn't seem
to work. I found no information about how to consider missing data.
Omission of all missings is not really an option as I would
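A sketch with hypothetical data: dist() already tolerates missing values. Per ?dist, NA entries are skipped pair by pair and the sum is scaled up to the full number of columns, so a distance is NA only when two rows share no observed column, and only then does hclust() fail. daisy() in the cluster package also accepts missing values, including with mixed variable types.

```r
library(cluster)

# Hypothetical data with two missing entries
set.seed(13)
x <- matrix(rnorm(40), ncol = 4)
x[2, 1] <- NA
x[5, 3] <- NA

# dist() skips NA columns pair by pair and rescales the sum
d <- dist(x)
anyNA(d)   # no NA distances here, so hclust() can proceed
hc <- hclust(d, method = "average")

# Gower dissimilarities from daisy() also accept missing values
hc2 <- hclust(daisy(as.data.frame(x), metric = "gower"), method = "average")
```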
I use kmeans to classify spectral events in high and low 1/3 octave bands:
#Do cluster analysis
CyclA <- data.frame(LlowA, LhghA)
CntrA <- matrix(c(0.9, 0.8, 0.8, 0.75, 0.65, 0.65), nrow = 3, ncol = 2, byrow = TRUE)
ClstA <- kmeans(CyclA, centers = CntrA, nstart = 50, algorithm = "MacQueen")
This works well when the actual
Dear Alex,
actually, fixing the number of clusters in kmeans and then ending up with a
smaller number because of empty clusters is not a standard method of
estimating the number of clusters. It may happen (as apparently in some of
your examples), but it is generally rather unusual. In most
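A sketch with hypothetical band levels of supplying starting centres directly: when centers is a matrix, each row is one starting centre and k is its number of rows. According to ?kmeans, nstart only applies when centers is a number, so it is dropped here. Checking ClstA$size shows how many observations each cluster received.

```r
# Hypothetical band levels standing in for LlowA and LhghA
set.seed(14)
LlowA <- c(rnorm(20, 0.90, 0.03), rnorm(20, 0.80, 0.03), rnorm(20, 0.65, 0.03))
LhghA <- c(rnorm(20, 0.80, 0.03), rnorm(20, 0.75, 0.03), rnorm(20, 0.65, 0.03))
CyclA <- data.frame(LlowA, LhghA)

# Each row of CntrA is one starting centre, so k = nrow(CntrA) = 3
CntrA <- matrix(c(0.9, 0.8, 0.8, 0.75, 0.65, 0.65), nrow = 3, ncol = 2, byrow = TRUE)
ClstA <- kmeans(CyclA, centers = CntrA, algorithm = "MacQueen")

ClstA$size     # number of observations in each cluster
ClstA$centers  # final cluster centres
```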
Hi all!
I'm new to R and don't know much about it. Because it is free, I have managed to
learn it a little.
Here is my problem: I did a cluster analysis on 30 observations and 16
variables (monde, figaro, liberation, etc.). Here is the .txt data file:
data point is a vector, and my distance
measurement is a weighted dot product between vectors.
I would like to use R to perform a cluster analysis on this data. Does
one of the R cluster analysis routines provide for a user provided
distance function?
Dan Stanger
Eaton Vance Management
Hello All,
I have data where each feature data point is a vector, and my distance
measurement is a weighted dot product between vectors.
I would like to use R to perform a cluster analysis on this data. Does
one of the R cluster analysis routines provide for a user provided
distance function?
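R's clustering routines generally accept a precomputed dissimilarity object rather than a distance function, so one option is to compute the matrix yourself and pass it through as.dist(). A sketch, assuming positive hypothetical weights w; turning the weighted dot product into a dissimilarity via the weighted cosine is an assumption, and other conversions are possible.

```r
# Hypothetical feature vectors (rows) and positive feature weights
set.seed(15)
X <- matrix(rnorm(30), nrow = 6)   # 6 objects, 5 features each
w <- c(2, 1, 1, 0.5, 0.5)

# Weighted dot products <x, y>_w = sum(w * x * y) for all pairs at once
S <- X %*% diag(w) %*% t(X)

# Convert to a dissimilarity via the weighted cosine (an assumption):
# 1 - <x, y>_w / sqrt(<x, x>_w * <y, y>_w), which lies in [0, 2]
norms <- sqrt(diag(S))
D <- 1 - S / outer(norms, norms)

hc <- hclust(as.dist(D), method = "average")
cutree(hc, k = 2)
```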
Hi,
Are there any algorithms that handle numeric and factor variables
together in a cluster analysis?
Thank you,
Nagu
into distance matrix.
Regards,
Moshe.
--- On Wed, 11/6/08, Nagu [EMAIL PROTECTED] wrote:
From: Nagu [EMAIL PROTECTED]
Subject: [R] Cluster analysis using numeric and factor variables
To: r-help@r-project.org
Received: Wednesday, 11 June, 2008, 11:49 AM
Hi,
Are there any algorithms that handle
Dear all,
I would like to perform a clustering analysis on a data frame with two
coordinate variables (X and Y) and a categorical variable where only a != b can
be established. As far as I understand classification analyses, they are not
an option, as they only partition the training set into k
Dear Miha,
a general way to do this is as follows:
Define a distance measure by aggregating the
Euclidean distance on the (X,Y)-space and the trivial 0-1 distance (0 if
category is the same) on the categorical variable. Perform cluster analysis
(whichever you want) on the resulting distance
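A sketch of the aggregated distance described above, with hypothetical data: Euclidean distance on (X, Y), the 0-1 distance on the category, and a weighted sum of the two. Rescaling the spatial part to [0, 1] and the weight wt = 0.5 are assumptions; the weight controls how much the category matters relative to location.

```r
# Hypothetical data: coordinates X, Y plus a categorical attribute
set.seed(16)
dat <- data.frame(X = runif(15), Y = runif(15),
                  category = sample(c("a", "b", "c"), 15, replace = TRUE))

# Euclidean distance in the (X, Y) plane, rescaled to [0, 1] (an assumption)
d_xy <- as.matrix(dist(dat[, c("X", "Y")]))
d_xy <- d_xy / max(d_xy)

# Trivial 0-1 distance on the category: 0 if equal, 1 otherwise
d_cat <- 1 * outer(dat$category, dat$category, "!=")

# Weighted sum of the two parts; wt = 0.5 is a hypothetical choice
wt <- 0.5
D  <- (1 - wt) * d_xy + wt * d_cat

hc <- hclust(as.dist(D), method = "average")
cutree(hc, k = 3)
```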
Subject: [R] cluster analysis
Hi Sir
How can we select the optimum number of clusters?
Best Regards
--
AMINA SHAHZADI
Department of Statistics
GC University Lahore, Pakistan.
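One common heuristic, sketched with the iris measurements as stand-in data: run PAM for several k and pick the k with the largest average silhouette width. Other criteria (for example the elbow in the within-cluster sum of squares, the Calinski-Harabasz index, or the gap statistic via cluster::clusGap()) can point to different answers.

```r
library(cluster)

set.seed(17)
d <- dist(scale(iris[, 1:4]))   # illustrative data

# Average silhouette width of PAM solutions for k = 2..6 (larger is better)
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(d, k = k)$silinfo$avg.width)
names(avg_sil) <- ks
avg_sil

best_k <- ks[which.max(avg_sil)]
best_k
```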
Dear all,
I would like to know if I can do a hierarchical cluster analysis in R using
my own similarity matrix and how. Thanks. Katia Freire.
take a look at hclust()
Dieter
Katia Freire wrote:
Dear all,
I would like to know if I can do a hierarchical cluster analysis in R using
my own similarity matrix and how. Thanks. Katia Freire.
Subject: [R] Cluster Analysis
Dear all,
I would like to know if I can do a hierarchical cluster analysis in R
using my own similarity matrix and how. Thanks. Katia Freire.
Yes. ;)
Reading the help for dist() and hclust() should make the procedure for
doing this appear fairly
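A minimal sketch, assuming the similarities lie in [0, 1] with 1 meaning identical: hclust() works on dissimilarities, so convert with 1 - S and wrap the result in as.dist(). The 4 x 4 matrix is hypothetical.

```r
# Hypothetical 4 x 4 similarity matrix (1 = identical, 0 = unrelated)
S <- matrix(c(1.0, 0.9, 0.2, 0.1,
              0.9, 1.0, 0.3, 0.2,
              0.2, 0.3, 1.0, 0.8,
              0.1, 0.2, 0.8, 1.0), nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# hclust() needs dissimilarities, so convert the similarities first
D  <- as.dist(1 - S)
hc <- hclust(D, method = "average")
plot(hc)
cutree(hc, k = 2)
```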
Hi Sir
How can I perform cluster analysis using Ward's method and k-means clustering?
Regards
--
AMINA SHAHZADI
Department of Statistics
GC University Lahore, Pakistan.
On 10/18/07, amna khan [EMAIL PROTECTED] wrote:
Hi Sir
How to perform cluster analysis using Ward's method and K- means clustering?
To begin with, try performing it using the Rcmdr GUI.
Regards,
Liviu
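A sketch of both methods at the command line rather than in Rcmdr, with iris as stand-in data: Ward's method via hclust(method = "ward.D2"), then k-means started from the Ward cluster means, which is one common way to combine the two. The choice of 3 clusters is illustrative.

```r
set.seed(18)
x <- scale(iris[, 1:4])   # illustrative data

# Ward's method ("ward.D2" applies Ward's criterion to Euclidean distances)
hc   <- hclust(dist(x), method = "ward.D2")
ward <- cutree(hc, k = 3)

# k-means started from the Ward cluster means
start <- aggregate(x, by = list(ward), FUN = mean)[, -1]
km    <- kmeans(x, centers = as.matrix(start))

table(Ward = ward, kmeans = km$cluster)
```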