I am not sure whether this will help, but you are perhaps looking at variable
selection. There is a 2006 JASA paper by Raftery and Dean which may
help.
Many thanks,
Ranjan
On Fri, 27 Jul 2007 17:32:02 +1000 [EMAIL PROTECTED] wrote:
Hi List,
How would I go about best identifying the variables
Scott:
Suggest you look at using Discriminant Analysis (don't know which R
package has it).
Take the clusters created and, using Discriminant Analysis, get Fisher scores
for the clusters.
Then you can apply the Fisher scores to a new dataset to see which
defined cluster each new observation
will fall into.
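A minimal sketch of that suggestion, using lda() from the recommended MASS package. The toy data, the kmeans() stand-in for the Agnes clusters, and all object names are my own illustration, not the poster's code; note also the follow-up's caveat that this assumes a Euclidean-type metric.

```r
# Sketch only: classify new observations into existing clusters via
# linear discriminant analysis (Fisher's discriminant, MASS::lda).
library(MASS)

set.seed(1)
train <- data.frame(x = c(rnorm(25, 0), rnorm(25, 4)),
                    y = c(rnorm(25, 0), rnorm(25, 4)))
cl  <- kmeans(train, centers = 2)$cluster  # stand-in for clusters from Agnes
fit <- lda(train, grouping = cl)           # discriminant functions per cluster

newdata <- data.frame(x = c(-0.2, 4.1), y = c(0.1, 3.9))
predict(fit, newdata)$class                # cluster each new row falls into
```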
You can't do Discriminant Analysis without a quadratic metric in a
Euclidean space. 'Scott Bearer' explicitly does not want to assume that
sort of distance measure.
I am not sure how he used Agnes to form 20 clusters: it forms a
hierarchical clustering, so it really is not possible to predict
It seems nobody else was willing to help here
(when the original poster did not follow the posting
guide at all).
In the meantime, someone else has asked me about part of this,
so let me answer in public:
MM == Martin Maechler [EMAIL PROTECTED]
on Mon, 12 Mar 2007 17:23:30 +0100 writes:
Hi Vallejo,
I'm pretty busy currently, and feel your question has much more
to do with how to use R more generally than with using the
functions from the cluster package.
So you may get help from other R-help readers,
but maybe only after you have followed the posting-guide
and give a
Dear Kris,
a) how would one go about calculating the matrix of Dmax/KS distance values?
Hmm, I'd implement this directly by comparing the curves on a dense
sequence of equidistant points over a given value
range (hope you know a suitable one) and looking for the maximum
difference...
b) of
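The grid idea for a) can be sketched as follows; the function name and the N(0,1)-versus-N(1,1) example are mine, and the value range is a placeholder you would choose for your own curves.

```r
# Approximate the Kolmogorov / Dmax distance between two curves f and g
# by scanning a dense, equidistant grid over a user-chosen value range
# and taking the maximum absolute difference.
dmax <- function(f, g, lower, upper, n = 1000) {
  x <- seq(lower, upper, length.out = n)
  max(abs(f(x) - g(x)))
}

# Example: two normal CDFs shifted by 1; the true maximum gap is
# pnorm(0.5) - pnorm(-0.5), attained at x = 0.5.
dmax(pnorm, function(x) pnorm(x, mean = 1), lower = -5, upper = 6)
```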
On Wed, 18 Oct 2006, Weiwei Shi wrote:
Dear Chris:
I tried to use cor+1, but it still gives me an average silhouette width of 0.
Well, then it seems that the clustering is not that good.
I don't know your data and there is no theoretical reason why it has to
be positive. You should read the Kaufman
Dear Weiwei,
1. Is there a way to evaluate the effectiveness (or separation) of a
clustering (rather than by visualization)?
The function cluster.stats in package fpc computes several cluster
validation statistics (among them the average silhouette width).
Function clusterboot in the same package
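For the average silhouette width specifically, the recommended cluster package can compute it directly; the sketch below uses the built-in USArrests data and a 3-cluster solution of my own choosing (fpc::cluster.stats reports the same quantity among its statistics).

```r
# Average silhouette width as one distance-based validation statistic:
# values near 1 indicate well-separated clusters, values near 0
# indicate overlapping ones.
library(cluster)

d   <- dist(USArrests)            # Euclidean dissimilarities
cl3 <- cutree(hclust(d), k = 3)   # a 3-cluster solution to validate
sil <- silhouette(cl3, d)
mean(sil[, "sil_width"])          # the average silhouette width
```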
Dear Christian:
This is really a good summary. Most of my previous experience was in
classification rather than clustering, and this is really a good start
for me. Thank you!
And also hope someone can provide more info and answers to the other questions.
cheers,
weiwei
On 10/18/06, Christian Hennig
Dear Weiwei,
btw, ?cluster.stats does not work on my Mac machine.
> version
         _
platform i386-apple-darwin8.6.1
arch     i386
os       darwin8.6.1
system   i386, darwin8.6.1
status
major    2
minor    3.1
year     2006
month
Dear Chris:
thanks for the prompt reply!
You are right, the dist from Pearson correlation has negative values there,
which is why I should use cor+1 in my case (since negatively correlated genes
should be considered farthest). Thanks.
as to the ?cluster.stats, I double-checked it and I found I need to
restart my JGR, until
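One common way to express the cor+1 idea as a dissimilarity, sketched on toy data of my own (object names and dimensions are illustrative): 1 - cor ranges from 0 (cor = +1) to 2 (cor = -1), so negatively correlated genes come out farthest, matching the intent of the "cor + 1" similarity.

```r
# Sketch: a correlation-based dissimilarity where negatively correlated
# genes are farthest apart.
set.seed(1)
x <- matrix(rnorm(60), nrow = 6)  # toy data: 6 genes x 10 arrays
d <- as.dist(1 - cor(t(x)))       # pairwise gene dissimilarities in [0, 2]
hclust(d, method = "average")     # cluster the genes on this dissimilarity
```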
Dear Chris:
I have a sample like this
dim(dd.df)
[1] 142 28
and I want to cluster rows;
first of all, I followed the examples for cluster.stats() by
d.dd <- dist(dd.df)             # use Euclidean
d.4 <- cutree(hclust(d.dd), 4)  # 4 clusters I tried
cluster.stats(d.dd, d.4)        # gives me some results like this:
On 10/17/06, Weiwei Shi [EMAIL PROTECTED] wrote:
is there some good summary on clustering methods in R? It seems there
are many packages involving it.
Gabor provided this very useful link a couple of days back.
http://cran.r-project.org/src/contrib/Views/Cluster.html
jab
--
John
hi,
I just happened to find that page, but it seems too brief to me. For
example, my project involves an undetermined number of clusters and
undetermined attributes for the samples to be clustered. What
kind of methods should I start with?
Thanks a lot for the prompt reply.
W.
On 10/17/06,
Go to the R home page (google for R), click on CRAN in the left pane, choose
a mirror, click on Task Views in left pane and choose
Cluster.
On 10/17/06, Weiwei Shi [EMAIL PROTECTED] wrote:
hi,
is there some good summary on clustering methods in R? It seems there
are many packages involving it.
And
Mahdi Osman [EMAIL PROTECTED] writes:
Hi list,
I am interested in cluster analysis of microarray data. The data was
generated using the cDNA method and a loop design.
I was wondering if any one has a suggestion about which package I can
use to analyse such data.
There are many packages
Wade == Wade Wall [EMAIL PROTECTED]
on Fri, 14 Jul 2006 10:10:11 -0400 writes:
Wade I am trying to run a cluster analysis using Sorenson
Wade (Bray-Curtis) distance measure with flexible beta
Wade linkage method. However, I can't seem to find
Wade flexible beta in any of
Linda;
You might want to look at the package ade4, and in particular the function
dist.binary. Although you have mentioned the Rand Index, I would suggest that
you look at the Rand Index corrected for chance agreement, as it measures
the agreement between two clusterings resulting from two
Have you checked the amap package? It has been updated just recently and,
if I am not wrong, there is a method which indicates the best number of k
groups for your data.
Best wishes,
P. Olsson
2006/2/5, John Janmaat [EMAIL PROTECTED]:
Hello,
I'm playing around with cluster analysis, and am
Hi,
as said before, some statistics to estimate the number of clusters are in
the cluster.stats function of package fpc. These are distance-based,
not pseudo-F or T^2. They are documented in the book
by Gordon (1999), Classification (see ?cluster.stats for more references).
It also includes
Dear John,
You can play around with cluster.stats function in library fpc, e.g. you
can try:
library(fpc)
library(cluster)
data(xclara)
dM <- dist(xclara)
cl <- vector()
for(i in 2:7){
cl[i] <- cluster.stats(d=dM, clustering=clara(xclara, i)$cluster,
silhouette=FALSE)$wb.ratio
}
plot(1:6, cl[2:7],
On 05.02.2006 17:50, John Janmaat wrote:
Hello,
I'm trying some cluster analysis, using the hclust command. I am looking for
some help in selecting the 'best' number of clusters. Some software reports
pseudo-F and pseudo-T^2 statistics for each cluster merge. Is there any way
to
Markus == Markus Preisetanz [EMAIL PROTECTED]
on Thu, 26 Jan 2006 20:48:29 +0100 writes:
Markus Dear R Specialists,
Markus when trying to cluster a data.frame with about 80,000 rows and 25
columns, I get the above error message. I tried hclust (using dist), agnes
(entering the
Let's do some simple calculation: The dist object from a data set with
80,000 cases would have
80000 * (80000 - 1) / 2
elements, each taking 8 bytes to be stored in double precision. That's over
24GB if my arithmetic isn't too flaky. You'd have a devil of a time trying
to do this on a 64-bit
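The arithmetic, spelled out in R (assuming the 80,000 rows mentioned in the question above):

```r
# Storage needed for a dist object on n = 80,000 cases: the lower
# triangle holds n * (n - 1) / 2 dissimilarities, 8 bytes each.
n     <- 80000
cells <- n * (n - 1) / 2   # 3,199,960,000 entries
bytes <- cells * 8         # in double precision
bytes / 2^30               # roughly 23.8 GiB
```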
Dear Weiwei,
your question sounds a bit too general and complicated for the R list.
Perhaps you should look for personal statistical advice.
The quality of methods (and especially the choice of distance) for down-sampling
certainly depends on the structure of the data set. I do not see at the moment
Dear Chris:
You are right, and it IS too general. I think I should ask what
kind of cluster algorithms or functions are available in R, which
might be easier. But for that, I can probably google or use help() in
R to find out. I want to know more about the performance of clustering
on this
Barbara Diaz wrote:
Hi,
I am using fanny and I have strange results. I am wondering if
someone out there can help me understand why this happens.
First of all, in most of my tries it gives me a result in which each
object has equal membership in all clusters. I have read that
Barbara Diaz wrote:
Hi,
I am using fanny and I have strange results. I am wondering if
someone out there can help me understand why this happens.
First of all, in most of my tries it gives me a result in which each
object has equal membership in all clusters. I have read that that
means the
Hi!
Take a look at the packages mclust and flexmix!
They use the EM algorithm for mixture modelling, sometimes called model-based
cluster analysis.
Best,
Christian
On Wed, 26 Jan 2005 [EMAIL PROTECTED] wrote:
Hi,
I am looking for a package to do the clustering analysis using the
On Jan 27, 2005, at 9:06 AM, Morten Mattingsdal wrote:
Hi
I have a problem using the package cluster on my binary data. I want
to try mona at first. But I get an error.
hc <- read.table("all.txt", header=TRUE, sep="\t", row.names=1)
str(hc)
'data.frame': 51 obs. of 59 variables:
$ G1p : int 2 1
Sean Davis wrote:
On Jan 27, 2005, at 9:06 AM, Morten Mattingsdal wrote:
Hi
I have a problem using the package cluster on my binary data. I want
to try mona at first. But I get an error.
hc <- read.table("all.txt", header=TRUE, sep="\t", row.names=1)
str(hc)
'data.frame': 51 obs. of 59
Morten,
just a try: is there a constant variable (only 1) in the first dataset?
Christian
On Thu, 27 Jan 2005, Morten Mattingsdal wrote:
Hi
I have a problem using the package cluster on my binary data. I want to
try mona at first. But I get an error.
hc <- read.table("all.txt",
Fernando Prass wrote:
Hi people,
Does anybody know some Density-Based Method for clustering implemented in R?
Have you looked at CRAN package mclust?
Thanks,
Fernando Prass
Yes, but mclust doesn't have a density-based algorithm. Mclust has the BIC
algorithm, which is a model-based method...
Fernando Prass
--- Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote:
Fernando Prass wrote:
Hi people,
Does anybody know some Density-Based Method for clustering
maybe ?kmeans is what you're looking for ...
ingmar
On 10/21/04 2:47 PM, Fernando Prass [EMAIL PROTECTED] wrote:
Yes, but mclust doesn't have a density-based algorithm. Mclust has the BIC
algorithm, which is a model-based method...
Fernando Prass
--- Kjetil Brinchmann Halvorsen [EMAIL
No, kmeans is a partitioning method. I need a density-based method, like the
DBSCAN or DENCLUE algorithms...
Fernando Prass
--- Ingmar Visser [EMAIL PROTECTED] wrote:
maybe ?kmeans is what you're looking for ...
ingmar
On 10/21/04 2:47 PM, Fernando Prass [EMAIL PROTECTED] wrote:
Yes, but
I'm no expert in this, but mclust is `density-based' because it estimates
the density with a mixture of Gaussians. If this is not what you want, you
should clarify what you mean by `density-based'. Do you mean an algorithm
based on kernel estimator of the density?
Andy
From: Fernando Prass
Dear Fernando,
below you find a DBSCAN function I wrote for my own purposes.
It comes with no warranty and without proper documentation, but I followed
the notation of the original KDD-96 DBSCAN paper.
For large data sets, it may be slow.
Best,
Christian
On Thu, 21 Oct 2004, Fernando Prass
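Christian's function itself is not preserved in this excerpt; the sketch below is my own minimal re-implementation following the same KDD-96 notation (eps = neighbourhood radius, MinPts = density threshold), for illustration only, and it is likewise slow on large data sets.

```r
# Minimal DBSCAN sketch in base R (not Christian's original function).
dbscan_sketch <- function(x, eps, MinPts) {
  x <- as.matrix(x)
  n <- nrow(x)
  D <- as.matrix(dist(x))          # full distance matrix; fine for small n
  cl <- rep(0L, n)                 # 0 = unclassified or noise
  k <- 0L
  for (i in seq_len(n)) {
    if (cl[i] != 0L) next          # already assigned to a cluster
    nb <- which(D[i, ] <= eps)     # eps-neighbourhood (includes i itself)
    if (length(nb) < MinPts) next  # not a core point; stays noise for now
    k <- k + 1L
    cl[nb] <- k                    # start a new cluster from this core point
    seeds <- setdiff(nb, i)
    while (length(seeds) > 0L) {
      j <- seeds[1L]
      seeds <- seeds[-1L]
      nb2 <- which(D[j, ] <= eps)
      if (length(nb2) >= MinPts) { # j is also core: its neighbours join too
        new <- nb2[cl[nb2] == 0L]
        cl[new] <- k
        seeds <- c(seeds, new)
      }
    }
  }
  cl                               # 0 marks points left as noise
}

# Toy usage: two tight, well-separated blobs.
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.05), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.05), ncol = 2))
dbscan_sketch(pts, eps = 0.5, MinPts = 4)  # first 20 points get one label,
                                           # last 20 another
```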
AndyL == Liaw, Andy [EMAIL PROTECTED]
on Thu, 21 Oct 2004 09:18:54 -0400 writes:
AndyL I'm no expert in this, but mclust is `density-based'
AndyL because it estimates the density with a mixture of
AndyL Gaussians. If this is not what you want, you should
AndyL clarify what
From: Martin Maechler
AndyL == Liaw, Andy [EMAIL PROTECTED]
on Thu, 21 Oct 2004 09:18:54 -0400 writes:
AndyL I'm no expert in this, but mclust is `density-based'
AndyL because it estimates the density with a mixture of
AndyL Gaussians. If this is not what you want, you
Andy,
I may be wrong, I'm no expert either, but density estimation is different from
a density model. MClust is a model-based method because it uses model statistics
from the clustering data (more information in
ftp://ftp.u.washington.edu/public/mclust/tr415R.pdf).
I need some package that implements
Dear James,
sorry, this is not really an answer.
I use cutree to obtain clusters from an hclust object.
I do not get from the identify help page that identify should do anything
like what you expect it to do... I tried it out and to my surprise it
behaved as you said, i.e., it indeed does
ChrisH == Christian Hennig [EMAIL PROTECTED]
on Fri, 15 Oct 2004 11:43:53 +0200 (MEST) writes:
ChrisH Dear James,
ChrisH sorry, this is not really an answer.
nor this. I'm answering Christian...
ChrisH I use cutree to obtain clusters from an hclust
ChrisH object. I do
On Friday 15 Oct 2004 10:43 am, you wrote:
PS: It seems that each value is printed twice because classi is named, and
each value is also a name. Try as.vector(classi). (Perhaps a little useful
help in the end?)
Indeed. I have tried, for example:
as.vector(classi[[1]])
and
On Friday 15 Oct 2004 11:02 am, you wrote:
or unname(classi) -- which is slightly more expressive in this
case and possibly more desirable in other situations.
Martin Maechler, ETH Zurich
Thanks, Martin.
I've tried, like you suggested:
un_classi <- unname(classi)
but
James == James Foadi [EMAIL PROTECTED]
on Fri, 15 Oct 2004 11:36:14 +0100 writes:
James On Friday 15 Oct 2004 11:02 am, you wrote:
or unname(classi) -- which is slightly more expressive in this
case and possibly more desirable in other situations.
Martin
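A small illustration of the named-vector point (cutree output is named after the row names of the data; the built-in USArrests data set here is just a stand-in for classi's real source):

```r
# cutree returns a named integer vector, so printing shows each name
# above its value; unname() (or as.vector()) drops the names.
classi <- cutree(hclust(dist(USArrests)), k = 3)
head(classi)           # values printed with state names above them
head(unname(classi))   # plain integers, printed once each
```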
Hi,
testing the randomness of a cluster analysis is not a well-defined
problem, because it depends crucially on your null model. In fpc, there is
nothing like this. Function prabtest in package prabclus performs such a
test, but this is for a particular data structure, namely presence-absence
Martin Wegmann wrote:
Hello,
After reinstalling the whole OS and R as well, I tried update.packages()
and got the following error message:
concerning the mgcv update: atlas2-base is installed and blas as well (on
debian). I haven't found lf77blas; I assume it's a library or something
You need to add atlas2-base-dev:
$ apt-get install atlas2-base-dev
I installed atlas2-base-dev and g77, but now I get the error messages pasted
below. Both (cluster and mgcv) require lfrtbegin, but that does not seem to
be a program which I can install via apt-get.
Martin
* Installing
On Tue, Sep 30, 2003 at 02:04:23PM +0200, Martin Wegmann wrote:
You need to add atlas2-base-dev:
$ apt-get install atlas2-base-dev
I installed atlas2-base-dev and g77, but now I get the error messages pasted
below. Both (cluster and mgcv) require lfrtbegin, but that does not seem to
Hi,
it seems that you are mixing something up. hclust is for dissimilarity-based
hierarchical cluster analysis, which has nothing to do with R squared,
pseudo-F
Informative output about the clustering is given as the value of the hclust
object; function cutree may help to extract a concrete clustering
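A minimal cutree sketch; the USArrests data, the "average" linkage, and the choice of 4 clusters are my own illustration:

```r
# Extract a concrete clustering from an hclust object with cutree.
d   <- dist(USArrests)              # Euclidean dissimilarities between rows
hc  <- hclust(d, method = "average")
cl4 <- cutree(hc, k = 4)            # one cluster label per observation
table(cl4)                          # cluster sizes
```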