Re: [R] CLARA and determining the right number of clusters

pacomet Tue, 30 Sep 2008 07:13:14 -0700

Hi Christian and thanks

I've tried your suggestion and it seems promising, but I have a couple of
questions. I am reading a three-column ASCII file (lon, lat, sst):


> mydata <- read.table("INFILE", header=FALSE, sep="",
+   na.strings="99.00", dec=".", strip.white=TRUE,
+   col.names=c("lon","lat","sst"))

Then I extract a subset of the data and try to find the right number of
clusters for the third variable only, sst:

> library(cluster)   # provides clara()
> x <- mydata$sst
> asw <- numeric(10)
> for (k in 4:10)
+   asw[k] <- clara(x, k)$silinfo$avg.width
> k.best <- which.max(asw)
> cat("silhouette-optimal number of clusters:", k.best, "\n")
silhouette-optimal number of clusters: 5


I've lowered the maximum number of clusters in your example from 20 to 10,
as I expect the right number to be between 5 and 8. Is there any problem
with this change? The restriction may be too strict if I treat the data as
plain numbers, but since they are sea surface temperatures under certain
"environmental-meteorological conditions", I think there should be no more
than 8-9 clusters in this particular case. (If I keep 20 as the maximum, I
get 11 clusters.)
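
For reference, a minimal sketch of searching only a restricted range of k.
It assumes the cluster package is installed and uses synthetic values in
place of the real sst column; the names sst and k.range are illustrative
and do not come from the thread. Untried values of k are stored as NA
rather than 0, so which.max() skips them even if every silhouette width
happened to be negative. Note that clara() reports silhouette information
for its best sample only, not for the full data set.

```r
library(cluster)  # provides clara()

set.seed(1)
# Synthetic stand-in for mydata$sst (assumed data: three temperature regimes)
sst <- c(rnorm(200, 14, 0.3), rnorm(200, 18, 0.3), rnorm(200, 23, 0.3))
x <- as.matrix(sst)                  # one-column matrix, one row per observation

k.range <- 2:10                      # candidate numbers of clusters
asw <- rep(NA_real_, max(k.range))   # asw[k] holds the width for k clusters
for (k in k.range)
  asw[k] <- clara(x, k)$silinfo$avg.width

k.best <- which.max(asw)             # NA entries (untried k) are discarded
cat("silhouette-optimal number of clusters:", k.best, "\n")
```

Narrowing k.range only limits which candidates are compared; whether the
limit is justified is a judgement about the application, not something the
silhouette criterion can confirm.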

My second question is how to read the plot: is the right number of clusters
the one with the greatest "average silhouette width"?
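
As a hedged illustration of how such a plot can be read, the sketch below
draws the average silhouette width against k and marks the maximum, in the
style of the ?pam.object example. The data are synthetic and the names sst,
k.range and fit are assumptions made for the example. Silhouette widths lie
between -1 and 1: values near 1 indicate well-separated clusters, values
near 0 indicate points lying between clusters, and negative values suggest
misassigned points. Under this criterion the preferred k is the one with
the largest average width, subject to the caveat that no automatic
criterion should be accepted without checking it against the application.

```r
library(cluster)  # provides clara()

set.seed(42)
# Synthetic stand-in for the sst column (assumed data: four temperature regimes)
sst <- c(rnorm(150, 13, 0.4), rnorm(150, 17, 0.4),
         rnorm(150, 21, 0.4), rnorm(150, 25, 0.4))
x <- as.matrix(sst)

k.range <- 2:10
asw <- sapply(k.range, function(k) clara(x, k)$silinfo$avg.width)
k.best <- k.range[which.max(asw)]   # map the index back to the actual k

# One vertical bar per candidate k; the tallest bar is the silhouette-optimal k
plot(k.range, asw, type = "h", xlab = "k (# clusters)",
     ylab = "average silhouette width")
axis(1, k.best, paste("best", k.best, sep = "\n"),
     col = "red", col.axis = "red")

# Per-cluster average widths for the chosen k: a low value flags a weak cluster
fit <- clara(x, k.best)
print(round(fit$silinfo$clus.avg.widths, 2))
```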

Thanks again


2008/9/30 Christian Hennig <[EMAIL PROTECTED]>

> Hi there,
>
> generally finding the right number of clusters is a difficult problem and
> depends heavily on the cluster concept needed for the particular
> application.
> No outcome of any automatic method should be taken for granted.
>
> Having said that, I guess that something like the example given in
>
>> ?pam.object
>>
> (replacing pam by clara) should work with clara, too.
>
> Regards,
> Christian
>
>
>
> On Tue, 30 Sep 2008, pacomet wrote:
>
>> Hi everyone
>>
>> I have a question about clustering. I've managed to run a clustering
>> analysis of a large data set using CLARA, but now I want to find the
>> right number of clusters.
>>
>> The clara.object gives some information, like the ratio between maximal and
>> minimal dissimilarity, which says (maybe if lower than 1??) whether a
>> cluster is well-separated from the others. I've also read something about
>> silhouette and about cluster.stats, but I can't work out how to find the
>> right number of clusters.
>>
>> I've tried a suggestion from the mailing list but when using dist
>>
>> d1<-dist(mydata$sst)
>>
>> it says that "specified vector size is too big"
>>
>> Is there any method to find the right number of clusters when using clara?
>> Maybe it is something I've already tried, with a small and simple trick I
>> can't find.
>>
>> Thanks in advance
>>
>> --
>> _________________________
>> El ponent la mou, el llevant la plou
>> Usuari Linux registrat: 363952
>> -------
>> Fotos: http://picasaweb.google.es/pacomet
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
>



-- 
_________________________
El ponent la mou, el llevant la plou
Usuari Linux registrat: 363952
-------
Fotos: http://picasaweb.google.es/pacomet
