[R] Clustering with clara

2010-01-14 Thread pacomet
Hello everyone

I am trying to use the CLARA method to find clusters in my spatial surface
temperature data and have noticed a problem. My data are in the form
lat,lon,temperature. I extract lat, lon and the cluster number for each point
in the dataset. When I plotted a map of cluster numbers I found empty areas in
the map: fewer points were assigned a cluster number than there were
temperature points in the original analysis.

Why are there fewer points in the clustering results? Is there any option in
the CLARA method to retain every single point? Is there another clustering
method that preserves all the points?
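
The thread does not say where the points went. As far as the cluster package documentation describes it, clara() returns one label per input row in $clustering, so a shortfall may come from rows dropped before or after clustering (for example by na.omit() or a merge) rather than from CLARA itself. That is a guess, not something confirmed here. A minimal sketch for checking it, using a made-up table pts in place of the real lat, lon, temperature data:

```r
library(cluster)  # clara() lives in the 'cluster' package

set.seed(1)
# Hypothetical stand-in for the lat, lon, temperature table
pts <- data.frame(
  lat  = runif(500, 36, 44),
  lon  = runif(500, -6, 6),
  temp = rnorm(500, mean = 18, sd = 3)
)
pts$temp[sample(500, 25)] <- NA   # a few missing temperatures

fit <- clara(pts, k = 4)

# One label per input row, in the original row order
length(fit$clustering) == nrow(pts)

# Keep coordinates and labels side by side so no point is dropped
out <- cbind(pts[, c("lat", "lon")], cluster = fit$clustering)

# Number of rows that would vanish if NAs were removed before mapping
sum(!complete.cases(pts))
```

If the first comparison is TRUE on the real data, the gaps in the map were probably introduced in a step outside clara().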

Thanks in advance

Paco

-- 
_
El ponent la mou, el llevant la plou
Usuari Linux registrat: 363952
---
Fotos: http://picasaweb.google.es/pacomet

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] CLARA and determining the right number of clusters

2008-09-30 Thread pacomet
Hi everyone

I have a question about clustering. I've managed to run a clustering
analysis of a large data set with CLARA, but now I want to find the right
number of clusters.

The clara.object gives some information, like the ratio between maximal and
minimal dissimilarity, that says (maybe if lower than 1??) whether a cluster is
well separated from the others. I've also read something about silhouette and
about cluster.stats but can't work out how to find the right number of
clusters.

I've tried a suggestion from the mailing list but when using dist

d1 <- dist(mydata$sst)

it says that the specified vector size is too big.

Is there any method to find the right number of clusters when using clara?
Maybe it is something I've already tried, and I'm just missing a small and
simple trick.

Thanks in advance



Re: [R] CLARA and determining the right number of clusters

2008-09-30 Thread pacomet
Hi Christian and thanks

I've tried your suggestion and it seems promising. But I have a couple of
questions. I am reading a three-column ASCII file (lon, lat, sst)

 mydata <- read.table(INFILE, header=FALSE, sep="",
   na.strings="99.00", dec=".", strip.white=TRUE, col.names=c("lon","lat","sst"))

then I extract a subset of the data and try to get the right number of
clusters just for the third variable, sst

 x <- mydata$sst
 asw <- numeric(10)
 for (k in 4:10)
   asw[k] <- clara(x, k) $ silinfo $ avg.width
 k.best <- which.max(asw)
 cat("silhouette-optimal number of clusters:", k.best, "\n")
silhouette-optimal number of clusters: 5


I've reduced the maximum number of clusters in your example from 20 to
10, as I expect the right number to be between 5 and 8. Is
there any problem with this change? Maybe this restriction is too strict if
the data are treated as plain numbers, but as they are sea surface
temperatures under certain environmental-meteorological conditions, in this
particular case I think there should not be more than 8-9 clusters (if 20 is
retained I get 11 clusters).

The second question is: how should one read the plot? Is the right
number the one with the greatest average silhouette width?
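
For reference, the loop above can be written so that the whole curve is kept and plotted. This is only a sketch: sst below is simulated rather than the real data, and the range 2:10 is an arbitrary choice. Under the average-silhouette criterion the preferred k is the one with the largest width. Note that clara() computes silhouettes on its best subsample only, so each value is an estimate.

```r
library(cluster)

set.seed(42)
# Hypothetical one-column stand-in for mydata$sst: three temperature regimes
sst <- data.frame(sst = c(rnorm(300, 14, 0.5),
                          rnorm(300, 18, 0.5),
                          rnorm(300, 23, 0.5)))

ks  <- 2:10
# Average silhouette width for each candidate number of clusters
asw <- sapply(ks, function(k) clara(sst, k)$silinfo$avg.width)

# Candidate with the largest average silhouette width
k.best <- ks[which.max(asw)]

plot(ks, asw, type = "b",
     xlab = "k (number of clusters)", ylab = "average silhouette width")
abline(v = k.best, lty = 2)   # mark the silhouette-optimal k
```

Indexing by ks rather than by k avoids the unused zero entries that numeric(10) leaves for k below the starting value.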

Thanks again


2008/9/30 Christian Hennig [EMAIL PROTECTED]

 Hi there,

 generally finding the right number of clusters is a difficult problem and
 depends heavily on the cluster concept needed for the particular
 application.
 No outcome of any automatic method should be taken for granted.

 Having said that, I guess that something like the example given in

 ?pam.object

 (replacing pam by clara) should work with clara, too.

 Regards,
 Christian





 *** --- ***
 Christian Hennig
 University College London, Department of Statistical Science
 Gower St., London WC1E 6BT, phone +44 207 679 1698
 [EMAIL PROTECTED], 
 www.homepages.ucl.ac.uk/~ucakche






Re: [R] Exporting data to a text file

2008-08-04 Thread pacomet
Hi John

I don't get an error message but a warning

 write.table(myclara$clustering, "cluster.dat", append=TRUE)
Warning message:
In write.table(myclara$clustering, "cluster.dat", append = TRUE) :
  appending column names to file


Here is the output of str(myclara); it looks strange to me. I think
clustering holds integers and data holds real numbers

str(myclara)
List of 10
 $ sample    : chr [1:56] "32356" "33277" "43230" "52386" ...
 $ medoids   : num [1:8, 1:14]  7.888 12.019  5.427  0.725 17.688 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:8] "109056" "98194" "56959" "109806" ...
  .. ..$ : chr [1:14] "lon" "lat" "sst01" "sst02" ...
 $ i.med     : int [1:8] 20482 16158 5137 20722 48599 56033 68028 64308
 $ clustering: Named int [1:75459] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr [1:75459] "12296" "12297" "12298" "12299" ...
 $ objective : num 3.22
 $ clusinfo  : num [1:8, 1:4] 15055  9474  5164 13702 11340 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:4] "size" "max_diss" "av_diss" "isolation"
 $ diss      :Classes 'dissimilarity', 'dist'  atomic [1:1540] 1.11 6.54 4.62 3.30 4.32 ...
  .. ..- attr(*, "Size")= int 56
  .. ..- attr(*, "Metric")= chr "euclidean"
  .. ..- attr(*, "Labels")= chr [1:56] "32356" "33277" "43230" "52386" ...
 $ call      : language clara(x = mydata, k = 8)
 $ silinfo   :List of 3
  ..$ widths         : num [1:56, 1:3] 1 1 1 1 1 1 1 1 2 2 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:56] "96250" "109056" "130058" "116317" ...
  .. .. ..$ : chr [1:3] "cluster" "neighbor" "sil_width"
  ..$ clus.avg.widths: num [1:8] 0.343 0.355 0.533 0.265 0.308 ...
  ..$ avg.width      : num 0.362
 $ data      : num [1:75459, 1:14] 8.68 8.72 8.77 8.81 8.86 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:75459] "12296" "12297" "12298" "12299" ...
  .. ..$ : chr [1:14] "lon" "lat" "sst01" "sst02" ...
 - attr(*, "class")= chr [1:2] "clara" "partition"



I can output the two variables to two different files without any problem.

Thanks


2008/8/1 John Kane [EMAIL PROTECTED]

 try
 str(myclara)
 to see what you have - a data frame , matrix etc

 Are you getting any error messages?

 I tried your write.table commands and they work okay.








[R] Exporting data to a text file

2008-08-01 Thread pacomet
Hi R users

With the clara function I get a data frame (maybe this is not the exact word,
I'm new to R) with the following variables:

 names(myclara)
 [1] "sample"     "medoids"    "i.med"      "clustering" "objective"
 [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"

I want to export clustering and data to a new text file so I try

 write.table(myclara$data, "cluster.dat")
 write.table(myclara$clustering, "cluster.dat", append=TRUE)

The data variable is exported properly but clustering is not appended to the
output file.

Please, where is the mistake? Is it possible to export the two variables in
a single statement?
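
One way to get both into a single file is to bind the labels to the data as an extra column and write once. With append=TRUE, write.table() adds rows underneath the existing ones rather than a column beside them, and it warns because a second header line goes into the file. A minimal sketch, in which mydata is a small simulated stand-in and the output path is a temporary file chosen for illustration:

```r
library(cluster)

set.seed(7)
# Hypothetical small stand-in for the 14-column matrix behind myclara
mydata <- matrix(rnorm(200 * 3), ncol = 3,
                 dimnames = list(NULL, c("lon", "lat", "sst01")))
myclara <- clara(mydata, k = 4)

# Attach the cluster labels as one more column, then write a single table
out <- cbind(myclara$data, cluster = myclara$clustering)
outfile <- file.path(tempdir(), "cluster.dat")   # illustrative path
write.table(out, outfile, row.names = FALSE, quote = FALSE)

# Read it back to confirm one header line and one row per observation
back <- read.table(outfile, header = TRUE)
head(back)
```

This relies on clara() keeping the data in the result, which is its default (keep.data = TRUE).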

thanks in advance

Paco



Re: [R] About clustering techniques

2008-07-30 Thread pacomet
Hi Christian

I've been reading about daisy and think I need to do something like:

 mydaisydata <- daisy(mydata, metric=c("euclidean"), stand=FALSE)
Error en vector("double", length) :
  tamaño del vector especificado es muy grande (which means: specified
vector size is too big)


mydata is an annual file with 14 columns by 124716 rows. Is it possible that
daisy can't handle this much data? Maybe I'm missing something when using daisy.
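
The error is consistent with the size of the object requested: daisy(), like dist(), stores one dissimilarity per pair of rows. The arithmetic for the 124716 rows mentioned above, assuming 8-byte doubles:

```r
n <- 124716                      # rows reported for mydata
n_pairs <- n * (n - 1) / 2       # one dissimilarity per pair of rows
bytes   <- n_pairs * 8           # a double takes 8 bytes

n_pairs                          # 7776977970 pairs
bytes / 1e9                      # roughly 62.2 GB
n_pairs > .Machine$integer.max   # TRUE: more than 2^31 - 1 elements
```

R releases before 3.0.0 also capped vector length at 2^31 - 1 elements, so this allocation could not succeed regardless of available memory. clara() sidesteps the problem by computing dissimilarities only within its subsamples.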

Another question: if I get daisy running, can I use kmeans like this?

 mykmeansdata <- kmeans(mydaisydata, 5)

Or pamk, which I've read gives the optimal number of clusters?
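
On the kmeans question: as far as the stats documentation goes, kmeans() takes the observations themselves (a numeric matrix), not a dissimilarity object, so the daisy step would not be needed for it. A small sketch on simulated data, where x is a made-up stand-in with no missing values (kmeans() does not accept NAs):

```r
set.seed(5)
# Hypothetical complete-case matrix standing in for mydata
x <- matrix(rnorm(300 * 4), ncol = 4)

# Raw observations go in directly; nstart repeats the random initialisation
km <- kmeans(x, centers = 5, nstart = 10)

table(km$cluster)   # cluster sizes
```

pamk() from the fpc package is not shown here.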

Thanks again



[R] About clustering techniques

2008-07-29 Thread pacomet
Hello R users

For some time I have been working with a dataset to do some cluster analysis.
The data set consists of 14 columns: geographical coordinates and monthly
temperatures, in annual files

latitude - longitude - temperature 1 - ... - temperature 12

I have missing values in some cases; at some points there may be 8 valid
monthly values and 4 invalid ones. I don't want to drop the whole
row with 8 good/4 bad values, as I want to try both annual and monthly analyses.

I first tried kmeans but ran into a problem with missing values. Without
omitting missing values kmeans gives an error, and when invalid data are
excluded too many values are lost in some years of the data series.

Now I have been reading about pam, pamk and clara, which I think can handle
missing values, but I can't find out how to perform the analysis with
these functions. As I'm not a statistics or R expert, the fpc and cluster
package documentation is not enough for me. If you know of a website or a
tutorial explaining how to use these functions, with examples to check if
possible, please post it.
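
The cluster package documentation states that missing values are allowed in the data passed to pam() and clara(). A minimal sketch of what that looks like, on a simulated table whose layout (lat, lon, twelve monthly columns named sst01 to sst12) and NA pattern are assumptions made for illustration:

```r
library(cluster)

set.seed(3)
n <- 400
# Hypothetical table: lat, lon and 12 monthly temperatures
months <- matrix(rnorm(n * 12, mean = 18, sd = 4), ncol = 12,
                 dimnames = list(NULL, sprintf("sst%02d", 1:12)))
months[sample(length(months), 600)] <- NA      # scatter missing months
x <- cbind(lat = runif(n, 36, 44), lon = runif(n, -6, 6), months)

nrow(na.omit(x))          # rows left if every incomplete row were dropped
fit <- clara(x, k = 5)    # clara() takes the NAs as they are
length(fit$clustering)    # one label per original row
table(fit$clustering)     # cluster sizes
```

Rows with some months missing keep their place in the result. Whether coordinates and temperatures should be clustered together, or standardised first, is a separate modelling choice not settled here.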

Any other help or suggestion is greatly appreciated.

Thanks in advance

Paco
